jatinchowdhury18 / RTNeural

Real-time neural network inferencing
BSD 3-Clause "New" or "Revised" License

Model gives erroneous values as prediction #85

Closed: mikegazzaruso closed this issue 1 year ago

mikegazzaruso commented 1 year ago

Hi, I trained my model with tf/keras.

Model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout

model = Sequential()
### First Layer
model.add(Dense(100, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.3))
### Second Layer
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dropout(0.5))
### Third Layer
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dropout(0.5))

### Final Layer
model.add(Dense(num_labels))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

I exported json weights with your script, this is the result when I import it:

# dimensions: 40
Layer: dense
  Dims: 100
Layer: unknown
  Dims: 100
Layer: unknown
  Dims: 100
Layer: dense
  Dims: 200
Layer: unknown
  Dims: 200
Layer: unknown
  Dims: 200
Layer: dense
  Dims: 200
Layer: unknown
  Dims: 200
Layer: unknown
  Dims: 200
Layer: dense
  Dims: 10
Layer: unknown
  Dims: 10

First question: are those "unknown" layers OK? They should be the Activation and Dropout layers.

However, later, in the C++ code:

#include <RTNeural/RTNeural.h>
#include <fstream>

// "model.json" is an assumed filename; the jsonStream setup was not shown in the original snippet
std::ifstream jsonStream { "model.json", std::ifstream::binary };
auto model = RTNeural::json_parser::parseJson<double>(jsonStream, true);

const double testInput[] = { -378.06, 17.45, -49.15, 37.95, 10.41, 35.25, 8.45, 12.91, 0.70, 23.27,
                             7.05, 21.32, -3.18, 7.67, 2.08, 17.77, -0.00, 8.85, 0.94, 6.91,
                             4.10, 10.23, 2.81, 9.09, 4.54, 7.57, 6.54, 8.60, 5.48, 5.16,
                             3.71, 6.40, 5.05, 4.76, 0.70, 0.68, 5.02, 1.84, 3.80, 4.50 };
model->forward(testInput);
const double* testOutput = model->getOutputs();

In testOutput, the first 10 values of the array are populated with quite large double values (e.g. 5007.7344438 and so on), while the output prediction should be a categorical class-label vector (e.g. [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]).

Feeding the same coefficients into the model in Python and running inference correctly outputs the right class label.

What am I missing? My suspicion is those "unknown" layers.
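If those layers are effectively skipped, then not only the final softmax but also the hidden-layer ReLUs never run, so the network would be computing a different function entirely, not just an unnormalized one. A quick sanity check of that hypothesis (a sketch; the hand-rolled softmax and the sample values below are illustrative, not RTNeural API or real outputs): apply softmax to the raw C++ outputs and compare with the Python prediction. If only the final softmax was dropped, the two should match; if the ReLUs were skipped too, they won't.

import numpy as np

def softmax(logits):
    z = logits - np.max(logits)  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Illustrative values only: one large logit dominates after softmax.
print(softmax(np.array([0.0, 1.0, 5007.73, 2.0])))  # -> approx. [0, 0, 1, 0]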

Thanks for your help, and thanks for the library.

mikegazzaruso commented 1 year ago

OK, I think I figured it out.

The problem seems to be the Dropout and Activation layers declared separately: RTNeural can't properly detect their type and marks them as "unknown".

I rewrote and retrained my model using no Dropout layers and specifying the activation functions directly in the network's layers.

Python/TF/Keras:

model = Sequential()
### First Layer
model.add(Dense(100, input_shape=(40,), activation='relu'))
#model.add(Activation('relu'))
#model.add(Dropout(0.3))
### Second Layer
model.add(Dense(200, activation='relu'))
#model.add(Activation('relu'))
#model.add(Dropout(0.5))
### Third Layer
model.add(Dense(200, activation='relu'))
#model.add(Activation('relu'))
#model.add(Dropout(0.5))

### Final Layer
model.add(Dense(num_labels, activation='softmax'))
#model.add(Activation('softmax'))

Network detected by RTNeural in C++:

# dimensions: 40
Layer: dense
  Dims: 100
  activation: relu
Layer: dense
  Dims: 200
  activation: relu
Layer: dense
  Dims: 200
  activation: relu
Layer: dense
  Dims: 10
  activation: softmax

No "Unknown" Layers this time.

And now, when calling getOutputs(), the prediction I get is something like this:

output[0]: something34383488e-13
output[1]: something8983488e-12
output[2]: something8433e-13
output[3]: 0.9999999999998
output[4]: something3849843e-13
...... etc.

This seems correct to me, because the right class-label vector for this inference should be [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] (tested in Python).

So I think the problem is not the Dropout layers themselves, but the fact that you should pass the dropout value (and activation) directly in each layer's declaration.

What do you think?

Thank you

jatinchowdhury18 commented 1 year ago

Hi again!

So for the Dropout layer, I think the best thing to do is pass it to the layers_to_skip argument of the save_model function. The reason being that Dropout is typically only active while a model is being trained, and since RTNeural is only performing inference, it doesn't really need to know about the Dropout layers. Maybe we should add keras.layers.Dropout as one of the default layers to skip?
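For illustration, a sketch of that call. The import path and the exact form that layers_to_skip expects (layer classes vs. names) are assumptions here, so check the export script for the actual signature:

from tensorflow import keras
from model_utils import save_model  # RTNeural's Python export script (assumed path)

# Dropout is an identity at inference time, so skip it when exporting.
save_model(model, 'model.json', layers_to_skip=(keras.layers.Dropout,))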

For the Activation layers, I typically use those in my TensorFlow code the way your second example does, but it would be nice if RTNeural could support both ways of writing the code in TensorFlow. I'll have to think about it a bit more, but I think there's a way to make it work.
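One way this could work (a sketch of the general idea, not the actual change that later landed in RTNeural): have the exporter fold each standalone Activation layer into the layer that precedes it, and drop Dropout entirely, so both TensorFlow styles normalize to the same structure:

from tensorflow import keras

def fold_activations(model):
    # Collapse standalone Activation layers into the preceding layer
    # and drop Dropout, so both Keras styles yield the same structure.
    folded = []
    for layer in model.layers:
        if isinstance(layer, keras.layers.Dropout):
            continue  # dropout is an identity at inference time
        if isinstance(layer, keras.layers.Activation) and folded:
            name, _ = folded[-1]
            folded[-1] = (name, layer.activation.__name__)
        else:
            act = getattr(layer, "activation", None)
            folded.append((layer.name, act.__name__ if act else None))
    return folded

Run over the first model in this thread, this would produce the same four (dense, activation) pairs that the second model reports.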

mikegazzaruso commented 1 year ago

I just passed the dropout and activation directly in each layer's declaration; no problems this way.

You did great work with this library, Sir.

Kudos!

jatinchowdhury18 commented 1 year ago

By the way, as of e2ec8ea, RTNeural can now load models that use separate Activation layers, as in your initial example.

Anyway, closing for now since I think this is solved.