jatinchowdhury18 / RTNeural

Real-time neural network inferencing
BSD 3-Clause "New" or "Revised" License

Missing post-activation layer for GRU & LSTM when parsing JSON #124

Open: christoph-hart opened this issue 6 months ago

christoph-hart commented 6 months ago

Hi there,

I'm currently toying around with the library, and I noticed that the JSON parser for the TensorFlow model does not add an activation layer after the GRU / LSTM layers:

https://github.com/jatinchowdhury18/RTNeural/blob/0485da997cf78041b8dba9b698fff8a505715a0d/RTNeural/model_loader.h#L687

I've compared the layout of the model from your RTNeuralExample repository with the JSON and noticed that inconsistency. The TensorFlow JSON does list an activation function, as you can see here:

"type": "gru",
"activation": "tanh",
"shape": [ null, null, 8 ],

I'm just starting out with ML, so this might be a silly question, but is there a reason the activation layers are omitted from the JSON parser for the GRU and LSTM layers?

janaboy74 commented 6 months ago

I'm working on it. The parser is more or less fixed, but the output formatter is still wrong.

jatinchowdhury18 commented 6 months ago

Hello!

So the "root" of the problem here is a bit of a "compatibility" problem between TensorFlow and RTNeural.

In TensorFlow (and I think PyTorch as well), the GRU and LSTM layers have their own "internal" activation functions. In RTNeural, we use the "default" tanh activation functions for these layers, and the activation functions are "built in" to the layer implementations.
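
For reference, here's a minimal Keras sketch of what I mean (assuming TensorFlow; "tanh" is already the default here, so it shows up in the exported JSON even if you never set it explicitly):

import tensorflow as tf

# A GRU layer with its default internal activation; Keras reports this
# function as layer.activation when the model is exported.
gru = tf.keras.layers.GRU(8, activation="tanh", return_sequences=True)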

The JSON files are typically generated from TensorFlow's representation of a model. It works something like this:

# every layer's activation gets written into that layer's JSON entry
for layer in model.layers:
    layer_dict["activation"] = layer.activation  # for GRU/LSTM this is the default tanh

This way, when you define a TensorFlow layer with an activation function, the activation function will be part of the JSON file. However, for TensorFlow's GRU and LSTM implementations, layer.activation will return "tanh". Since the RTNeural implementations of these layers already include the "built in" activation functions, and we don't want to apply the activation function twice, we ignore the "activation" JSON field for those layers. The full Python script can be found here.
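
To make the "applied twice" concern concrete, here's a small NumPy illustration (not from the library, just showing why the field is ignored): the GRU's built-in tanh has already been applied to the layer's output, so honoring the JSON "activation" field as well would squash the values a second time.

import numpy as np

x = np.array([0.5, 1.0, 2.0])
print(np.tanh(x))            # output of the layer's built-in tanh
print(np.tanh(np.tanh(x)))   # a second, redundant activation gives a different result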

If you're manually writing/generating your own JSON file, and would like to have an additional activation applied after your GRU or LSTM layer, you could add another layer entry to your JSON file that looks something like this:

{
    "type": "activation",
    "activation": "tanh",
    "shape": [ null, null, 8 ]
}
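
If you're generating the file from Python, a minimal sketch of that workaround might look like the following (this assumes the exported file has the usual top-level "layers" array and that the loader accepts a standalone "activation" entry; "model.json" is just a placeholder path):

import json

# Workaround sketch: append an explicit activation layer after the GRU entry.
with open("model.json", "r") as f:
    model = json.load(f)

model["layers"].append({
    "type": "activation",
    "activation": "tanh",
    "shape": [None, None, 8],
})

with open("model.json", "w") as f:
    json.dump(model, f, indent=4)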

Hope this is helpful! I'm also curious what fixes @janaboy74 is making to the parser?

janaboy74 commented 6 months ago

I think I have fixed the JSON parser: "JSON parser now uses recursion, and I hope it now works correctly."