PyTorch state_dict to RTNeural JSON

ross-ca commented 2 years ago

I'm looking for an effective way to export a PyTorch Lightning model to the JSON format that RTNeural accepts. I've seen the previous post about using PyTorch models exported with TorchScript, but I can't seem to get this to work.

Instead, the PyTorch state_dict dictionary appears to contain all the information needed to build a JSON file in the format that RTNeural expects. I understand that this will involve writing some kind of parser, but I'm not sure how to go about it. Any help would be massively appreciated. I've attached an example of the dictionary returned by calling the state_dict() method on a model.

Thanks!

state_dict_export.txt

jatinchowdhury18 commented 2 years ago

Hello!

To use RTNeural with a specific model like this one, it should be possible to implement the model "by hand":

1. Convert the state_dict object to a regular JSON file (see here).
2. Create your model with RTNeural in C++.
3. Write some C++ code to load the weights into the model directly from the JSON file (here's an example).
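Step 1 can be sketched in Python as follows. This is a minimal sketch, assuming the state_dict values are PyTorch tensors, which provide `.tolist()`; the function name `state_dict_to_plain_json` is hypothetical. The result keeps the state_dict's parameter names as keys, so the C++ side still has to know which key belongs to which RTNeural layer.

```python
import json
from typing import Any, Mapping


def state_dict_to_plain_json(state_dict: Mapping[str, Any], indent: int = 2) -> str:
    """Serialise a state_dict-like mapping to a JSON string (hypothetical helper).

    Values with a .tolist() method (torch.Tensor, numpy.ndarray) become nested
    Python lists; values that are already plain lists/numbers pass through.
    Key order is preserved.
    """
    plain = {}
    for name, value in state_dict.items():
        plain[name] = value.tolist() if hasattr(value, "tolist") else value
    return json.dumps(plain, indent=indent)
```

With PyTorch installed, usage would look something like `open("weights.json", "w").write(state_dict_to_plain_json(model.state_dict()))`.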

It would be nicer (though more difficult) to automate this process, so that the state_dict can be "transformed" into an RTNeural model file like the ones we can already get from TensorFlow. The RTNeural model file needs to contain the following:

- The input shape of the network
- A list of layers in sequential order
- Each layer must have:
  - The layer type
  - The layer activation type
  - An output shape
  - The layer's weights
  - For convolutional layers, also a kernel size and dilation
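The checklist above translates naturally into a small validator that could run before writing the file. This is a sketch only: the key names `kernel_size` and `dilation` and the type string `conv1d` are assumptions based on RTNeural's example model files, and the function names are hypothetical.

```python
from typing import Any, Dict, List

# Fields every layer entry needs, per the checklist.
REQUIRED_KEYS = ("type", "activation", "shape", "weights")
# Extra fields assumed for convolutional layers (names are an assumption).
CONV_KEYS = ("kernel_size", "dilation")


def missing_layer_fields(layer: Dict[str, Any]) -> List[str]:
    """Return the required fields absent from one layer entry."""
    missing = [key for key in REQUIRED_KEYS if key not in layer]
    if layer.get("type") == "conv1d":
        missing += [key for key in CONV_KEYS if key not in layer]
    return missing


def check_model(model: Dict[str, Any]) -> List[str]:
    """Collect human-readable problems for a whole RTNeural-style model dict."""
    problems = []
    if "in_shape" not in model:
        problems.append("model: missing in_shape")
    if "layers" not in model:
        problems.append("model: missing layers")
    for index, layer in enumerate(model.get("layers", [])):
        for key in missing_layer_fields(layer):
            problems.append(f"layer {index}: missing {key}")
    return problems
```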

With that in mind, a few things about the state_dict that you shared raise some questions:

- The state_dict doesn't seem to show the "type" of each layer. Some layers are labelled "hidden" or "residual", but I'm not sure whether those translate into types like "Dense" or "Conv1D". Maybe the layer type could be inferred from the shape of the weights?
- I don't see any activation functions mentioned in the state_dict. Maybe this network doesn't use any activations?
- I'm not sure I understand the order of the layers in the state_dict. For example, it seems like the input layer is second to last?
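Two of these questions can be partly answered from PyTorch's side. A state_dict lists tensors in the order their modules were registered (usually the order of assignment in `__init__`), not the order they are called in `forward()`, which would likely explain an "input" layer appearing near the end. Layer types can be guessed from weight rank, as in the heuristic sketch below. It assumes standard PyTorch layouts (`nn.Linear` weight is out x in, `nn.Conv1d` weight is out_channels x in_channels x kernel_size); the function name is hypothetical and the guess can be wrong.

```python
from typing import Dict, Tuple


def guess_layer_types(shapes: Dict[str, Tuple[int, ...]]) -> Dict[str, str]:
    """Group state_dict keys by module and guess each module's layer type.

    `shapes` maps keys such as "hidden.weight" to tensor shapes, e.g.
    {k: tuple(v.shape) for k, v in model.state_dict().items()}.
    Heuristic only: keys whose last component is not "weight" are skipped
    (so RNN parameters like weight_ih_l0 are ignored), and a 3-D weight from
    e.g. ConvTranspose1d would be mislabelled as conv1d.
    """
    guesses: Dict[str, str] = {}
    for key, shape in shapes.items():
        module, _, param = key.rpartition(".")
        if param != "weight":
            continue  # biases and other tensors don't reveal the type
        if len(shape) == 2:
            guesses[module] = "dense"    # nn.Linear: (out_features, in_features)
        elif len(shape) == 3:
            guesses[module] = "conv1d"   # nn.Conv1d: (out_channels, in_channels/groups, kernel_size)
        else:
            guesses[module] = "unknown"  # e.g. 1-D norm weights, 4-D Conv2d weights
    return guesses
```

Activations, by contrast, cannot be recovered this way: modules without parameters or buffers contribute nothing to a state_dict, so they have to come from the model's code.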

It's also worth mentioning that RTNeural currently only supports automated loading for sequential networks. More complex network architectures will require a "by hand" approach like the one mentioned above. If you have suggestions for improving the RTNeural model format, that would be welcome as well!

Hope this information is helpful!

ross-ca commented 2 years ago

Thank you so much for your reply Jatin, this is really helpful!

Would you be able to explain a little more about the structure of the RTNeural model JSON? How exactly should it be laid out?

Thanks again!

jatinchowdhury18 commented 2 years ago

No problem!

For the RTNeural format, I would definitely suggest checking out this example model file, but the basic format goes:

{
    "in_shape": [
        null,
        null,
        1
    ],
    "layers": [
        {
            "type": "dense",       // layer type goes here
            "activation": "tanh",  // activation type goes here
            "shape": [
                null,
                null,
                8                  // layer output shape goes here
            ],
            "weights": [
                ... // layer weights (including biases) go here
            ]
        },
        ... // the rest of the layers continue on here
    ]
}
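To make the layout above concrete, here is a minimal Python sketch that assembles such a model for a plain stack of PyTorch `nn.Linear` layers. It assumes RTNeural's dense weights follow the Keras kernel layout (inputs x outputs, followed by the bias vector), so each PyTorch weight matrix (out_features x in_features) is transposed. The function names are hypothetical; Python's `None` serialises to the `null` entries shown above.

```python
import json
from typing import List, Sequence, Tuple


def transpose(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap rows and columns of a nested-list matrix."""
    return [list(column) for column in zip(*matrix)]


def linear_stack_to_rtneural(
    in_size: int,
    layers: Sequence[Tuple[Sequence[Sequence[float]], Sequence[float], str]],
) -> dict:
    """Build an RTNeural-style model dict from (weight, bias, activation) tuples.

    Each weight is in PyTorch nn.Linear layout (out_features x in_features)
    and is transposed to (in x out), assumed here to match RTNeural's format.
    """
    model = {"in_shape": [None, None, in_size], "layers": []}
    for weight, bias, activation in layers:
        model["layers"].append({
            "type": "dense",
            "activation": activation,
            "shape": [None, None, len(bias)],  # output size = number of biases
            # Weight matrix first, biases last (Keras get_weights() order).
            "weights": [transpose(weight), list(bias)],
        })
    return model
```

The resulting dict can then be written with `json.dump(model, f, indent=4)`.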

ross-ca commented 2 years ago

That's perfect, thank you for your help.

In the example model file, I noticed that the final array of weights for each layer is separated from the rest. Is there a reason for this? Are these the bias values rather than the weights?

Thanks again!

jatinchowdhury18 commented 2 years ago

The order of the weights for each layer is whatever TensorFlow's get_weights() method returns for that layer, but yes, I think for most layers the final array ends up being the layer biases.
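As a hedged illustration of why that ordering matters when starting from PyTorch: a Keras LSTM's `get_weights()` returns three arrays (kernel, recurrent kernel, bias), while a PyTorch `nn.LSTM` state_dict holds four (`weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`). Both frameworks stack the gate blocks in input, forget, cell, output order, so the sketch below transposes the two matrices and sums the two biases. It assumes a single-layer, unidirectional LSTM, that the module prefix (e.g. `lstm.`) has already been stripped from the keys, and that RTNeural's LSTM loader expects the Keras layout; the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def transpose(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap rows and columns of a nested-list matrix."""
    return [list(column) for column in zip(*matrix)]


def torch_lstm_to_keras_weights(sd: Dict[str, Sequence]) -> List[list]:
    """Map one nn.LSTM layer's tensors (as nested lists) to Keras get_weights() order.

    PyTorch: weight_ih_l0 (4H x in), weight_hh_l0 (4H x H), bias_ih_l0 (4H), bias_hh_l0 (4H).
    Keras:   kernel (in x 4H), recurrent_kernel (H x 4H), bias (4H).
    """
    kernel = transpose(sd["weight_ih_l0"])
    recurrent_kernel = transpose(sd["weight_hh_l0"])
    # Keras uses one bias vector; PyTorch adds two, so their sum is equivalent.
    bias = [a + b for a, b in zip(sd["bias_ih_l0"], sd["bias_hh_l0"])]
    return [kernel, recurrent_kernel, bias]
```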

ross-ca commented 2 years ago

What should the JSON activation function be for layers with a gated activation function, as seen in WaveNet, for example?

jatinchowdhury18 commented 2 years ago

RTNeural doesn't currently support gated activations, but it would be great to add that support. Are you familiar with how some existing framework implements gated activations? (For example, most of the layers currently implemented are based on the Keras implementation)
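For reference, the gated activation unit as commonly described for WaveNet computes tanh of a "filter" signal multiplied elementwise by the sigmoid of a "gate" signal. Many implementations get both from a single convolution with twice the channels and split the result; the sketch below assumes that convention (first half filter, second half gate), which varies between codebases. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """Numerically stable logistic sigmoid."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gated_activation(x: Sequence[float]) -> List[float]:
    """WaveNet-style gate: tanh(filter) * sigmoid(gate) over 2*C input channels.

    Assumes channels [0, C) are the filter and [C, 2C) are the gate.
    """
    if len(x) % 2 != 0:
        raise ValueError("expected an even number of channels")
    half = len(x) // 2
    filt, gate = x[:half], x[half:]
    return [math.tanh(f) * sigmoid(g) for f, g in zip(filt, gate)]
```

Because the output has half as many channels as the input, such a layer would not map onto a single per-element activation string in the JSON format; it would more likely need its own layer type.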

ross-ca commented 2 years ago

Hmm, I'm not familiar. Do you have an email where we could speak further? Thanks! :)

jatinchowdhury18 commented 2 years ago

Sure thing, feel free to email me at jatin@ccrma.stanford.edu

jatinchowdhury18/RTNeural, issue #53: PyTorch state_dict to RTNeural JSON