jatinchowdhury18 / RTNeural

Real-time neural network inferencing
BSD 3-Clause "New" or "Revised" License

[Feature request] C/C++ static array/vector assignment for weights with static model API #135

Closed jake-is-ESD-protected closed 2 months ago

jake-is-ESD-protected commented 2 months ago

RTNeural seems like a great choice for resource-constrained devices, which really makes me wonder why loading a model requires a file system.

According to the docs, the compile-time API can only load a model's weights via a JSON file, which requires a file system. While file systems can be implemented on embedded systems such as the Daisy Seed, a perfect target device for RTNeural, it seems like an unnecessary stretch to add one just for loading a simple model. Instead, a JSON-to-.h parser could generate a header file at build time that is linked directly into the firmware, eliminating any file system dependency. Alternatively, the JSON file could be bypassed entirely, and a saved model converted to a header file via a script.

GuitarML did this with a manual Python-based parser, but it is very unclear to me how the developers obtained the necessary syntax information, as the example is tailored exactly to their application and nothing more. I believe that a modular solution directly from you would greatly benefit embedded applications and could easily outperform tflite-micro or X-CUBE-AI, among others.

It would already be helpful to have some docs on the required "even more static" initialization of models by directly assigning weights as arrays to the layers of the model. If they exist and I missed them, I'd be happy to be pointed to them. I noticed that I can call setWeights() on a single layer at runtime, but I'm missing a way to do that for the layers inside a model. It would be nice if something like this were possible:

RTNeural::ModelT<double, 8, 1,
    RTNeural::DenseT<double, 8, 8>,
    RTNeural::TanhActivationT<double, 8>,
    RTNeural::DenseT<double, 8, 1>
> model;

void initModel(const double** w, const double** b){
    model.get(1).setWeights(w[0], b[0]);    // first dense layer
    model.get(3).setWeights(w[1], b[1]);    // second dense layer
    // ...
}

This way, the architecture is still set up statically, while the weights are loaded as const arrays from somewhere else. Managing the indexing of the weights would fall back to the user. While this might lead to unintended misuse, a helper script as mentioned above could be a good addition, parsing a .json or saved-model file into RTNeural-standardized constant shapes.
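To make that concrete, here is a purely hypothetical sketch of what such a generated header could look like (all names are made up, the [out][in] weight layout is an assumption, and the arrays are zero-initialized just so the sketch compiles):

// model_weights.h -- hypothetical output of a json-to-header script
#pragma once

constexpr double dense1_weights[8][8] = {};    // first dense layer, assumed [out][in]
constexpr double dense1_bias[8] = {};
constexpr double dense2_weights[1][8] = {};    // second dense layer
constexpr double dense2_bias[1] = {};

The firmware would then simply include this header and hand the arrays to something like initModel() above, with no file system involved.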

jatinchowdhury18 commented 2 months ago

Hello, thanks for opening the issue. These are good ideas, but they are also part of a larger discussion about how RTNeural can/should inter-operate with the libraries that people are using for training neural networks, so I'm going to provide a bit of background. I hope I'm not re-explaining things that you already know, but I just want to be sure that we're on the same page.

When I started working on RTNeural, I was starting with a "sequential" model trained in TensorFlow. That's where the Python script for "exporting" a TensorFlow model as a JSON file originated. While this script still works and is useful, it has several limitations: most notably, it only works with models trained in TensorFlow, and only with neural networks that have a "sequential" architecture.

One area of confusion is that there are other ways (aside from the one provided by RTNeural) to convert a trained neural network to a JSON file. For example, the GuitarML project often uses networks that were trained in PyTorch, and uses a different approach for exporting the model weights as a JSON file (RTNeural uses a similar approach for testing inter-operability with PyTorch). However, the JSON files exported from PyTorch are not directly compatible with the ones used by RTNeural.

So when loading a trained network into RTNeural, the README demonstrates the "simplest", but also most limited, method. As you noticed, all of the relevant "layer" implementations in RTNeural contain methods for setting the weights used by that layer. While these methods are "optional" for folks who are loading their models from an RTNeural-compatible JSON file, they are more or less "mandatory" for everyone else. So the most "general" way to load weights into an RTNeural model is to do something like this (which I believe is similar to your example code):

RTNeural::ModelT<float, 2, 1,
    RTNeural::LSTMLayerT<float, 2, hidden_size>,
    RTNeural::DenseT<float, hidden_size, 1>
> model;

auto& lstm = model.get<0>();
lstm.setWVals(...);
lstm.setUVals(...);
lstm.setBVals(...);

auto& dense = model.get<1>();
dense.setWeights(...);
dense.setBias(...);
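For concreteness, here's roughly what those calls could look like filled in with dummy data. The dimensions shown are my assumption about the expected shapes (W: [in_size][4 * hidden_size], U: [hidden_size][4 * hidden_size], b: [4 * hidden_size]) and should be double-checked against the layer documentation:

// Assuming hidden_size was declared above, e.g. constexpr int hidden_size = 4;
// these are dummy zero weights, just to illustrate the (assumed) expected shapes:
std::vector<std::vector<float>> wVals (2, std::vector<float> (4 * hidden_size, 0.0f));
std::vector<std::vector<float>> uVals (hidden_size, std::vector<float> (4 * hidden_size, 0.0f));
std::vector<float> bVals (4 * hidden_size, 0.0f);

lstm.setWVals (wVals);
lstm.setUVals (uVals);
lstm.setBVals (bVals);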

I think that's about everything I wanted to mention about the past and present states of RTNeural. Now thinking about the future: what should we do differently, and how can we do better overall?

I think the most obvious thing that needs to improve is the documentation. The weights-loading methods are often poorly named, and are not always documented in such a way that the user can easily tell, for example, what the dimensions of the data being passed into a given method should be.

I think the next obvious thing is making the weights-loading methods more flexible. Right now most of these methods accept either a std::vector<T> or std::vector<std::vector<T>>. Having to use a vector can be inefficient, not to mention that I think std::vector<std::vector<T>> is a very poor way to represent a 2D "matrix" (unfortunately I didn't know any better at the time). Maybe we can take some inspiration from newer C++ features like std::span and std::mdspan here.
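As a rough sketch of the direction I'm imagining (this is not current RTNeural API, and it assumes C++20 for std::span), an adapter like the following could let users keep their weights in a single flat buffer and still feed the existing setters:

#include <span>
#include <vector>

// Adapt a flat, row-major weights buffer into the nested-vector
// layout that the current setter methods expect:
template <typename T>
std::vector<std::vector<T>> toMatrix (std::span<const T> flat, size_t rows, size_t cols)
{
    std::vector<std::vector<T>> matrix (rows, std::vector<T> (cols));
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            matrix[i][j] = flat[i * cols + j];
    return matrix;
}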

Anyway, getting back to the most immediately relevant problem: how can RTNeural support users in getting their trained network weights from their training environment into RTNeural? There are two parts to this: "exporting" the weights from the training environment, and "importing" the weights into RTNeural.

For "exporting" weights, I'm having trouble thinking of any RTNeural-driven solution that can be robust in meeting all users' needs. Even with the relatively simple script that RTNeural currently supports for exporting TensorFlow sequential models, the script sometimes needs updating when TensorFlow changes some of its internals, and there's still a number of "edge cases" that the script doesn't handle. If we were to try making that a "general" solution, by adding support for non-sequential models, PyTorch models, and multiple "output formats" (e.g. JSON, C-style header file, etc), I worry that the effort of maintaining that solution would end up being larger than the effort required to maintain the rest of RTNeural, and would distract from the "actual" goals of this project. That said, if a solution like this were to exist (or already exists) outside of RTNeural, that could be very useful for us!

So with that in mind, I think the best approach would be to work with the "exporting formats" provided by the major training libraries. We've already been doing a little bit of work in this direction, adding some helper functions to load RTNeural layers using the JSON representation exported by PyTorch. That said, I don't think that TensorFlow or PyTorch automatically supports exporting header files, and I agree that this is a special case that RTNeural should try to provide support for. Maybe there could be some RTNeural "extension" to generate a header file using the work that RTNeural is already doing to load weights for some layers? Otherwise, it would mostly be up to the user to figure out how to convert the weights exported from their training environment into a header file, if that's something they want to do (as the GuitarML folks did in the example you linked to).

For "importing" network weights on the RTNeural side, I've been gradually less in favor of an "automatic" solution, which is sort of what we have now with the parseJson() methods. While these methods can be convenient for simple network architectures, I really don't think they're a good solution for more general cases. But the alternative is that people have to mostly figure out for themselves how to get their weights into their RTNeural model, which would be less convenient.

There's kind of a theme here, which I think touches a few other parts of RTNeural as well: to what extent do we want to provide "out-of-the-box" solutions, and to what extent do we want to provide a more "open" and flexible interface that users may have to navigate more on their own? I think the best solution is to try doing both, so that the library is easy to use for beginners, but also very capable for "power users"... unfortunately trying to do both takes effort :).

I think that's everything I wanted to say... sorry to ramble on for so long! Thinking more on it, I like the idea of an RTNeural layer being able to return a string containing its own weights, which could then be used to construct a header file. I'll think more about this, and maybe try to make a small example of how this could work!
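As a first, very rough sketch of that idea (nothing here is existing RTNeural API; the function name and output format are made up):

#include <sstream>
#include <string>
#include <vector>

// Serialize a 2D weights matrix into a C++ array definition that could
// be written out as part of a generated header file:
std::string weightsToHeaderString (const std::string& name,
                                   const std::vector<std::vector<double>>& weights)
{
    std::ostringstream out;
    out << "constexpr double " << name << "[" << weights.size()
        << "][" << weights[0].size() << "] = {\n";
    for (const auto& row : weights)
    {
        out << "    { ";
        for (size_t j = 0; j < row.size(); ++j)
            out << row[j] << (j + 1 < row.size() ? ", " : " ");
        out << "},\n";
    }
    out << "};\n";
    return out.str();
}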

jake-is-ESD-protected commented 2 months ago

Thank you for the very comprehensive and expansive answer.

As a library/API/SDK developer myself, I understand your search for the impossible triangle of "easiest usability", "most flexibility" and "best performance".

Possible docs

I think it would be very beneficial if a few "manual" examples without a parser were presented, as this shows users 3 things at once:

  1. How weights can be manually set
  2. What types of array dimensions a certain layer takes
  3. How the user themselves (after understanding 1 and 2) can write a custom script to parse data in their own way if needed

This is basically documentation of the manual process that the JSON parser is doing under the hood anyway.

On "exporting"

I don't think that "exporting" models of any kind is RTNeural's job, as it is an inference engine, not a training environment. Therefore, as you mentioned, it would be best to support standard saved-model formats. Here my knowledge ends, and I can't reliably tell how "hard" it is to parse a .pb, .h5, .tflite, and so on. Similar to your supported layers, you would settle on a set of formats that can be used with RTNeural, maybe with a focus on formats that are usually associated with low-level devices, as that is a unique niche you can tailor to (e.g. .tflite files). For other formats or non-sequential architectures, you can't be expected to add support immediately after Google/Meta update TF/PyTorch; that's just a consequence of dependent software. If enough users request support for a new, modern format, then additional development can be put into that area, but trying to achieve all of that out of the box is trying to see the end of a bottomless pit.

On "importing"

I too am more in favor of a manual approach. I do understand that some users might want to just "use and forget" RTNeural, in the sense of "just make my model go brrr", but that's a larger problem in itself. People who use RTNeural are interested in speed, performance, and real-time operation, and that comes with a platform switch anyway: if that weren't necessary, users would stick to Python environments. Just think about the preprocessing or feature extraction needed to even make use of inferencing. All of that has to be ported to different code/platforms anyway, so you can expect at least some "effort" from the user.

On out-of-box vs. power user

I don't know much about the inner workings of RTNeural, but optimal API design would create an "out-of-the-box" layer at the top, while the experienced user could step one layer below and use certain "private" getter and setter functions to achieve their goals more precisely. Maybe RTNeural has such an architecture already. But all of this is easier said than done. If you had to decide between the two, I would aim for the power user (biased opinion), simply because RTNeural is already pretty specific by nature.

For the time being, could you elaborate a bit further on the provided example? Do the indices count the activation functions as well? Feel free to move all this to a place that deals with discussions, as this is not really an issue as such.

jatinchowdhury18 commented 2 months ago

Thanks for the reply!

Documentation

So after looking a little closer, it looks like I had already added documentation for many of the layer classes, regarding the dimensions of the weight matrices that should be passed to each of the relevant layer functions (here's an example). That said, I think I should still go through the library and make sure that every relevant function has the correct dimensions listed, as well as add more useful descriptions of what each dimension should represent where appropriate.

I also started writing up a little example, which I think will answer some of your questions. Please let me know if there are additional things in the example that need clarification. At the moment, the example only lives in README form, but I can make it a runnable example as well with a little bit of additional effort.


Thanks for sharing your other insights as well! I'm starting to plan out some larger-scale updates for RTNeural, hopefully for later this year, so I'll keep those ideas in mind, especially re: finding the right balance for end users of all types.

I'd say that discussing the example is relevant to this issue, but if you would like to go into more off-topic conversations, feel free to join us at the Discord linked in the README!

jake-is-ESD-protected commented 2 months ago

For anybody else reading this, this is the

Solution

Follow the new official example in the README or use my quick and dirty solution:

RTNeural::ModelT<double, 4, 1,
    RTNeural::DenseT<double, 4, 4>,
    RTNeural::TanhActivationT<double, 4>,
    RTNeural::DenseT<double, 4, 1>
> modelT;

auto& denseIn = modelT.get<0>();
auto& denseOut = modelT.get<2>();

denseIn.setWeights({{1.0, 2.0, 3.0, 4.0}, {5.0, 6.0, 7.0, 8.0}, {9.0, 10.0, 11.0, 12.0}, {13.0, 14.0, 15.0, 16.0}});
double biasIn[] = {1.0, 2.0, 3.0, 4.0};
denseIn.setBias(biasIn);

denseOut.setWeights({{0.1, 0.2, 0.3, 0.4}});    // weights for the 4 -> 1 output layer
double biasOut[] = {0.1};
denseOut.setBias(biasOut);
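And for completeness, once the weights are set, inference should look roughly like this (following the forward/reset pattern from the README; the input values here are arbitrary):

modelT.reset();    // clear any internal state before processing

double input[4] = { 0.0, 0.1, 0.2, 0.3 };    // arbitrary test input
double output = modelT.forward (input);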

Thanks for the update @jatinchowdhury18 !