Error non-constant input (limit) to Range node

blastbay commented 8 months ago

I just discovered this project and I am attempting to convert a vits speech synthesis model from onnx to C. I am running into the following error message:

[FATAL] (resolve) Unimplemented: non-constant input (limit) to Range node

Looking at the implementation of the Range node it seems fairly simple, but I don't know what the best/cleanest way to lift this limitation would be. Are there any similar nodes that I could use as a starting point for how to receive non-const floats and generate the appropriate output C code? Presumably I would want to generate a C function that takes this as an input parameter, rather than printing out the actual number in the code as a literal? Any tips would be highly appreciated.

kraiskil commented 8 months ago

This is, unfortunately, a fundamental limitation for onnx2c. The target for the tool is embedded microcontrollers, where dynamic memory allocation notoriously can make programming more difficult.

In this case, the Range node's input limit is the output of some other node, i.e. its content is not known at compile time. This means Range would produce a tensor whose size is known only at run time (https://github.com/onnx/onnx/blob/main/docs/Operators.md#Range), and this output tensor would then need to be allocated dynamically.
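
To make the size dependency concrete: the ONNX operator documentation linked above gives the number of output elements as max(ceil((limit - start) / delta), 0). The following is a minimal C sketch of that formula, not code taken from or generated by onnx2c; the function name range_count is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements produced by Range(start, limit, delta), following the
 * formula in the ONNX operator docs: max(ceil((limit - start) / delta), 0).
 * range_count is a hypothetical helper name, not part of onnx2c. */
static size_t range_count(float start, float limit, float delta)
{
    assert(delta != 0.0f);               /* a zero step is not a valid Range */

    float q = (limit - start) / delta;
    if (q <= 0.0f)
        return 0;                        /* empty range */

    size_t n = (size_t)q;                /* truncate toward zero ... */
    if ((float)n < q)
        n++;                             /* ... then round up: ceil(q) */
    return n;
}
```

Since the result depends on limit, a code generator can emit a fixed-size output array only when limit (together with start and delta) is a compile-time constant.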

I think there are a few alternatives to solving this.

1. Check if your model really needs run-time definition of tensor sizes. Can it be changed, e.g. by putting a constant upper limit on the size of the tensors it produces?
2. If the model does not already use dynamically sized tensors, maybe some intermediate library adds a dynamically sized tensor as an optimization? Can this creating/optimizing/saving step be found and coerced not to produce a run-time value for the limit input of Range? Maybe the limit actually is already constant, but just not marked as one? You can also pass constant values on the command line (./onnx2c -d foo:3) for variables in the model (but I think this is implemented only for the inference batch size...). But this would require the limit to be a variable, not the output of some other node.
3. I'm not against adding dynamic memory allocation support to onnx2c. It's just a lot of work to implement and test. Pull requests welcome :)
4. The catch-all: find a more suitable tool than onnx2c.
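
The first alternative (a constant upper limit on tensor size) could look roughly like the sketch below: the buffer is statically sized for the worst case and the actual length is tracked at run time. This is an illustration under assumptions, not what onnx2c generates; RANGE_MAX_LEN, bounded_range_t and bounded_range are hypothetical names, and the bound of 512 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define RANGE_MAX_LEN 512   /* hypothetical worst-case bound for the model */

/* Statically sized storage plus the number of valid elements. */
typedef struct {
    float  data[RANGE_MAX_LEN];
    size_t len;
} bounded_range_t;

/* Fill `out` with start, start+delta, ... up to (excluding) limit.
 * Returns 0 on success, or -1 if the run-time limit would need more than
 * RANGE_MAX_LEN elements; `out->len` is only set on success. */
static int bounded_range(float start, float limit, float delta,
                         bounded_range_t *out)
{
    assert(delta != 0.0f);

    size_t n = 0;
    for (;;) {
        /* Recompute from the index to avoid accumulating rounding error. */
        float v = start + (float)n * delta;
        int in_range = (delta > 0.0f) ? (v < limit) : (v > limit);
        if (!in_range)
            break;
        if (n == RANGE_MAX_LEN)
            return -1;               /* would overflow the static buffer */
        out->data[n++] = v;
    }
    out->len = n;
    return 0;
}
```

No heap allocation is needed, at the cost of always reserving the worst-case memory and having to handle the case where the input exceeds the bound.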

It's difficult to say which is the best approach here. Maybe seeing the .onnx file, or the image of it generated by e.g. netron would help? (or at least a small zoomed in part around the Range node, if the model is huge or secret)

blastbay commented 8 months ago

Thank you very much for your detailed explanation. The model I am using, which is for text to speech, does require tensors with dynamic sizes from what I can tell, because of several factors.

1. The number of symbols/phonemes that the model receives as its input, basically the text that is to be converted into speech.
2. The duration prediction which happens inside the model, where each phoneme receives a duration in seconds/number of spectrogram frames.
3. The output tensor, where we go from spectrograms to individual audio samples. This is dependent on the duration prediction as well as the duration of each spectrogram generated earlier in the graph.

It might be possible to assign upper limits to these and allocate a fixed amount of memory up front, but I'm not sure if this is the approach I want to take just yet. I have done some more reading into onnx, and I think I will take a stab at trying to write a specialized converter that is intended for these types of text to speech models. There are many kinds of text to speech architectures but they all share some fundamental similarities. I don't know how long this will take but I'm going to make a serious attempt. If I find that the results of my efforts are not too far from the goals of onnx2c, I will certainly submit a pull request for your consideration. In any event, your work on onnx2c will certainly help me to get started much more quickly.

Thank you again for the quick and thorough response, and for making onnx2c available.

kraiskil commented 8 months ago

Right, I can see how generating audio would benefit from dynamically sized tensors.

Looking forward to seeing your solution. Especially if there is something that can be incorporated into onnx2c.

As far as I can foresee what an onnx2c-based solution would look like, supporting dynamically sized tensors would require passing the tensor size info along with the kernel function arguments, i.e. most likely wrapping the raw data and size info in a struct (like a lightweight C++ class/object, really), and passing these to the functions instead of a raw pointer. This would require quite a bit of refactoring of the onnx2c codegen parts (i.e. always query the tensor for its sizes, in every node implementation...). Somehow I feel such a restructuring would actually make the onnx2c code base a bit cleaner though :) If you decide to give this approach a go, please get in touch early so we can design what the new onnx2c internals would look like.
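
A rough sketch of the struct-wrapping idea described above, with a kernel that queries the tensor for its sizes instead of relying on compile-time dimensions. None of these names exist in onnx2c: tensor_t, TENSOR_MAX_DIMS, tensor_numel and node_relu are hypothetical, and ReLU is only picked as a simple example kernel.

```c
#include <assert.h>
#include <stddef.h>

#define TENSOR_MAX_DIMS 4   /* hypothetical cap on tensor rank */

/* Raw data pointer bundled with its run-time shape. */
typedef struct {
    float  *data;
    size_t  ndim;
    size_t  dims[TENSOR_MAX_DIMS];
} tensor_t;

/* Total number of elements, computed from the run-time shape. */
static size_t tensor_numel(const tensor_t *t)
{
    size_t n = 1;
    for (size_t i = 0; i < t->ndim; i++)
        n *= t->dims[i];
    return n;
}

/* Example kernel: element-wise ReLU. The caller provides out->data with
 * room for at least tensor_numel(in) floats; the kernel copies the shape
 * from the input and never assumes a compile-time size. */
static void node_relu(const tensor_t *in, tensor_t *out)
{
    assert(in->ndim <= TENSOR_MAX_DIMS);
    assert(in->data != NULL && out->data != NULL);

    out->ndim = in->ndim;
    for (size_t i = 0; i < in->ndim; i++)
        out->dims[i] = in->dims[i];

    size_t n = tensor_numel(in);
    for (size_t i = 0; i < n; i++)
        out->data[i] = in->data[i] > 0.0f ? in->data[i] : 0.0f;
}
```

How the storage behind data is obtained (static worst-case buffers or dynamic allocation) is left open here; the struct only changes how sizes travel between kernels.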

Good luck :)

kraiskil / onnx2c

Error non-constant input (limit) to Range node #37