kraiskil / onnx2c

Open Neural Network Exchange to C compiler.

ERROR("Non-constant 'constant_value' input to Pad would result in dynamic memory allocation") #32

Open · henkten opened this issue 1 year ago

henkten commented 1 year ago

Hello, first of all thank you for this great repository.

I get the following error when I try to export my ONNX model to C using onnx2c:

onnx2c/src/nodes/pad.cc, line 45

Unfortunately I am a beginner in C and have no idea how to fix this error.

Here is an overview of the onnx model (exported from pytorch).

[image: graph of TCN_Model-sim.onnx]

Does anyone know a way to fix this error, or could someone help me? I will be happy to answer any questions about the model or my setup.

Thank you very much in advance! With kind regards, Henk

kraiskil commented 1 year ago

Hi Henk,

I looked at the onnx2c code around the Pad operator, and found this comment, which suggests this is a bug in onnx2c.

The constant padding value should be usable even if it is not constant, but I guess the generated code does not yet know how to handle such a case.

A quick workaround, if applicable in your model, is to not give the constant_value input to the Pad node when generating the network. Onnx2c would then use the default of 0, but that of course might not be what you need?
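In case it helps, here is a minimal sketch of that idea, assuming the onnx Python package; this is not onnx2c code, and the filenames are placeholders:

```python
# Sketch: drop the optional third input (constant_value) from every Pad node,
# so onnx2c falls back to the default padding value of 0.
# Filenames are placeholders; assumes the 'onnx' Python package.
import onnx

model = onnx.load("model.onnx")
for node in model.graph.node:
    # Pad inputs are positional: data, pads, constant_value (optional).
    if node.op_type == "Pad" and len(node.input) >= 3:
        del node.input[2]

onnx.checker.check_model(model)
onnx.save(model, "model_no_constant_value.onnx")
```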

henkten commented 1 year ago

Hi, thank you very much for your quick reply.

I have checked the constant_value = inputs[2] for my Pad nodes. The image shows that constant_value is an empty string. I suspect that is the cause of the error.

[image: Pad node properties showing an empty constant_value input]

I deleted the constant_value from all Pad nodes and tested the model a second time with onnx2c. Now the code runs through this section without errors.

Unfortunately, I now get a new error for the Cast nodes, and for this one I have no idea what to do. Here is a picture of the Cast node. Maybe you have an idea how to fix this problem?

[image: Cast node properties]

Thank you very much in advance! With kind regards, Henk

kraiskil commented 1 year ago

The constant_value can be a string as per the ONNX specification, though before ONNX opset version 13, it had to be an integer. This looks like a case where onnx2c hasn't kept up with the development of the ONNX specification.

For the Cast, I think this is a duplicate of issue #26? I'm looking at the above graph, but I can't spot a Cast node there.

henkten commented 1 year ago

Unfortunately, I do not understand from issue #26 what I can change. Do you have a suggestion for how I can work around the error?

Sorry, what I posted above was the diagram of the simplified net. Here is the diagram of the original network.

[image: graph of Darts_TCN.onnx]

kraiskil commented 1 year ago

The problem with the onnx2c Cast implementation is the switch starting at https://github.com/kraiskil/onnx2c/blob/bdfe0f76b7a5674d9ea88c02684360fd615b2d1f/src/nodes/cast.cc#L28. Casts to integers never got implemented, because Cast really doesn't sound like a useful operator when targeting embedded systems; such a node should be optimized away. Maybe a tool like onnxsim could help fold away such nodes?

Also, looking at the graph, the Dropout sounds like a pure training node, so it is one that is not implemented in onnx2c.
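For reference, a minimal onnxsim sketch; the filenames are placeholders and this assumes a recent onnx-simplifier with its Python API:

```python
# Sketch: run onnx-simplifier to fold constant subgraphs, which should also
# remove Cast/Dropout-style nodes if they turn out to be dead or constant.
# Filenames are placeholders.
import onnx
from onnxsim import simplify

model = onnx.load("tcn_model.onnx")
model_simplified, check_ok = simplify(model)
assert check_ok, "onnxsim's output did not pass its own correctness check"
onnx.save(model_simplified, "tcn_model_sim.onnx")
```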

henkten commented 1 year ago

With the model created by onnxsim I got this error. Here is a picture of the Pad node. The model does not have Cast and Dropout nodes anymore.

[image: Pad node from the onnxsim-simplified model]

kraiskil commented 1 year ago

This is curious. Either your model is doing something really strange with the constant_value input tensor, or onnx2c handles this input incorrectly.

From the image I see the constant_value has no name, and mode: constant means constant_value should be a scalar, so I would lean towards an unimplemented feature in onnx2c. Would it be possible to post the simplified model to debug this?

kraiskil commented 1 year ago

A side note: I just had a look, and ONNX has very few (one) backend tests for the Pad operator. So onnx2c being non-conformant is pretty likely.

henkten commented 1 year ago

Sure, here is a picture of the simplified ONNX model.

[image: graph of tcn_model_mod_sim.onnx]

For a better understanding of how the ONNX model was created: it is an export of a Temporal Convolutional Network from PyTorch. Maybe that is where the missing name for constant_value comes from.

I can upload or send you the model if you want a better overview and understanding of the model parameters.

kraiskil commented 1 year ago

Yes, please upload it. Having the .onnx file makes it possible to run onnx2c under a debugger, and this helps in understanding the problem.

henkten commented 1 year ago

I have uploaded the original ONNX model, a modified ONNX model with changed input and output names, and a simplified version of the modified model.

Additionally I uploaded the script I used to modify the original model.

Here is the link to the repo.

kraiskil commented 1 year ago

Thanks for the files. I ran the tcn_model_mod_sim.onnx one to have a look at what is going on.

The problem is with the second input to the Pad node (at least the /res_blocks.1/Pad_1 one). This pads input has the format float32[64], which is not according to the ONNX specification (https://github.com/onnx/onnx/blob/main/docs/Operators.md#Pad). pads is supposed to tell how many elements to pad at each end of each axis. A tensor of 64 floats sounds more like the contents of the cells added to the padded output tensor.

pads (non-differentiable) : tensor(int64) Tensor of integers indicating the number of padding elements to add or remove (if negative) at the beginning and end of each axis. For 2D input tensor, it is the number of pixels. pads should be a 1D tensor of shape [2 * num_axes] where num_axes refers to the number of elements in the axes input or the input rank if axes are not provided explicitly. pads format should be: [x1_begin, x2_begin, ..., x1_end, x2_end,...], where xi_begin is the number of pad values added at the beginning of axis axes[i] and xi_end, the number of pad values added at the end of axis axes[i].
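To illustrate what the spec expects, here is a hedged sketch (the padding amount K and the tensor names are made up) of a conformant pads input for a 3-D (N, C, L) tensor with left-only padding on the time axis, as a TCN would typically use:

```python
# Sketch: a spec-conformant Pad node for a 3-D input (N, C, L).
# 'pads' must be a 1-D int64 tensor of length 2 * rank:
# [N_begin, C_begin, L_begin, N_end, C_end, L_end].
# K is a hypothetical padding amount; tensor names are made up.
import numpy as np
from onnx import helper, numpy_helper

K = 7
pads_initializer = numpy_helper.from_array(
    np.array([0, 0, K, 0, 0, 0], dtype=np.int64), name="pads")

pad_node = helper.make_node(
    "Pad", inputs=["x", "pads"], outputs=["y"], mode="constant")
```

A float32[64] tensor in that position can't be interpreted this way.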

Seeing such output sounds like a bug in onnxsim. I tried to run onnxsim on the input tcn_model.onnx, but that would not work, since onnxsim couldn't handle the Reshape node being of too new a version (it has an 'allowzero' attribute, as of ONNX opset 14).
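Not something tried here, but one conceivable workaround for that (a sketch, untested on this model) would be to downgrade the opset with onnx's version converter before running onnxsim:

```python
# Sketch: try converting the model down to opset 13 (which predates Reshape's
# 'allowzero' attribute), then simplify. Untested on this model; filenames
# are placeholders.
import onnx
from onnx import version_converter
from onnxsim import simplify

model = onnx.load("tcn_model.onnx")
model_opset13 = version_converter.convert_version(model, 13)
model_simplified, check_ok = simplify(model_opset13)
assert check_ok
onnx.save(model_simplified, "tcn_model_opset13_sim.onnx")
```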

Latest onnx2c version now contains at least a more explicit error message on this kind of input :)

Maybe another approach here is to step back and ask whether padding really is necessary? In such a small network, would it be feasible to ignore the edges of the input? I would guess that would result in faster inference times.

henkten commented 1 year ago

Thank you for your time and for investigating my model.

This is not the final model that I will use, but a simple replica so I can work with it quickly and easily. Unfortunately, padding is required because the output length must be equal to the input length.

Did I understand correctly that it is not possible to manually adjust the padding operations to get rid of the errors before I try to convert the model with onnx2c?

kraiskil commented 1 year ago

Did I understand correctly that it is not possible to manually adjust the padding operations to get rid of the errors before I try to convert the model with onnx2c?

If by this you mean manually editing the .onnx file: I don't know of such a tool, but I would imagine there is something out there that makes this possible.

The problem here is that the original (PyTorch?) export to .onnx creates a long chain of operations (ConstantOfShape, Concat, Reshape, Slice, ..., Cast) just to end up as the pads input to the Pad operator. All this runtime calculation just to produce what needs to be a compile-time constant: onnx2c should recognize this pads tensor as a constant, otherwise the model needs dynamic memory allocation at inference time.

onnxsim should be able to do this simplification. But as I understand it, the output of onnxsim in your repository is invalid, so it seems you have found an onnxsim bug. For some strange reason, I can't even re-run onnxsim to verify this.

It does sound strange that, if you provide a constant padding tensor to the Pad operator in PyTorch, the export would create such a long chain of operators to achieve this. Maybe there is a way of simplifying the .onnx at export time?
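As a hypothetical illustration (not taken from your model), a causal-padding layer where the padding amount is a plain Python int usually exports with a constant pads input rather than a shape-computing subgraph:

```python
# Hypothetical sketch: causal padding with a fixed Python int, so that
# torch.onnx.export can emit a constant 'pads' input rather than a chain
# of shape-computing nodes. All names and sizes are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # plain int, known at export time
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):
        # Pad only the left edge of the time axis by a constant amount.
        x = F.pad(x, (self.pad, 0))
        return self.conv(x)

model = CausalConv1d(16, 16, kernel_size=3, dilation=2).eval()
dummy = torch.randn(1, 16, 128)
torch.onnx.export(model, dummy, "causal_conv.onnx", opset_version=13)
```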

henkten commented 1 year ago

Hi, thanks for the reply. Yes, the ONNX model is a PyTorch export.

I will have a look at the export. Maybe I can find a solution for the Pad node there, or a way to simplify the export.