kraiskil / onnx2c

Open Neural Network Exchange to C compiler.

Floating point exception during conversion to C #41

Closed kgeeting closed 6 months ago

kgeeting commented 7 months ago

Cool library. I tried converting one of my ONNX models with the tool and got a floating point exception (zsh: floating point exception) during conversion. Apple clang version 15.0.0 (clang-1500. Maybe an unhandled division by zero somewhere?


kraiskil commented 7 months ago

Thanks :)

I had a quick look at your model, and indeed, the exception is a division by zero. It stems from the first convolution layer, /isi_encoder/conv1/Conv, which is a 2D convolution whose stride is given as [2]. onnx2c then interprets this as a stride of [2,0], which of course is rather silly.
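To make the failure mode concrete: the usual output-size formula for a convolution divides by the stride on each spatial axis, so a stride of 0 on the second axis ends in an integer division by zero, which many platforms report as a "floating point exception". The C++ sketch below is illustrative only and is not taken from onnx2c; the function name `conv_output_size` and its parameters are assumptions, and dilation is left out for brevity.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not onnx2c's actual code): output length of one
// spatial axis of a convolution without dilation:
//   out = (in + pad_begin + pad_end - kernel) / stride + 1
// Returns std::nullopt instead of dividing when the inputs make no sense.
std::optional<int64_t> conv_output_size(int64_t in, int64_t kernel,
                                        int64_t pad_begin, int64_t pad_end,
                                        int64_t stride)
{
    if (stride <= 0)
        return std::nullopt; // stride 0 would be an integer division by zero

    const int64_t padded = in + pad_begin + pad_end;
    if (padded < kernel)
        return std::nullopt; // kernel does not fit into the padded input

    return (padded - kernel) / stride + 1;
}
```

With a guard like this, a stride of [2,0] would yield an empty result for the second axis that the caller can turn into a readable error message, rather than crashing the converter.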

Now the onnx documentation states:

strides: int64[] Stride along each spatial axis. If not present, the stride defaults is 1 along each spatial axis.

This reads to me like either "pad missing dimensions with ones" or "strides must be of correct dimensions or not given at all". On the other hand, a "stride of 2", as in this model, sounds more like 2 in each dimension. Was this your intention?

The onnx docs might mention some rule about splatting attributes somewhere, but I can't find it right now.
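The three readings discussed above (reject a wrong-length attribute, pad it with ones, or splat a single value over all axes) can be written down side by side. This is a minimal C++ sketch under stated assumptions: `StridePolicy` and `resolve_strides` are hypothetical names invented for illustration, and nothing here claims to be what the ONNX specification mandates or what onnx2c implements.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical policies for a `strides` attribute whose length does not
// match the number of spatial axes. The names are illustrative only.
enum class StridePolicy { Strict, PadWithOnes, Splat };

std::vector<int64_t> resolve_strides(const std::vector<int64_t>& given,
                                     std::size_t spatial_dims,
                                     StridePolicy policy)
{
    std::vector<int64_t> out;
    if (given.empty()) {
        out.assign(spatial_dims, 1);          // attribute absent: 1 per axis
    } else if (given.size() == spatial_dims) {
        out = given;                          // already fully specified
    } else if (policy == StridePolicy::PadWithOnes &&
               given.size() < spatial_dims) {
        out = given;
        out.resize(spatial_dims, 1);          // [2] becomes [2,1]
    } else if (policy == StridePolicy::Splat && given.size() == 1) {
        out.assign(spatial_dims, given[0]);   // [2] becomes [2,2]
    } else {
        throw std::invalid_argument(
            "strides length does not match the number of spatial axes");
    }

    for (int64_t s : out)
        if (s <= 0)
            throw std::invalid_argument("strides must be positive");
    return out;
}
```

Under `Strict`, the model from this issue would be rejected with an error message; the other two policies silently pick one of the two possible intentions, which is why rejecting is arguably the safest default.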

I would provisionally say this is

A quick fix would be to try modifying the model so that the attributes are encoded more explicitly :)

Btw, just for posterity - which tool (and version) generated this .onnx file?

kgeeting commented 6 months ago

Sorry for the late reply. I reviewed the onnx model being fed in and your provisional guess was correct: the model was malformed, with stride discrepancies (as noted above) and mismatched dimension sizes in a few tensor concatenations. Oddly, neither of these issues was flagged when first converting the Python model from PyTorch (v2.2) to onnx. But the subsequent attempt to convert to C with your tool flagged them. :)
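For the concatenation problem, the rule is that all inputs to a concatenation must have the same rank and agree on every axis except the one being concatenated along. A check for that could look like the C++ sketch below. The `Shape` alias and the function `concat_shapes_compatible` are hypothetical names, and the sketch assumes the axis has already been normalized to a non-negative index.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// Hypothetical check (illustrative, not onnx2c code): true if all input
// shapes have the same rank and match on every axis except `axis`.
// Assumes `axis` is already a non-negative index.
bool concat_shapes_compatible(const std::vector<Shape>& inputs,
                              std::size_t axis)
{
    if (inputs.empty())
        return false;

    const Shape& ref = inputs.front();
    if (axis >= ref.size())
        return false;                      // axis outside the tensor rank

    for (const Shape& s : inputs) {
        if (s.size() != ref.size())
            return false;                  // rank mismatch
        for (std::size_t d = 0; d < s.size(); ++d)
            if (d != axis && s[d] != ref[d])
                return false;              // mismatch on a non-concat axis
    }
    return true;
}
```

For example, shapes {1,16,32} and {1,8,32} are compatible along axis 1, while {1,16,32} and {1,8,31} are not.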

I've seen promising results when subsequently profiling the C models on an STM Nucleo board, and again just want to say, well done on the tool! I may look at the quantization alpha features you've got next. I do agree that reporting the stride error (in spatialfilter.h) might be helpful in case people run into similar problems in the future. Cheers

kraiskil commented 6 months ago

Thanks for the followup.

There are a lot of rules in the onnx documentation, most of which onnx2c does not check, simply because that would take a lot of lines of code... But it definitely should add a few checks for this kind of thing, where something as popular as pytorch creates bad input.

Re the quantization - it is really alpha level. It has only ever been used to quantize that one example with the AVR, and it probably still has some parts of that project hard coded in the sources. I was actually thinking of removing that quantization feature completely...

I would strongly recommend trying out the other quantizers out there first. When I wrote that quantization thing I found nothing that worked or had a reasonable learning curve. But the field moves fast, and nowadays there seem to be options.

kraiskil commented 6 months ago

Added the strides check in the above commit.