Closed morungos closed 2 years ago
Thanks for reporting. I can imagine this took some deep digging, much appreciated.
According to the spec (https://github.com/onnx/onnx/blob/master/docs/Operators.md#Gemm), C, which here is of dimension N, must be broadcastable to the multiplication result of size MxN. So yes, it is wrong to loop C over the M dimension; c is the correct index here.
The code in gemm.h also looks incorrect at first glance: the loop index for C is selected based only on the shape of C. But this definitely needs a couple more unit tests before attempting a fix... ONNX has 11 Gemm backend tests that all pass.
Excellent, that was my assessment, eventually. But honestly, even without that in place, it looked fine like 95% of the time. That's the robustness of neural networks, I guess.
The good news is that I just patched the output, and when I did, a decently sophisticated LSTM model became precisely consistent with all the other implementations I tried. Since this is the only tool that will generate an implementation small enough to fit on my devices, I'm extremely happy.
The above commit should fix this issue.
And yes, this is not the first bug of this kind in onnx2c where the error in a big net has been small enough to make me want to attribute the difference to rounding errors that propagate through the net differently in different runtimes. So far all such cases have turned out to be bugs in onnx2c - except the AlexNet example from the ONNX Model Zoo. Now I wonder whether it, too, might be using Gemm...
Looks like this is fixed. Please re-open if problems persist.
I've been working away running a decently large language model through this, and I've found that there is a subtle but significant deviation in the outputs when I benchmark against both the original model and the ONNX Runtime, which agree exactly. I've traced it to the generated Gemm node at the very end, where the bias is not being applied correctly. It is, of course, entirely possible that it's my fault, but the generated code seems suspect to me. I can provide the original ONNX file, but would rather not make it public for now.
Anyway, the generated code for this node is as follows:
My interpretation is that, somehow, the wrong index is being used in the calculation of C, in that the reference to r should actually be c. When I manually patch r to c, I get exactly the same output I see from PyTorch and onnxruntime. The logic here is a little complex for me, so someone else might have better insight, but I thought I'd write this much at least while it's fresh.