kraiskil / onnx2c

Open Neural Network Exchange to C compiler.

Possible issue in generated Gemm node code #12

Closed: morungos closed this issue 2 years ago

morungos commented 2 years ago

I've been working away running a decently large language model through this, and I've found a subtle but significant deviation in the outputs when I benchmark against both the original model and the ONNX Runtime, which agree exactly. I've traced it to the generated Gemm node at the very end, where the bias is not being correctly applied. It is, of course, entirely possible that it's my fault, but the generated code seems suspect to me. I can provide the original ONNX file, but would rather not make it public for now.

Anyway, the generated code for this node is as follows:

static inline void node_Gemm_27( const float tensor_157[1][96], const float tensor_decoder_weight[58][96], const float tensor_decoder_bias[58], float tensor_output[1][58] )
{
    /* Gemm */
    /* alpha   = 1.0000000000000000000
       beta    = 1.0000000000000000000
       transA  = 0
       transB  = 1
       A       = (1x96)
       datatype= float
     */
    const int M = 1;
    const int K = 96;
    const int N = 58;
    float (*A)[96]  = (float(*)[96])tensor_157;
    float (*Y)[58]  = (float(*)[58])tensor_output;
    float alpha = 1.0000000000000000000;
    float beta = 1.0000000000000000000;
    for( uint32_t r=0; r<M; r++ )
        for( uint32_t c=0; c<N; c++ ) {
            float ABrc = 0;
            for( uint32_t i=0; i<K; i++ ) {
                float B = tensor_decoder_weight[c][i];
                ABrc += A[r][i] * B;
            }
            float tmp = ABrc * alpha;
            float C = tensor_decoder_bias[r]; // <-- THIS SEEMS WRONG
            tmp += C * beta;
            Y[r][c] = tmp;
    }
}

My interpretation is that the wrong index is being used in the calculation of C: the reference to r should actually be c. When I manually patch r to c, I get exactly the same output as PyTorch and onnxruntime.
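Concretely, this is the one-character patch applied to the generated code above (shown here for clarity; the real fix of course belongs in onnx2c's generator, not in its output):

    float C = tensor_decoder_bias[c]; // bias runs along the output columns, so index by c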

The logic here is a little complex for me, so someone else might have a better insight, but I thought I'd write this much at least while it's fresh.

kraiskil commented 2 years ago

Thanks for reporting. I can imagine this took some deep digging, much appreciated.

According to the spec (https://github.com/onnx/onnx/blob/master/docs/Operators.md#Gemm), C, which here has shape (N), must be broadcastable to the multiplication result of shape MxN. So yes, it is wrong to index C along the M dimension; c is the correct index here.
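For reference, a minimal sketch of the intended semantics (illustrative only, not onnx2c's actual template): Y = alpha * A * B^T + beta * C, with the 1-D C of length N broadcast across all M rows of the result.

/* Reference Gemm for transA=0, transB=1 with a 1-D bias C of length N,
 * broadcast over the M rows of the MxN result. Illustrative sketch only. */
static void gemm_ref( int M, int K, int N, float alpha, float beta,
                      const float *A,   /* M x K */
                      const float *B,   /* N x K, used transposed */
                      const float *C,   /* length N, broadcast over rows */
                      float *Y )        /* M x N */
{
    for( int r = 0; r < M; r++ )
        for( int c = 0; c < N; c++ ) {
            float acc = 0;
            for( int i = 0; i < K; i++ )
                acc += A[r*K + i] * B[c*K + i];
            Y[r*N + c] = alpha * acc + beta * C[c]; /* bias indexed by the column */
        }
}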

The code in gemm.h also looks clearly incorrect: the loop index for C is selected based only on the shape of C. But this definitely needs a couple more unit tests before attempting a fix... ONNX has 11 Gemm backend tests, and they all pass.
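Catching this particular bug requires a non-constant bias: with M=1 the wrong index r is always 0, and with a constant C, bias[r] and bias[c] give identical results, so a test would pass either way. A hypothetical standalone check (values made up for illustration, not an actual onnx2c test case):

#include <assert.h>
#include <math.h>

int main( void )
{
    enum { M = 1, K = 2, N = 3 };
    const float A[M][K] = { { 1.0f, 2.0f } };
    const float B[N][K] = { { 1, 0 }, { 0, 1 }, { 1, 1 } }; /* transB=1 layout */
    const float bias[N] = { 10.0f, 20.0f, 30.0f };          /* non-constant on purpose */
    float Y[M][N];

    for( int r = 0; r < M; r++ )
        for( int c = 0; c < N; c++ ) {
            float acc = 0;
            for( int i = 0; i < K; i++ )
                acc += A[r][i] * B[c][i];
            Y[r][c] = acc + bias[c];
        }

    /* A*B^T = {1, 2, 3}; adding bias[c] gives {11, 22, 33}.
     * The buggy bias[r] indexing would give {11, 12, 13}. */
    assert( fabsf( Y[0][0] - 11.0f ) < 1e-6f );
    assert( fabsf( Y[0][1] - 22.0f ) < 1e-6f );
    assert( fabsf( Y[0][2] - 33.0f ) < 1e-6f );
    return 0;
}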

morungos commented 2 years ago

Excellent, that was eventually my assessment too. But honestly, even without the fix in place, the output looked fine maybe 95% of the time. That's the robustness of neural networks, I guess.

The good news is that I patched the output by hand, and with that change a decently sophisticated LSTM model is precisely consistent with all the other implementations I tried. Since this is the only tool that will generate an implementation small enough to fit on my devices, I'm extremely happy.

kraiskil commented 2 years ago

The above commit should fix this issue.

And yes, this is not the first bug of this kind in onnx2c, where the error in a big net has been small enough to make me want to attribute the difference to rounding errors that propagate through the net differently in different runtimes. So far all such cases have turned out to be bugs in onnx2c, except the Alexnet example from the ONNX Model Zoo. Now I wonder whether that one uses Gemm too...

kraiskil commented 2 years ago

Looks like this is fixed. Please re-open if problems persist.