this code was using DGEMM to do a dot product after first transposing a 4D tensor, even though the DGEMM call just transposed it back. also, an MA stack allocation was used for a single scalar. automatic code generation is amazing, isn't it? :-)
this change does the dot product directly, with loops, without any transposes and with no unnecessary MA allocation.
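
for illustration only, here is a minimal sketch in C of what "dot product directly, with loops" can look like. it assumes both rank-4 tensors have the same dimensions and the same contiguous memory layout. the name `tensor4_dot` and its parameters are hypothetical and are not the identifiers used in this change.

```c
#include <stddef.h>

/* Hypothetical helper (not from the actual change): contracts two rank-4
 * tensors into a scalar. Assumes a and b have identical dimensions
 * n1 x n2 x n3 x n4 and are stored contiguously in the same index order. */
double tensor4_dot(const double *a, const double *b,
                   size_t n1, size_t n2, size_t n3, size_t n4)
{
    /* The result lives in an ordinary local variable, so no allocator
     * call is needed for it. */
    double sum = 0.0;

    for (size_t i = 0; i < n1; ++i)
        for (size_t j = 0; j < n2; ++j)
            for (size_t k = 0; k < n3; ++k)
                for (size_t l = 0; l < n4; ++l) {
                    /* Same flat offset for both tensors: no transpose. */
                    size_t idx = ((i * n2 + j) * n3 + k) * n4 + l;
                    sum += a[idx] * b[idx];
                }

    return sum;
}
```

because both operands are read at the same offset, no index permutation is needed and there is no scratch buffer for a transposed copy.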