Describe the bug
With the recent tilization support for FP32 enabled and BFP16 no longer needing an even number of elements in the last dimension, I am able to enable a lot more tests in my GGML backend. However, I find unit tests reporting significant numeric errors when adding and multiplying tensors of shape [1, 1280, 1, 1] and [1, 1280, 16, 16] (nr is the scaling factor; GGML stores shapes in reverse order).
F32 in GGML is currently emulated with BFP16 for unrelated reasons.
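For context, this is roughly the GGML-side operation such a backend test exercises. It is only an illustrative sketch, assuming the shapes above are quoted in row-major order so that GGML's ne array holds them reversed ({16, 16, 1280, 1} and {1, 1, 1280, 1}); it is not the exact test case from the GGML test suite.

```cpp
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 256 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // [1, 1280, 16, 16] above -> ne = {16, 16, 1280, 1} in GGML order (assumed mapping)
    struct ggml_tensor * a = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 16, 16, 1280, 1);
    // [1, 1280, 1, 1] above -> ne = {1, 1, 1280, 1}; broadcast over a's first two dims
    struct ggml_tensor * b = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 1, 1, 1280, 1);

    // ggml_add broadcasts b over a; the backend test compares the device result
    // against the CPU reference
    struct ggml_tensor * c = ggml_add(ctx, a, b);
    (void) c;  // in the real test a graph is built and run on the TTNN backend

    ggml_free(ctx);
    return 0;
}
```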
Hey @marty1885, thank you for the report. @yan-zaretskiy is actively working on fixing broadcasting, as the existing implementation has numerous issues.
To Reproduce
The following program is a minimal reproducible example in TTNN/C++ (it is also what one of the test cases in the GGML log above exercises).
Adding two tensors of all 1s should produce a tensor of all 2s (and 0s where the zero padding is). Instead it produces a tensor of 2s and 1s.
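The original program is not included in this excerpt; below is a minimal sketch of the repro, assuming the usual ttnn C++ entry points (ttnn::open_device, ttnn::full, ttnn::add, Tensor::cpu). Exact header paths and signatures vary between tt-metal versions, so treat it as an approximation rather than the author's exact program.

```cpp
#include "ttnn/device.hpp"
#include "ttnn/operations/creation.hpp"
#include "ttnn/operations/eltwise/binary/binary.hpp"

int main() {
    // Assumed API: open the first device (return type differs across tt-metal versions).
    auto& device = ttnn::open_device(0);

    // Two tile-layout bfloat16 tensors filled with 1.0f;
    // [1, 1280, 1, 1] has to be broadcast over [1, 1280, 16, 16].
    auto a = ttnn::full(ttnn::Shape({1, 1280, 16, 16}), 1.0f,
                        ttnn::DataType::BFLOAT16, ttnn::TILE_LAYOUT, device);
    auto b = ttnn::full(ttnn::Shape({1, 1280, 1, 1}), 1.0f,
                        ttnn::DataType::BFLOAT16, ttnn::TILE_LAYOUT, device);

    // Broadcast add: every valid element should come back as 2.0f (padding stays 0),
    // but some elements come back as 1.0f instead.
    auto c = ttnn::add(a, b);

    auto host = c.cpu();  // read back to host and inspect the values here
    ttnn::close_device(device);
    return 0;
}
```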
Remarks:
Adding [1, 1280, 16, 16] with [1, 1280, 16, 1] returns correct results.
Expected behavior
Adding should work for all permitted broadcast shapes.