rfurko-tt opened 2 weeks ago
@eyonland I think you are owner of the unary/binary ops. Feel free to move to the right person.
Hi @yugaoTT, I've noticed that DIV is implemented internally as RECIP and MUL ops. It looks like the broadcast case was accidentally missed, and it seems easy to support since everything else is already in place.
@dmakoviichuk-tt @eyonland @rfurko-tt Broadcast is not supported for Div OP
Although division (Div) is a combination of RECIP and MUL, broadcasting was not occurring because, during validation in the program factory, the Div operation was treated as a separate operation. As shown in the image above, the bcast path only checks for the ADD, SUB, and MUL operations.
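To make the failure mode concrete, here is a minimal sketch of that dispatch logic. This is an illustrative emulation, not the actual tt-metal code: the names `BCAST_SUPPORTED` and `pick_program` are hypothetical, but the shape of the check (a whitelist of math ops on the broadcast path) matches what the comment describes.

```python
# Hypothetical sketch of the bcast dispatch check (names are illustrative,
# not the actual tt-metal code). The broadcast path whitelists ADD/SUB/MUL,
# so DIV falls through and validation rejects mismatched shapes.
BCAST_SUPPORTED = {"ADD", "SUB", "MUL"}

def pick_program(op_name: str, needs_broadcast: bool) -> str:
    if needs_broadcast:
        if op_name in BCAST_SUPPORTED:
            return "bcast_program_factory"
        raise ValueError(f"broadcast not supported for {op_name}")
    return "element_wise_program_factory"

assert pick_program("MUL", True) == "bcast_program_factory"
```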
To enable broadcasting, I introduced DIV_FAST (as shown in the image below), which successfully enabled broadcasting. However, this altered the operation's definition, and the division itself no longer happens as intended.
Repeat op Approach:
Finally, I combined the ones_like and multiply operations to achieve the desired broadcast. This approach produced the correct output for the test mentioned above (see the changes below for the differences).
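The repeat-op approach can be emulated as follows. NumPy stands in for ttnn here; `ones_like` and `multiply` are the ops named above, and the point is that multiplying by `ones_like(a)` uses the already-supported broadcast MUL to materialize `b` at `a`'s full shape, after which division needs no broadcasting.

```python
import numpy as np

# Illustrative emulation of the ones_like + multiply approach (NumPy stands
# in for ttnn). Broadcasting b against ones_like(a) expands b to a's shape
# via the supported bcast MUL; the divide then runs without broadcasting.
a = np.arange(8.0).reshape(2, 4) + 1.0   # shape (2, 4)
b = np.array([[2.0], [4.0]])             # shape (2, 1), needs broadcast

b_repeated = np.ones_like(a) * b         # bcast MUL is supported
out = a * np.reciprocal(b_repeated)      # DIV as RECIP + MUL, no bcast

assert np.allclose(out, a / b)
```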
Please find the PR here: #12932
If the changes are acceptable, I can generalize it for both inputs.
@umadevimcw, it looks like your solution would generalize for both inputs automatically and possibly solve this as a general case on eltwise ops. I think the return on this is high and we quickly get to better coverage of this functionality.
@umadevimcw these changes are not acceptable. It should be solved the normal way: for one simple DIV op we added a few more ops. @eyonland It is a really bad solution for a couple of reasons: 1) we are adding op-specific changes to the generic binary function; 2) we are making division slower — if we want it slower, we can always call reciprocal and then multiplication, which is exactly the current workaround; 3) DIV is currently implemented as a chain of ops, RECIP + MUL, in one kernel, but bcast doesn't support this chaining — it supports only MUL. So the right solution is to make sure that bcast can support chaining, or to change the bcast kernel and add bcast_div* functions that do the chaining.
So if you take a look here: https://github.com/tenstorrent/tt-metal/blob/8e0e2e1994a67fa4b179c18b54565685e5eba04a/ttnn/cpp/ttnn/operations/data_movement/bcast/bcast_types.cpp#L13 — imagine you added BCastMathOp::DIV. You would need to add a few new kernel functions, similar to "mul_tiles_bcast", that do the same chaining as in the regular binary op.
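A minimal sketch of that proposed chaining, emulated at the tile level in NumPy (the function names `bcast_div_tiles` here are hypothetical; `mul_tiles_bcast` is the existing primitive named above, modeled here as plain elementwise multiply on 32x32 tiles):

```python
import numpy as np

def mul_tiles_bcast(a_tile, b_tile):
    # stands in for the existing bcast kernel primitive; b_tile is the
    # already-expanded broadcast tile
    return a_tile * b_tile

def bcast_div_tiles(a_tile, b_tile):
    # hypothetical new primitive: chain RECIP into the bcast MUL, mirroring
    # how the regular (non-bcast) binary DIV chains RECIP + MUL in one kernel
    return mul_tiles_bcast(a_tile, np.reciprocal(b_tile))

a = np.full((32, 32), 6.0)   # one 32x32 tile
b = np.full((32, 32), 3.0)   # broadcast operand, expanded to a full tile
assert np.allclose(bcast_div_tiles(a, b), a / b)
```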
@umadevimcw Thanks for providing a detailed explanation. Could you please provide more information: is ttnn::divide executed as two operations? We often work in bfloat16, which suffers from a lack of precision, and I would expect every operation to take a little bit of precision from the result. Thanks in advance!
@rfurko-tt and @dmakoviichuk-tt, if I understand this problem correctly, we cannot do broadcasting within the kernel operation itself because there is no clear way to identify the logical shape of the tiled tensor in the width and height dimensions. We either allow this workaround to unblock models, or we wait for the tensor layout and shape class to be rewritten, then come back and fix the kernels to properly handle broadcasting. My position is that we allow this for now, knowing it is not an optimal solution, and then circle back and implement it properly in the kernel once the tensor layout is fixed. @umadevimcw, please open a separate issue to track the work of implementing broadcasting in the kernel itself. This needs to be done regardless of the outcome of this issue.
@eyonland You understood the problem incorrectly. If broadcasting works in multiply, it must work in the div operation — there are no excuses. In my opinion this hotfix doesn't make any sense, because it is even slower than the obvious workaround of two manual calls. We should never allow "fixes" like this. To unblock the issue we can simply call reciprocal and multiply manually. It is a bug in the binary op implementation: in one case div is treated as a chain of RECIP and MUL, but when it goes down the broadcast path there is no support for that chain on either the TTNN or the kernel side. I expected you, as the owner, to drive this issue because it will require proper fixes in the TTNN binary op.
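The two-call workaround mentioned above can be emulated like this (NumPy shown in place of ttnn; the point is that only multiply's broadcast support is exercised):

```python
import numpy as np

# The manual workaround: two explicit calls, reciprocal then multiply,
# where multiply's broadcasting already works (NumPy stands in for ttnn).
a = np.arange(6.0).reshape(2, 3) + 1.0   # shape (2, 3)
b = np.array([1.0, 2.0, 4.0])            # shape (3,), broadcast along rows

out = a * np.reciprocal(b)               # recip, then bcast-capable mul

assert np.allclose(out, a / b)
```

Note the precision caveat raised earlier in the thread still applies: in bfloat16, computing the reciprocal first and then multiplying can lose a little accuracy compared to a fused divide.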
Hi folks, we are hitting this issue in the tt-mlir compiler, so I just wanted to check with you on the status of this issue, and whether you have an estimate of how long it will take to fix.
Describe the bug
I can't use ttnn.divide the same way as ttnn.multiply. Multiply works as expected; divide crashes.

To Reproduce

Expected behavior
Divide should follow the same broadcasting rules as multiply.
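As a reference for the expected semantics, NumPy-style broadcasting treats divide and multiply identically; the assumption here is that ttnn.divide should accept the same shape combinations ttnn.multiply does:

```python
import numpy as np

# Reference broadcasting semantics (NumPy shown as the model): divide
# accepts exactly the same shape combinations as multiply.
a = np.ones((2, 32, 32))
b = np.full((1, 32, 1), 2.0)

assert (a * b).shape == (2, 32, 32)   # multiply broadcasts
assert (a / b).shape == (2, 32, 32)   # divide broadcasts identically
```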
Please complete the following environment information: