Closed dongjin-na closed 11 months ago
fyi @davorchap
@davorchap, can you please triage/prioritize this issue?
We talked about this a while ago with Moreh during a f2f. Moreh said they would come back with the exact use case. If 8 tile registers are used (half dst mode), the valid indices are 0-7, which means we can't index into 8-15. We would need full dst mode for this.
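To make the index ranges above concrete, here is a minimal host-side sketch of the validity check. It is only a model of the behavior described in this comment: `DstMode`, `dst_capacity`, and `is_valid_dst_index` are hypothetical names for illustration, not part of the tt-metal API.

```cpp
#include <cstdint>

// Hypothetical model of the two DST modes discussed above; these names are
// illustrative only and do not come from tt-metal.
enum class DstMode { Half, Full };

// Number of addressable tile slots per mode, as described in the comment:
// half mode exposes indices 0-7, full mode exposes indices 0-15.
constexpr std::uint32_t dst_capacity(DstMode mode) {
    return mode == DstMode::Half ? 8u : 16u;
}

// Returns true when idst can be addressed in the given mode.
constexpr bool is_valid_dst_index(std::uint32_t idst, DstMode mode) {
    return idst < dst_capacity(mode);
}
```

Under this model, an `idst` of 8 through 15 passes the documented "less than 16" check only in full mode, which matches the behavior reported in the issue.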
fyi @chekangliang
Thanks for the reminder. Right, will assign this to @razorback3 to await Moreh feedback.
Also, this might not be a bug, as this is outside of supported behavior. Will remove the bug tag and replace it with the feature request tag.
Closing the issue because it is not actually needed right now.
Describe the bug
Even though the API documentation states that the idst argument of copy_tile must be less than the size of the DST register (16), it doesn't seem to copy tiles properly to indices 8 through 15.
To Reproduce
You can reproduce the problem by slightly modifying the test code running on WH b0. This patch makes the compute kernel use dst registers 8-11.
Expected behavior
Screenshots
Please complete the following environment information:
Additional context
Considering that existing code such as the bmm op only uses up to 8 dst registers (see the SUBBLOCK_HW_CHOICES array in bmm_op.hpp), it seems like this issue is already known. Or, if there are any documents or guides I missed, I would appreciate it if you could let me know.
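For illustration, the constraint implied by that array can be sketched as follows. This is not the contents of bmm_op.hpp: `subblock_shapes` is a hypothetical helper that simply lists every (h, w) subblock shape whose tile count fits within a given number of dst registers, assuming the limit is 8 as observed above.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not taken from bmm_op.hpp: lists every (h, w) subblock
// shape whose tile count h * w fits within max_dst_tiles.
std::vector<std::pair<std::uint32_t, std::uint32_t>> subblock_shapes(std::uint32_t max_dst_tiles) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> shapes;
    for (std::uint32_t h = 1; h <= max_dst_tiles; ++h) {
        // Grow w only while the whole subblock still fits in the dst registers.
        for (std::uint32_t w = 1; h * w <= max_dst_tiles; ++w) {
            shapes.emplace_back(h, w);
        }
    }
    return shapes;
}
```

Calling `subblock_shapes(8)` yields only shapes such as (4, 2) or (1, 8), never one that would need dst indices 8-15, while `subblock_shapes(16)` would also allow shapes like (4, 4).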