Open ncvetkovicTT opened 4 days ago
@sjameelTT helped us reduce the core grid size from 13x10 to 1x1 while keeping the test behavior the same (the failure is observed when more than 1 block per core is processed) - sjameel/transpose_tilize_bug, related to #14609
There are several known issues where using a tensor with dimensions [N, C, H, W] with N*C > 130 yields wrong results when executing tests in multicore mode.
The first issue where this problem was observed is #14352. Here we see that the test fails for NCH_tiles > 130, and also for W_tiles > 4. The first constraint is due to the 13x10 core grid size (130 cores), while the second is due to DEST register capacity and is probably an issue with the compute/writer kernel rather than an LLK issue.
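To make the link between the 130 threshold and the 13x10 grid concrete, here is a minimal sketch of the arithmetic. It assumes blocks are split as evenly as possible across cores; `max_blocks_per_core` is a hypothetical helper written for illustration and is not the actual work-splitting code used by the op.

```python
import math

GRID_X, GRID_Y = 13, 10          # core grid size quoted in this issue
NUM_CORES = GRID_X * GRID_Y      # 130 cores


def max_blocks_per_core(num_blocks: int, num_cores: int = NUM_CORES) -> int:
    """Largest number of blocks any single core gets, assuming an even split."""
    return math.ceil(num_blocks / num_cores)


if __name__ == "__main__":
    # Up to 130 blocks, every core handles at most one block.
    print(max_blocks_per_core(130))
    # At 131 blocks, at least one core has to process a second block.
    print(max_blocks_per_core(131))
    # On a 1x1 grid, the same situation already arises with 2 blocks.
    print(max_blocks_per_core(2, num_cores=1))
```

Under this assumption, any shape with more than 130 blocks forces at least one core to process more than one block, which is the same condition the reduced 1x1 grid reproduces with only two blocks.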
The second issue where we see similar behavior is #14609. Here we first solved a PACK/MATH synchronization issue, which made test_tranpose_hw_rm_no_padding pass for all tensor shapes except those where N*C > 130; for those shapes the problem persisted.
The last issue where this limitation appears, but in a different form, is #14594. Here we see an interesting behavior: the test fails only for N*C > 261, but even then there are some shapes that make the test pass. Here we can turn multicore mode on or off, and when it's off, all the shapes are processed correctly.
We need to understand what the difference is between a core processing multiple blocks in multicore mode and a core processing different parts of the tensor in single-core mode.
For all the issues mentioned, the tests were also run on WH, and they pass there without the core grid playing any part in the behavior.