Using a block sharded config with height < 32 leads to low PCC on isfinite and similar ops. The provided unit test covers multiple configurations where the problem is observed.
Steps to reproduce
Check out ngrujic/unit-tests-1branch (soon to be merged into main). The unit test that showcases this problem can be run with this command:
Some of the configurations where the problem is observed:
Grid size used: (8, 8).
Ops found affected:
The problem is also observed when no op is called (the tensor is just copied back and forth).
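For reference, the round-trip check described above can be sketched without any device code. This is only an illustration in plain NumPy, assuming PCC here means the Pearson correlation coefficient between the expected and the returned tensor. The `pcc` helper, the shape `(1, 1, 16, 64)`, and the `0.999` threshold are hypothetical and not taken from the unit test, and the copy below is a stand-in for the actual sharded transfer.

```python
import numpy as np


def pcc(expected, actual):
    """Pearson correlation coefficient between two same-shaped arrays (hypothetical helper)."""
    e = np.asarray(expected, dtype=np.float64).ravel()
    a = np.asarray(actual, dtype=np.float64).ravel()
    if np.array_equal(e, a):
        # Identical data; this also covers constant tensors, where corrcoef would give NaN.
        return 1.0
    return float(np.corrcoef(e, a)[0, 1])


# Hypothetical shape with height 16 (< 32).
rng = np.random.default_rng(0)
reference = rng.standard_normal((1, 1, 16, 64))

# Stand-in for "copy the tensor back and forth": a correct round trip returns the same data.
round_tripped = reference.copy()
assert pcc(reference, round_tripped) > 0.999  # assumed threshold

# A corrupted round trip (half of the rows zeroed) shows up as a low PCC.
corrupted = reference.copy()
corrupted[..., 8:, :] = 0.0
print(pcc(reference, corrupted))
```

Since no op is involved in this case, any PCC below the threshold points at the data movement itself rather than at a specific op.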
Example shapes that lead to the problem: