tenstorrent / tt-metal

TT-NN operator library and TT-Metalium low-level kernel programming model.
Apache License 2.0

Test different data patterns for failing tests #11288

Closed: ttmtrajkovic closed this issue 3 days ago

ttmtrajkovic commented 1 month ago

For the failing 1D matmul test (LM Head): convert it to a 2D matmul by changing the op parameters, keeping the amount of compute (and, where possible, data movement) per core identical.

For the failing 2D matmul test (FF1, no gelu): convert it to a 1D matmul by changing the op parameters, keeping the amount of compute (and, where possible, data movement) per core identical.
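One way to pick the new shapes is to hold the per-core output block (per_core_M x per_core_N tiles) and the inner dimension fixed, then scale M and N by the core layout. The sketch below assumes 32x32 tiles, an 8x8 grid (64 cores) for the 2D matmul, and a 1D matmul that splits M across all 64 cores. The per-core values (1 x 32 tiles for LM Head, 4 x 72 for FF1) and the FF1 inner dimension of 576 are computed from the dims reported later in this thread under those same assumptions. The helper names are hypothetical, not TT-NN APIs.

```python
TILE = 32  # assumed tile edge length (32x32 tiles)


def shapes_2d(per_core_m, per_core_n, k, grid_y=8, grid_x=8):
    """Hypothetical helper: [M, K] and [K, N] shapes for a 2D matmul in which
    each core of a grid_y x grid_x grid owns per_core_m x per_core_n output tiles."""
    m = per_core_m * grid_y * TILE
    n = per_core_n * grid_x * TILE
    return (m, k), (k, n)


def shapes_1d_height(per_core_m, per_core_n, k, num_cores=64):
    """Hypothetical helper: shapes for a 1D matmul that splits M across
    num_cores and has every core compute the full N (per_core_n tiles)."""
    m = per_core_m * num_cores * TILE
    n = per_core_n * TILE
    return (m, k), (k, n)


# LM Head, 1D -> 2D: per_core_M = 1, per_core_N = 32, K = 4544
print(shapes_2d(1, 32, 4544))         # ((256, 4544), (4544, 8192))
# FF1, 2D -> 1D: per_core_M = 4, per_core_N = 72, K assumed to be 576
print(shapes_1d_height(4, 72, 576))   # ((8192, 576), (576, 2304))
```

Both results match the in0/in1 shapes reported for the modified tests.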

Run the tests with the 8.10.0.0 FW bundle. Consider also running with FW versions without LL (TBD).

pavlepopovic commented 1 month ago

LM Head test: managed to convert it from a 1D matmul to a 2D matmul. Memory configs are preserved (in0 L1 interleaved, in1 DRAM interleaved, output L1 interleaved), and data formats are preserved.
To keep per_core_M, per_core_N, and in0_block_w unchanged with num_cores == 64 (together these three parameters represent the same compute on a Tensix core) while making this a valid 2D matmul, the input dims had to change:
in0: [1, 1, 32, 4544] -> [1, 1, 256, 4544]
in1: [1, 1, 4544, 65024] -> [1, 1, 4544, 8192]
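The per-core numbers behind the LM Head conversion can be checked with a little tile arithmetic. This is a sketch assuming 32x32 tiles, an 8x8 grid for the 2D matmul, and a 1D matmul that splits N across all 64 cores while every core keeps all of M; the function names are hypothetical, not TT-NN APIs.

```python
import math

TILE = 32  # assumed tile edge length (32x32 tiles)


def tiles(x):
    # Number of tiles needed to cover x elements (rounding up for padding)
    return math.ceil(x / TILE)


def per_core_1d_width(m, k, n, num_cores=64):
    # Assumed 1D layout: every core holds all of M and a slice of N
    return tiles(m), math.ceil(tiles(n) / num_cores), tiles(k)


def per_core_2d(m, k, n, grid_y=8, grid_x=8):
    # Assumed 2D layout: M split over grid rows, N split over grid columns
    return math.ceil(tiles(m) / grid_y), math.ceil(tiles(n) / grid_x), tiles(k)


orig = per_core_1d_width(32, 4544, 65024)  # original LM Head (1D)
mod = per_core_2d(256, 4544, 8192)         # modified LM Head (2D)
print("original 1D (per_core_M, per_core_N, K tiles):", orig)
print("modified 2D (per_core_M, per_core_N, K tiles):", mod)
```

Both print (1, 32, 142). In the original 1D case N is 2032 tiles, which does not divide evenly by 64, so under this layout the last core gets a partial slice.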

Based on preliminary testing (on one machine so far), the two tests (original LM Head and modified) behave similarly: both hang on the same chip. TODO: more testing.

FF1 no gelu: managed to convert it from a 2D matmul to a 1D matmul. Data formats are preserved. The in1 memory config is preserved (DRAM interleaved), but the in0 and output memory configs had to change from BLOCK_SHARDED to HEIGHT_SHARDED, since 1D matmul doesn't support block sharding.
To keep per_core_M, per_core_N, and in0_block_w unchanged with num_cores == 64 (together these three parameters represent the same compute on a Tensix core) while making this a valid 1D matmul, the input dims had to change:
in0: [1, 1, 1024, 4608] -> [1, 1, 8192, 576]
in1: [1, 1, 4608, 18432] -> [1, 1, 576, 2304]
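A similar check for FF1 compares the in0 shard each core holds and its output block. This sketch assumes 32x32 tiles, an 8x8 grid for the original BLOCK_SHARDED 2D matmul, and a HEIGHT_SHARDED 1D matmul in which each of the 64 cores computes the full N; `block_shard` and `height_shard` are hypothetical helpers, not TT-NN APIs.

```python
TILE = 32  # assumed tile edge length (32x32 tiles)


def block_shard(shape, grid_y=8, grid_x=8):
    # BLOCK_SHARDED: rows split over grid rows, columns over grid columns
    rows, cols = shape
    return rows // grid_y, cols // grid_x


def height_shard(shape, num_cores=64):
    # HEIGHT_SHARDED: rows split over all cores, each core keeps the full width
    rows, cols = shape
    return rows // num_cores, cols


# Original 2D FF1: in0 [1024, 4608], in1 [4608, 18432], 8x8 grid
in0_2d, n_2d = (1024, 4608), 18432
pm_2d = in0_2d[0] // TILE // 8  # per_core_M
pn_2d = n_2d // TILE // 8       # per_core_N

# Modified 1D FF1: in0 [8192, 576], in1 [576, 2304], 64 cores
in0_1d, n_1d = (8192, 576), 2304
pm_1d = in0_1d[0] // TILE // 64  # per_core_M
pn_1d = n_1d // TILE             # per_core_N: every core computes the full N

print("2D in0 shard, per_core_M, per_core_N:", block_shard(in0_2d), pm_2d, pn_2d)
print("1D in0 shard, per_core_M, per_core_N:", height_shard(in0_1d), pm_1d, pn_1d)
print("K tiles (2D vs 1D):", in0_2d[1] // TILE, in0_1d[1] // TILE)
```

Under these assumptions both layouts give each core a [128, 576] in0 shard and a 4 x 72 tile output block. The full inner dimension is 144 tiles in the original test and 18 tiles in the modified one.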

Based on preliminary testing (on one machine so far), the two tests (original FF1 no gelu and modified) behave similarly: both hang on the same chip. TODO: more testing.

FF1 + gelu: same modifications as FF1 no gelu, with gelu applied on top.

Based on preliminary testing (on one machine so far), neither test (original FF1 + gelu or modified) hangs on the machine where I tested. TODO: more testing.

pavlepopovic commented 3 weeks ago

Switching the matmuls from 1D to 2D and vice versa did not produce results from which we can draw conclusions. Full results: https://docs.google.com/spreadsheets/d/10uWtBEkLLEM-h5TuuGQ6HW8AjhXwSjFV-cK3iVFYYIU/edit?gid=881936187#gid=881936187