tenstorrent / tt-metal

:metal: TT-NN operator library and TT-Metalium low-level kernel programming model.
Apache License 2.0

New Matmul blocking API will hang if batch > 1 and bias is enabled #4279

Closed TT-BrianLiu closed 9 months ago

TT-BrianLiu commented 10 months ago

I believe we don't have a test with both batch > 1 and bias enabled. This block of code:

[screenshot of the code block in question]

in this compute kernel: `tt_eager/tt_dnn/op_library/bmm/kernels/compute/bmm_large_block_zm_fused_bias_activation.cpp`

looks like it should call `mm_block_init_short` instead. @tt-aho tested it, and the exact change needed is probably:

```cpp
mm_block_init_short(in0_cb_id, in1_cb_id);
PACK(( llk_pack_init<false, false, DstTileFaceLayout::ColMajor>() ));
// PACK(( llk_pack_dest_init<SYNC, DstTileFaceLayout::ColMajor, false>() ));
PACK(( llk_init_packer_dest_offset_registers<SyncHalf, DstTileFaceLayout::ColMajor, false>() ));
```

For some reason, the one line we previously needed to switch between tilize and matmul (the `llk_pack_dest_init` call in the snippet) causes PCC issues unless it stays commented out. Not sure if there's a reason for this...
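For context, the PCC check mentioned here compares the device output against a host-side golden result. Below is a minimal NumPy sketch of what that reference computation and metric look like for this op (batched matmul with a broadcast fused bias). The shapes, the `comp_pcc` helper name, and the 0.99 threshold are illustrative assumptions, not the repo's actual test harness:

```python
import numpy as np

def comp_pcc(golden: np.ndarray, actual: np.ndarray) -> float:
    # Pearson correlation coefficient between the flattened tensors;
    # a correct kernel should score ~1.0 against the golden result.
    return float(np.corrcoef(golden.ravel(), actual.ravel())[0, 1])

# Illustrative shapes: batch of 2, (64x32) @ (32x48), bias broadcast over rows.
rng = np.random.default_rng(0)
a = rng.standard_normal((2, 64, 32)).astype(np.float32)
b = rng.standard_normal((2, 32, 48)).astype(np.float32)
bias = rng.standard_normal((1, 1, 48)).astype(np.float32)

# Host "golden" for batched matmul with fused bias.
golden = a @ b + bias

# In a real test, `actual` would come from the device; here we just
# demonstrate that an exact match passes the threshold.
assert comp_pcc(golden, golden.copy()) > 0.99
```

The failure mode described in this issue would show up as `comp_pcc` dropping well below the threshold on the batched-with-bias case, while the batch == 1 or no-bias cases still pass.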

A similar issue exists in the conv compute kernel, `tt_eager/tt_dnn/op_library/conv/kernels/conv_bmm_tilize_col_major_out_blocks.cpp`:

[screenshot of the corresponding block in the conv kernel]

It's difficult to test the conv compute kernel, since the convs that would go through this path are the ones where we pre-tilize, and those convs don't actually support height blocking yet. Likely you can just port over whatever change is needed for the matmul compute kernel.

tt-aho commented 10 months ago

I pushed the test and a potential fix to aho//matmul_block_batch. You can try out the different llk setups/calls there:

```shell
pytest tests/tt_eager/python_api_testing/unit_testing/test_matmul.py
```

yugaoTT commented 10 months ago

Thanks @tt-aho, I tried a few setups and it works. I'll cherry-pick your commits and create a PR.