tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Bad PCCs in matmul.full.matmul_default_block_sharded n_size_32 sweeps #12529

Closed by bbradelTT 1 month ago

bbradelTT commented 1 month ago

After https://github.com/tenstorrent/tt-metal/pull/11520, the matmul.full.matmul_default_block_sharded n_size_32 sweeps stopped working properly.

Most of them hung. That was fixed as part of https://github.com/tenstorrent/tt-metal/issues/12220 with https://github.com/tenstorrent/tt-metal/pull/12462, which updates the matmul reader to skip the mcast when there are no destinations.

After that fix, many of the sweeps fail with PCC errors: [screenshot of the failing sweep vectors]

Repro steps should be:

python3 tests/sweep_framework/parameter_generator.py --module-name matmul.full.matmul_default_block_sharded # if needed
python tests/sweep_framework/runner.py --module-name matmul.full.matmul_default_block_sharded --vector-id ttVS4pEBwvAQT7Jm2dzu

bbradelTT commented 1 month ago

Note: the vector ids have changed because of a code update, so the vector ids in the screenshot can no longer be used.

bbradelTT commented 1 month ago

It turns out that

                            // multicast to every core in receiver grid
                            noc_async_write_multicast_loopback_src(
                                    local_read_addr,
                                    in0_multicast_data_addr,
                                    in0_block_size_bytes,
                                    in0_mcast_num_cores,
                                    false,
                                    false);
                        }

needs to happen even when in0_mcast_num_cores is 1.

Therefore the change in https://github.com/tenstorrent/tt-metal/pull/12462 needs to be undone and whatever is causing the hang needs to be identified. [Edit: Before making any code changes, we should understand what is going on and decide how to proceed.]

TT-BrianLiu commented 1 month ago

This fixes it: https://github.com/tenstorrent/tt-metal/pull/12796. For the local CB copy we just need to use noc_async_write instead of noc_async_write_multicast_loopback_src, which hangs with linking set to True.
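
The control flow under discussion can be illustrated with a toy Python model. This is not kernel code: `reader_step`, the `strategy` names, and the list-based buffers are all hypothetical. It assumes, based on the comments above, that the loopback multicast is what fills the sender's own CB and that in0_mcast_num_cores counts the sender itself, so skipping the multicast leaves stale data unless a plain local write takes its place. It is one reading of the fix, not a description of what the PR actually changes.

```python
# Toy model only: plain Python lists stand in for L1 circular buffers (CBs).
# Every name here is hypothetical and is not part of tt-metal.

GARBAGE = -1  # marks stale, never-written CB contents


def reader_step(src_block, num_dests, strategy):
    """Return the contents of each destination CB after one reader step.

    num_dests mirrors in0_mcast_num_cores and is assumed to include the
    sender, so num_dests == 1 means "sender only".

    strategy:
      "skip_when_single" - skip the write entirely when the sender is the
                           only destination (the behaviour blamed for bad PCC)
      "local_write"      - do a plain local copy into the sender's own CB
    """
    cbs = [[GARBAGE] * len(src_block) for _ in range(num_dests)]
    if num_dests > 1:
        # Stands in for the loopback multicast: every CB, sender included.
        for cb in cbs:
            cb[:] = src_block
    elif strategy == "local_write":
        # Stands in for a unicast write into the sender's own CB.
        cbs[0][:] = src_block
    # "skip_when_single" with num_dests == 1: nothing is written.
    return cbs


block = [1, 2, 3, 4]
print(reader_step(block, 1, "skip_when_single")[0])  # sender CB stays stale
print(reader_step(block, 1, "local_write")[0])       # sender CB holds the block
```

In this model the single-destination case is the only one where the two strategies differ, which matches the observation that the multicast (or an equivalent local copy) is still needed when in0_mcast_num_cores is 1.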

TT-BrianLiu commented 1 month ago

The bad PCC occurred because we were using garbage data when the copy was not done at all.
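
To see why a skipped copy shows up as a PCC failure rather than a crash, here is a minimal sketch. It assumes that PCC means the Pearson correlation coefficient between the expected and actual outputs. The `pcc` helper and the sample data are illustrative only and do not reproduce the sweep framework's own comparison code.

```python
import math
import random


def pcc(expected, actual):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(expected)
    mean_e = sum(expected) / n
    mean_a = sum(actual) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_e * var_a)


rng = random.Random(0)
expected = [rng.uniform(-1.0, 1.0) for _ in range(1024)]

# Correct data path: output matches the reference up to small numeric noise.
good = [e + rng.gauss(0.0, 1e-3) for e in expected]

# Skipped copy: the consumer reads whatever was already in the buffer,
# modelled here as values unrelated to the reference.
stale = [rng.uniform(-1.0, 1.0) for _ in range(1024)]

print(f"good  PCC = {pcc(expected, good):.4f}")
print(f"stale PCC = {pcc(expected, stale):.4f}")
```

The correctly copied data correlates almost perfectly with the reference, while the stale data correlates close to zero, so a test that thresholds on PCC passes the first case and fails the second.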