tenstorrent / tt-metal

🤘 TT-NN operator library, and TT-Metalium low-level kernel programming model.
Apache License 2.0

Investigate possible hang in tt_lib falcon40b 4 chip decoder #7842

Open · tt-rkim opened this issue 5 months ago

tt-rkim commented 5 months ago

https://github.com/tenstorrent/tt-metal/actions/runs/8835020320/job/24261345302

This was likely caused by something checked in over the past couple of days: the test passed before, and we were having trouble with the multi-chip pipelines for a while.

@johanna-rock-tt @s-jovic could you please help us out?

tt-rkim commented 5 months ago

Skipping for now.

johanna-rock-tt commented 5 months ago

We have the same issue in the end_to_end test. I'll skip only the decode + 4-chip tests in CI so that we still cover the rest.

johanna-rock-tt commented 5 months ago

@TT-BrianLiu can we just drop support for 4-chip decode entirely and remove the test? 4 chips are no longer our target since we have the 8-chip version, and as far as I understand, T3000 will never be used with 4 chips. Prefill doesn't even support running on 4 chips.

TT-BrianLiu commented 5 months ago

Is 8-chip in CI? There's no reason 4-chip shouldn't work if 8-chip does.

johanna-rock-tt commented 5 months ago

Just talked to @uaydonat too; we can drop 4 chips now. I'll clean up our tests and also remove the 4-chip configs from the model_config.

johanna-rock-tt commented 5 months ago

@TT-BrianLiu yes, 8-chip is in CI and working fine.

TT-BrianLiu commented 5 months ago

OK, sounds good.