tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Falcon7b t3000 demo tests failing sporadically #9876

Closed pavlepopovic closed 2 months ago

pavlepopovic commented 3 months ago

In the last 5 runs of the T3K demo pipeline on main (https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-demo-tests.yaml), the falcon7b tests failed 2 times because they timed out. They timed out while calculating perplexity and logged this message:

2024-06-29 08:03:24.368 | INFO | conftest:run_timer:512 - Timing out test case

Passing runs took ~25 min to complete, while failing ones ran for ~29 min before timing out.
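
To make the margin concrete, here is a rough sketch of the headroom arithmetic. It assumes the timeout is roughly the ~29 min at which the failing runs were cut off; the configured value is not stated here. The function names and the 25% threshold are hypothetical, chosen only for illustration.

```python
def timeout_headroom(passing_minutes: float, timeout_minutes: float) -> float:
    """Return the fraction of the timeout budget a typical passing run leaves unused."""
    if timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be positive")
    return (timeout_minutes - passing_minutes) / timeout_minutes


def is_fragile(passing_minutes: float, timeout_minutes: float,
               min_headroom: float = 0.25) -> bool:
    """Flag a test whose passing runs sit too close to the timeout.

    min_headroom is an arbitrary illustrative threshold, not a project setting.
    """
    return timeout_headroom(passing_minutes, timeout_minutes) < min_headroom


if __name__ == "__main__":
    # Approximate figures from this report: ~25 min passing, ~29 min at timeout.
    headroom = timeout_headroom(25, 29)
    print(f"headroom: {headroom:.0%}, fragile: {is_fragile(25, 29)}")
```

With so little slack, ordinary run-to-run variance on a shared machine could be enough to push a passing run over the limit.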

pavlepopovic commented 3 months ago

@skhorasganiTT I've increased the timeout so the tests stop failing.

skhorasganiTT commented 2 months ago

The last few runs in the pipeline (latest: https://github.com/tenstorrent/tt-metal/actions/runs/9886738452/job/27307454598) have completed the perplexity tests in < 15 min. Other pipelines were also seeing timeouts last week, so I think this is safe to close for now. We can update the timeout values once we move the perplexity tests to the new correctness/stress-test pipeline that is being added.