Closed: tt-rkim closed this 5 months ago
@tt-rkim Just to clarify: do you mean the timeout issue above for the GS device perf job? I don't see any slowdowns on the N300 WH job. Both took 17 min 50-ish sec to run, and the perf reports are very close, probably within the measurement error margin.
Yes, just the GS device perf job. Which is super weird, because I'm not sure what would cause such a difference... not sure if we see this with all the models, so perhaps certain ops cause a slowdown. For example, we don't run BERT perf on WH.
The re-run on the commit before libc++ passed. Running again
Trying to reproduce locally as well.
The post-commit profiler job has also gotten longer. @tt-rkim can you think of any other CI job that runs pytest on GS?
Ok, so tt_metal ReadFromDevice went from ~2-3 s per call to ~17-18 s.
Before: (profiler screenshot)
After: (profiler screenshot)
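For anyone trying the local repro, a minimal sketch of the timing harness used to get numbers like these (the `read_from_dram` helper, its signature, and the buffer size are placeholders, not the real tt_metal API):

```python
import time

def read_from_dram(device, addr, size_bytes):
    # Hypothetical stand-in for the tt_metal ReadFromDevice path;
    # swap in the real device-read call when reproducing.
    raise NotImplementedError("replace with the actual device read")

def time_reads(device, addr=0, size_bytes=64 * 1024 * 1024, iters=5):
    # Time each call individually: before the libc++ change a call took
    # ~2-3 s on GS, afterwards the same call takes ~17-18 s.
    for i in range(iters):
        t0 = time.perf_counter()
        read_from_dram(device, addr, size_bytes)
        print(f"read {i}: {time.perf_counter() - t0:.2f} s")
```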
The device perf regression suite calls it to read profiling data from DRAM, which is why many other GS tests are fine.
I would say we up the timeout on the device perf CI for now and open a separate ticket for the ReadFromDevice regression.
Sounds good, will increase timeout
Did you see what the underlying call times are like between the two?
I was trying BERT and the run went from ~1 min to ~1 min 30 s, so all of the time increase is coming from ReadFromDevice.
Is that tracy? Is it showing anything underneath?
Oh I see, no, no further child calls are recorded.
A bit more info on this: the slowdown is coming directly from umd read_from_device calls.
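A rough sketch of how the time can be attributed per zone from a Tracy capture, assuming the trace is exported to CSV (e.g. with Tracy's csvexport utility); the `name` and `total_ns` column names are assumptions about that export format:

```python
import csv
from collections import defaultdict

def total_time_by_zone(csv_path):
    # Sum recorded time per zone name; with this regression the
    # umd read_from_device zones dominate the ReadFromDevice total.
    totals = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["name"]] += int(row["total_ns"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

for name, ns in total_time_by_zone("trace_export.csv")[:10]:
    print(f"{name:40s} {ns / 1e9:8.2f} s")
```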
@mo-tenstorrent we can close?
Yes, #9516 fixed this
Before libc++: https://github.com/tenstorrent/tt-metal/actions/runs/9521692385
After libc++: https://github.com/tenstorrent/tt-metal/actions/runs/9530903488
Looks like a lot of tests got longer... almost double the runtime.
We need to investigate if libc++ is the culprit. These two runs should be 1 commit apart.
cc: @mo-tenstorrent @yan-zaretskiy @TT-billteng @vtangTT
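To quantify the "almost double" observation, a small sketch that diffs per-test durations between the two runs, assuming each run's results are available as a JUnit-style XML report (the file names here are placeholders):

```python
import xml.etree.ElementTree as ET

def durations(junit_xml):
    # Map "classname::name" -> duration in seconds from a JUnit-style report.
    root = ET.parse(junit_xml).getroot()
    return {
        f"{tc.get('classname')}::{tc.get('name')}": float(tc.get("time") or 0.0)
        for tc in root.iter("testcase")
    }

before = durations("before_libcxx.xml")  # report from the run before libc++
after = durations("after_libcxx.xml")    # report from the run after libc++

# Print the tests with the largest slowdown ratio; "almost double" shows up as ~2x.
for test in sorted(before.keys() & after.keys(),
                   key=lambda name: after[name] / max(before[name], 1e-9),
                   reverse=True)[:20]:
    ratio = after[test] / max(before[test], 1e-9)
    print(f"{test}: {before[test]:.1f}s -> {after[test]:.1f}s ({ratio:.2f}x)")
```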