Closed: muellerzr closed this pull request 3 months ago
Will be doing similar benchmarks and fixes over on the transformers side while we wait for the torch team to get back with a clear answer.
Amazing work, Zach!
cc @nvassilyev, as I now know your github :)
This PR seems to be related, as it rolls back the value to 0.1: https://github.com/pytorch/pytorch/pull/124692 (note that the PR was merged, even though the status is "Closed").
Great investigation @BenjaminBossan, looking through the diffs we can indeed see our `5`!
Merging as all is well
What does this PR do?
@stas00 and Nikita (sorry, don't know your gh handle!) and I were investigating why `accelerate launch` is so slow during tests. Turns out pytorch isn't respecting the `monitor_interval` default of 0.1 for some reason and it's being set to 5 seconds instead. So, manually setting it back to 0.1 speeds `test_multigpu.py` up by 30% (33.40s -> 23.29s) and `test_core` up by 25% (102.23s (0:01:42) -> 76.94s (0:01:16)).
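
For reference, a minimal sketch of what pinning the interval looks like through torch's elastic launcher API: `torch.distributed.launcher.api.LaunchConfig` exposes a `monitor_interval` field that can be set explicitly instead of relying on the default. This is an illustration, not the exact diff in this PR; the `train` entrypoint, run id, and process counts are placeholder values.

```python
# Sketch: explicitly set monitor_interval on the elastic launcher instead of
# relying on the default, which some torch versions set to 5 seconds.
from torch.distributed.launcher.api import LaunchConfig, elastic_launch


def train():
    # Placeholder worker; each launched process runs this function.
    print("worker started")


config = LaunchConfig(
    min_nodes=1,
    max_nodes=1,
    nproc_per_node=2,
    rdzv_backend="c10d",             # local rendezvous, no external etcd needed
    run_id="monitor-interval-demo",  # arbitrary run id for this sketch
    monitor_interval=0.1,            # poll worker health every 0.1s instead of 5s
)

if __name__ == "__main__":
    elastic_launch(config, train)()
```

With the shorter polling interval, the launcher notices worker exit almost immediately, which is where the test-time savings above come from.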
Fixes # (issue)
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@SunMarc @stas00