Closed mvpatel2000 closed 1 week ago
Any way we can do a test for this?
Unfortunately not easily, since pytest runs in a single instance and this would require a separate launch. But I'll include a manual test.
Can someone put a request-changes hold on this to block it during code freeze? @dakinggg?
I'm curious, why set the WORLD_SIZE env var for a single node in the first place? It breaks the CPU test python -m pytest in interactive if there are multiple GPUs. The error is a missing RANK env var. This PR doesn't fix that, because WORLD_SIZE is not 1 here. I had to unset WORLD_SIZE.
I'm curious, why set the WORLD_SIZE env var for a single node in the first place?
That was a bug in mcloud; it should set all env vars even on a single node.
It breaks the CPU test python -m pytest
Will investigate.
Discussed offline: if you don't use the launcher, the WORLD_SIZE env var is still picked up, which we cannot fix. This breakage is not related to this PR; you must use composer -n 1 pytest...
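For anyone hitting the same failure, here is a rough sketch of the condition described above: WORLD_SIZE is set to more than 1 but RANK is absent. The helper name `missing_dist_vars` and the default list of required variables are hypothetical and only for illustration; this is not code from the launcher.

```python
from typing import List, Mapping, Sequence


def missing_dist_vars(
    env: Mapping[str, str],
    required: Sequence[str] = ("RANK",),  # assumed list; the thread only names RANK
) -> List[str]:
    """Return the required distributed env vars that are missing.

    Hypothetical helper: a WORLD_SIZE of 1 (or unset) is treated as a
    non-distributed run that needs no other variables.
    """
    world_size = int(env.get("WORLD_SIZE", "1"))
    if world_size <= 1:
        return []
    return [name for name in required if name not in env]
```

Passing `os.environ` to this before running pytest directly would flag the leftover WORLD_SIZE case; unsetting WORLD_SIZE or going through the launcher avoids it.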
What does this PR do?
Simplify launcher world size parsing. Now, composer -n correctly enables running on fewer GPUs when iterating on a single machine.
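As a minimal sketch of the intended behavior, and not the actual launcher code, the resolution order might look like the following. It assumes an explicit -n value takes precedence over the WORLD_SIZE env var, with a fallback of 1; the function name `resolve_world_size` is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_world_size(
    nproc: Optional[int],
    env: Mapping[str, str] = os.environ,
) -> int:
    """Pick the world size for a launch (hypothetical helper).

    Assumed precedence: explicit -n value, then WORLD_SIZE, then 1.
    """
    if nproc is not None:
        if nproc < 1:
            raise ValueError(f"-n must be at least 1, got {nproc}")
        # An explicit -n wins, so a machine with 8 GPUs can run on fewer.
        return nproc
    return int(env.get("WORLD_SIZE", "1"))
```

Under these assumptions, `resolve_world_size(2, {"WORLD_SIZE": "8"})` yields 2, which matches the single-machine iteration case described above.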