Closed asolano closed 12 months ago
Hi @asolano - good catch, I believe the original intention was +=
and was lost when the code was moved to the other file. It seems like you've already tested this, but I've made a PR to correct the behavior here: https://github.com/microsoft/DeepSpeed/pull/4373
Describe the bug When using the
pdsh
launcher, the runner replaces any existing content of the PDSH_SSH_ARGS_APPEND environment variable, instead of adding to it. As a result, any existing values the user has set in the shell will be ignored.Source:
https://github.com/microsoft/DeepSpeed/blob/78c3b148a8a8b6e60ab77a5c75849961f52b143d/deepspeed/launcher/multinode_runner.py#L69
To Reproduce
deepspeed
, i.e,export PDSH_SSH_ARGS_APPEND="-vv"
for extra verbose logs.Expected behavior The existing value of PDSH_SSH_ARGS_APPEND is preserved (and added to if necessary). In the example setup before, the ssh commands execute with verbosity.
After manually changing the
=
to a+=
in the source code:ds_report output
Screenshots N/A
System info (please complete the following information):
Launcher context Launching with the DeepSpeed launcher:
Docker context N/A
Additional context
On the original pull request (https://github.com/microsoft/DeepSpeed/pull/4117), the code was a "+=" instead of a "=", so maybe that was the original intention. Alternatively, if the environment variables set in the shell that launches
deepspeed
are not supposed to be preserved, indicate so in the documentation.