microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] "Setting ds_accelerator" uses `print` not `logger` #3680

Closed muellerzr closed 1 year ago

muellerzr commented 1 year ago

Describe the bug
When accelerate tries to run any command while DeepSpeed is available, we get "Setting ds_accelerator ...". While not explicitly bad, this is making a test fail due to a non-clean CLI print when non-DeepSpeed code is used (we have a check for `is_deepspeed_available()`, which is probably triggering this).

As a nice QOL improvement, it would be good to have get_accelerator (https://github.com/microsoft/DeepSpeed/blob/master/accelerator/real_accelerator.py#L102) and set_accelerator (https://github.com/microsoft/DeepSpeed/blob/master/accelerator/real_accelerator.py#L109) use the framework's already-existing logger if possible, so that we can disable these annoying prints when they're not needed :)
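The requested change might look something like the following (a hedged sketch, not DeepSpeed's actual code; the function name `detect_accelerator` and the logger name are illustrative stand-ins):

```python
import logging

# Illustrative stand-in for DeepSpeed's accelerator-detection helper;
# the function name and logger name are assumptions, not the real API.
logger = logging.getLogger("ds_accelerator_sketch")

def detect_accelerator(name: str = "cuda") -> str:
    # Before: print(f"Setting ds_accelerator to {name} (auto detect)")
    # After: route through a logger so callers control verbosity.
    logger.info("Setting ds_accelerator to %s (auto detect)", name)
    return name
```

A caller that does not want the message could then raise the threshold with `logging.getLogger("ds_accelerator_sketch").setLevel(logging.WARNING)` before triggering detection.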

To Reproduce
Steps to reproduce the behavior:

  1. `pip install accelerate deepspeed -U`
  2. Create a file with:

```python
from accelerate.commands.tpu import tpu_command_parser, tpu_command_launcher

parser = tpu_command_parser()
args = parser.parse_args([
    "--config_file",
    "tests/test_configs/latest.yaml",
    "--install_accelerate",
    "--debug",
])

tpu_command_launcher(args)
```

  3. Run `python {my_file_name.py}`
  4. Should print:

```bash
Setting ds_accelerator to cuda (auto detect)
Running gcloud compute tpus tpu-vm ssh test-tpu --zone us-central1-a --command cd /usr/share; pip install accelerate -U; echo "hello world"; echo "this is a second command" --worker all
```

Expected behavior

A configurable option to silence these print statements by routing them through the logging system instead, perhaps at info or debug level.
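If the messages did go through Python's standard `logging` module, a user-side mute could be as simple as the following sketch (the logger name `"DeepSpeed"` is an assumption here — verify it against your installed version):

```python
import logging

# Raise the library logger's threshold before importing or running
# code that triggers accelerator detection. "DeepSpeed" is the
# assumed logger name, not a confirmed constant.
logging.getLogger("DeepSpeed").setLevel(logging.ERROR)
```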

tjruwase commented 1 year ago

@muellerzr, thanks for reporting this error. This is definitely an oversight. Are you able to submit a PR that uses logger instead of print?

muellerzr commented 1 year ago

@tjruwase sadly I don't have the bandwidth or familiarity with the code right now to do so; would appreciate it if someone else could pick it up 🙏

tjruwase commented 1 year ago

No worries. Thanks for reporting the error. I will link a PR asap.

tjruwase commented 1 year ago

By the way, are you actually running with tpus? The accelerator abstraction is a fairly new feature to make it easier to run DeepSpeed on arbitrary hardware, and so we appreciate any feedback. Thanks!

muellerzr commented 1 year ago

@tjruwase nope, not with DeepSpeed yet (@pacman100 is on that one with Accelerate?); it's on a GPU runner, and our CLI just picks it up via the import (which was also a great way to know that something weird was happening 😄).

tjruwase commented 1 year ago

@muellerzr, could you please give the linked PR a try? Thanks!

vipcxj commented 1 year ago

@tjruwase I have updated to 0.9.4, and there are still too many "[real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)" messages printed to stdout.

muellerzr commented 1 year ago

@tjruwase let's reopen this please, because this still isn't the right solution, since it's always printed to the user as @vipcxj states

vipcxj commented 1 year ago

I see no annoying messages printed when I am not doing distributed training. However, when I spawn processes to train the model on multiple GPUs, the annoying messages come back. I don't know how the two cases differ, because the code is the same:

```python
spawn(train_and_test, args=(2,), nprocs=2)
```

vs.

```python
train_and_test()
```

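A plausible explanation (an assumption, not something confirmed in this thread): with the `spawn` start method, each worker process re-imports the training module, so an import-time print inside a library fires once per worker. A minimal sketch of that effect, emulating each worker with a fresh Python subprocess:

```python
import subprocess
import sys

# Emulated import-time side effect: a fresh interpreter "imports" the
# library and prints the message, just like each spawned worker would.
SCRIPT = 'print("Setting ds_accelerator to cuda (auto detect)")'

def run_workers(nprocs: int) -> list[str]:
    """Run the script once per worker process and collect stdout."""
    outputs = []
    for _ in range(nprocs):
        result = subprocess.run(
            [sys.executable, "-c", SCRIPT],
            capture_output=True, text=True, check=True,
        )
        outputs.append(result.stdout)
    return outputs
```

With two workers the message appears twice, once per process, whereas a single in-process call would print it only once.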
king159 commented 1 year ago

The bug still exists with version 0.10.2

tjruwase commented 1 year ago

> @tjruwase let's reopen this please, because this still isn't the right solution, since it's always printed to the user as @vipcxj states

Sorry, I missed this earlier.

tjruwase commented 1 year ago

> The bug still exists with version 0.10.2

@king159, @vipcxj, can you please share a simple repro?

muellerzr commented 1 year ago

@tjruwase:

```python
# myfile.py
import deepspeed
```

```bash
python myfile.py
```

tjruwase commented 1 year ago

@muellerzr, can you please try https://github.com/microsoft/DeepSpeed/tree/olruwase/ds_3680_2? The issue seems to be that our logger is using default log level info. This hack manually uses debug level.

@king159, @vipcxj FYI
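The level distinction at play here, sketched with plain Python `logging` (not DeepSpeed's own logger): a logger set to INFO emits `logger.info(...)` calls, while a message demoted to `logger.debug(...)` stays hidden at that level.

```python
import io
import logging

log = logging.getLogger("ds_level_demo")
stream = io.StringIO()
log.addHandler(logging.StreamHandler(stream))
log.setLevel(logging.INFO)  # the default-style level discussed above

log.info("Setting ds_accelerator to cuda (auto detect)")   # emitted at INFO
log.debug("Setting ds_accelerator to cuda (auto detect)")  # suppressed at INFO

captured = stream.getvalue()
```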

Honesty-of-the-Cavernous-Tissue commented 1 year ago

> @muellerzr, can you please try https://github.com/microsoft/DeepSpeed/tree/olruwase/ds_3680_2? The issue seems to be that our logger is using default log level info. This hack manually uses debug level.
>
> @king159, @vipcxj FYI

This works fine for me

Honesty-of-the-Cavernous-Tissue commented 1 year ago

> @muellerzr, can you please try https://github.com/microsoft/DeepSpeed/tree/olruwase/ds_3680_2? The issue seems to be that our logger is using default log level info. This hack manually uses debug level.
>
> @king159, @vipcxj FYI

Would you please check where this message comes from? I'm using DeepSpeed via Hugging Face Accelerate, so it's difficult to tell whether it comes from DeepSpeed.

tjruwase commented 1 year ago

@Honesty-of-the-Cavernous-Tissue, that message is from DeepSpeed. Should be fixed by #4310.

Honesty-of-the-Cavernous-Tissue commented 1 year ago

> @Honesty-of-the-Cavernous-Tissue, that message is from DeepSpeed. Should be fixed by #4310.

Thanks, it's been bothering me for a long time.

muellerzr commented 1 year ago

Both fixes quiet everything down for me @tjruwase 🙏

tjruwase commented 1 year ago

@muellerzr, thanks for the confirmation. I think this behavior needs to be configurable, so I will work a bit more on the PR before merging.

muellerzr commented 1 year ago

Got it, thanks @tjruwase. Definitely waiting with anticipation.