huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

How can I debug on vscode #400

Closed jarork closed 2 years ago

jarork commented 2 years ago

I am a new user of accelerate. How should I configure VSCode in order to debug a program with accelerate? (E.g. accelerate launch train.py)

yuxinyuan commented 2 years ago

Just create a launch.json with a configuration like the one below.

{
  "name": "train",
  "type": "python",
  "request": "launch",
  "module": "accelerate.commands.launch",
  "args": ["train.py"], // other args come after train.py
  "console": "integratedTerminal",
  // "env": {"CUDA_LAUNCH_BLOCKING": "1"}
},

hbwu-ntu commented 2 years ago

Hi @yuxinyuan, how can I specify other args after train.py? Could you give me an example?

Should it be like { ... "args": ["train.py"], ["--arg"] ... }?

yuxinyuan commented 2 years ago

It should be something like "args": ["train.py", "--flag1", "--arg1=hello"]

odellus commented 1 year ago

I'm trying to figure out how to debug the processes that pdsh kicks off on my second node (see #1114) with VS Code, pdb, or anything else really. Does anyone have any advice?

htang2012 commented 1 year ago

The command below is from https://huggingface.co/docs/accelerate/usage_guides/megatron_lm. How do I specify it in the VS Code launch.json file?

accelerate launch --config_file megatron_gpt_config.yaml \
    examples/by_feature/megatron_lm_gpt_pretraining.py \
    --config_name "gpt2-large" \
    --tokenizer_name "gpt2-large" \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --block_size 1024 \
    --learning_rate 5e-5 \
    --per_device_train_batch_size 24 \
    --per_device_eval_batch_size 24 \
    --num_train_epochs 5 \
    --with_tracking \
    --report_to "wandb" \
    --output_dir "awesome_model"

huseyinatahaninan commented 1 year ago
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "module": "accelerate.commands.launch",
            "args": [
                "--config_file", "megatron_gpt_config.yaml",
                "./examples/by_feature/megatron_lm_gpt_pretraining.py",
                "--config_name", "gpt2-large"
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}
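
For long commands such as the Megatron-LM one above, writing the args list by hand is error-prone. Below is a minimal Python sketch (not part of accelerate) that splits an `accelerate launch` command with `shlex` and builds one launch.json configuration entry from it. The function name `accelerate_command_to_launch_config` and its parameters are hypothetical, and the default `"debugpy"` type is an assumption for recent versions of the VS Code Python tooling; most configurations in this thread use `"python"`.

```python
import json
import re
import shlex


def accelerate_command_to_launch_config(command, name="accelerate launch", debug_type="debugpy"):
    """Turn an `accelerate launch ...` shell command into a launch.json entry (sketch)."""
    # Join shell line continuations (backslash + newline) into a single line.
    joined = re.sub(r"\\\s*\n", " ", command)
    # shlex.split applies POSIX-style quoting rules, so "gpt2-large" becomes gpt2-large.
    tokens = shlex.split(joined)
    if tokens[:2] != ["accelerate", "launch"]:
        raise ValueError("expected a command starting with 'accelerate launch'")
    return {
        "name": name,
        "type": debug_type,  # assumption: use "python" on older setups
        "request": "launch",
        "module": "accelerate.commands.launch",
        # Launcher options, the script path, and the script options keep their order.
        "args": tokens[2:],
        "console": "integratedTerminal",
        "justMyCode": False,
    }


if __name__ == "__main__":
    command = """accelerate launch --config_file megatron_gpt_config.yaml \\
        examples/by_feature/megatron_lm_gpt_pretraining.py \\
        --config_name "gpt2-large" \\
        --block_size 1024"""
    print(json.dumps(accelerate_command_to_launch_config(command), indent=4))
```

The printed object can be pasted into the "configurations" array of launch.json. Shell variables such as `$HOME` are not expanded by this sketch, so they need to be replaced by hand.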
huseyin-karaca commented 1 year ago

Hi, could you also explain how to specify arguments of accelerate launch, like --gpu_ids, please? (In other words, is it possible to configure a launch.json file representing CLI commands like accelerate launch --gpu_ids 1 main.py --batch_size 512 --epoch 1000 ?)

huseyinatahaninan commented 1 year ago

I think for those args they'd come before main.py -- perhaps you can try the following and see if it works?

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "module": "accelerate.commands.launch",
            "args": [
                "--gpu_ids", "1",
                "main.py",
                "--batch_size", "512",
                "--epoch", "1000"
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}
huseyin-karaca commented 1 year ago

Yes, it works. Thank you!

sonsus commented 9 months ago

None of the launch.json configurations above works properly for me. The debugger appears to start, but then it hangs without any terminal output or GPU usage. What could be causing this (e.g. a wrongly configured dataloader)?

huseyinatahaninan commented 9 months ago

Make sure to specify the GPUs, and be careful with the script path relative to the .vscode folder. Something like the configuration below should hopefully work:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "env": {"CUDA_VISIBLE_DEVICES":"0,1"},
            "module": "accelerate.commands.launch",
            "args": [
                "--multi_gpu",
                "--num_processes", "2",
                "./PATH/main.py",
                "--model_name_or_path", "HuggingFaceH4/zephyr-7b-beta",
                "--seed", "42",
                // etc.: the remaining script arguments go here
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}
euminds commented 9 months ago

I still encounter a ModuleNotFoundError:

/home/user/miniconda3/envs/ziplora/bin/python: Error while finding module specification for 'accelerate.command.launch' (ModuleNotFoundError: No module named 'accelerate.command')

This is my launch.json:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "debugpy",
            "request": "launch",
            "env": {"CUDA_VISIBLE_DEVICES":"0"},
            "module": "accelerate.command.launch",
            "args": [
                "--pretrained_model_name_or_path","CompVis/stable-diffusion-v1-4",
                "--train_data_dir","assets/cat_statue",
                "--placeholder_token","",
                "--initializer_token","toy",
                "--resolution","512",
                "--train_batch_size","1",
                "--gradient_accumulation_steps","8",
                "--max_train_steps","500",
                "--learning_rate","0.005",
                "--lr_scheduler","constant",
                "--lr_warmup_steps","0",
                "--output_dir","xti_cat",
                "--only_save_embeds",
                "--enable_xformers_memory_efficient_attention",
            ],
            "console": "integratedTerminal"
        }
    ]
}

muellerzr commented 9 months ago

This should be commands, not command

"module": "accelerate.commands.launch"

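A typo like this in the "module" value can be caught before starting the debugger. The sketch below uses `importlib.util.find_spec` to check whether a dotted module name resolves in the current interpreter; the helper name `module_exists` is hypothetical. Note that `find_spec` imports the parent packages, so checking `accelerate.commands.launch` imports `accelerate` itself and must run in the same environment that VS Code uses.

```python
import importlib.util


def module_exists(dotted_name):
    """Return True if `python -m dotted_name` could locate the module (sketch)."""
    try:
        # find_spec imports parent packages and returns None if the leaf is missing.
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'accelerate.command') does not exist.
        return False


if __name__ == "__main__":
    for candidate in ("accelerate.commands.launch", "accelerate.command.launch"):
        print(candidate, module_exists(candidate))
```

With accelerate installed, the first name should resolve and the second should not, which matches the error message reported above.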
ppeterpp commented 7 months ago

What if I have this in my original shell script: --dataset_name=$DATASET_PATH \? How should it be written in launch.json?

huseyinatahaninan commented 7 months ago

Check out my example above, where you can see, for instance, "--seed", "42". You can similarly add "--dataset_name", "DATASET_PATH", with DATASET_PATH replaced by the actual path.
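
One caveat: the entries in "args" are passed to the program without going through a shell, so a literal `$DATASET_PATH` would not be expanded the way it is in a shell script. VS Code can substitute environment variables with the `${env:DATASET_PATH}` syntax; alternatively, the value can be resolved when the args list is written. The following Python sketch does the latter. The helper name `resolve_script_args` and the path `/data/my_dataset` are hypothetical.

```python
import shlex
from string import Template


def resolve_script_args(shell_fragment, env):
    """Split a shell-style argument fragment and substitute $VARS from `env` (sketch)."""
    # Drop a trailing line-continuation backslash copied from the shell script.
    fragment = shell_fragment.rstrip().rstrip("\\")
    # Split first, then substitute, so a value containing spaces stays one argument.
    # safe_substitute leaves unknown variables untouched instead of raising.
    return [Template(token).safe_substitute(env) for token in shlex.split(fragment)]


if __name__ == "__main__":
    # Hypothetical value; in practice pass os.environ or your own mapping.
    env = {"DATASET_PATH": "/data/my_dataset"}
    print(resolve_script_args("--dataset_name=$DATASET_PATH \\", env))
```

The `--name=value` form stays a single string in the args list, as in the "--arg1=hello" example earlier in this thread.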

Monohydroxides commented 4 months ago

I have the same issue as @sonsus above: none of the launch.json configurations works, and the debugger hangs without any terminal output or GPU usage. When I try to debug with accelerate, the line accelerator = Accelerator() hangs. Is there any solution to this?