huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

ValueError: Some specified arguments are not used by the HfArgumentParser: ['--local-rank=0'] #22171

Closed bestpredicts closed 1 year ago

bestpredicts commented 1 year ago

System Info

transformers 4.7, PyTorch 2.0, Python 3.9

Run the example code from the transformers documentation:

rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
--model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200

Error info:

/nfs/v100-022/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Traceback (most recent call last):
  File "/nfs/v100-022/run_clm.py", line 772, in <module>
    main()
  File "/nfs/v100-022/run_clm.py", line 406, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/nfs/v100-022//anaconda3/lib/python3.9/site-packages/transformers/hf_argparser.py", line 341, in parse_args_into_dataclasses
    raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--local-rank=0']
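
For context, the parser side of this failure can be reproduced in isolation. A minimal sketch, assuming an affected transformers version (recent releases also register the hyphenated --local-rank alias) and an illustrative dataclass:

from dataclasses import dataclass, field
from transformers import HfArgumentParser

@dataclass
class DummyArgs:
    local_rank: int = field(default=-1)  # registers only the underscored --local_rank

parser = HfArgumentParser(DummyArgs)
# PyTorch 2.0's torch.distributed.launch appends the hyphenated form:
(dummy_args,) = parser.parse_args_into_dataclasses(args=["--local-rank=0"])
# -> ValueError: Some specified arguments are not used by the HfArgumentParser: ['--local-rank=0']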

Who can help?

No response

Information

Tasks

Reproduction

1. Install the following environment: Python 3.9, PyTorch 2.1 dev, transformers 4.7.
2. Then run:
    rm -r /tmp/test-clm; CUDA_VISIBLE_DEVICES=0,1 \
    python -m torch.distributed.launch --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
    --model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
    --do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
3. You get the error: ValueError: Some specified arguments are not used by the HfArgumentParser: ['--local-rank=0']

Expected behavior

The command above should complete training without raising the HfArgumentParser ValueError.
amyeroberts commented 1 year ago

Hi @bestpredicts, thanks for raising this issue.

I can confirm that I see the same error with the most recent version of transformers and PyTorch 2. I wasn't able to replicate the issue with PyTorch 1.13.1 and the same transformers version.

Following the messages in the shared error output, if I set LOCAL_RANK in my environment and pass --use-env, I am able to run on PyTorch 2.

LOCAL_RANK=0,1 CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch --nproc_per_node 2 --use-env examples/pytorch/language-modeling/run_clm.py \
--model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200
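
For reference, the change the deprecation warning points at: with --use-env (and always with torchrun), the launcher exports LOCAL_RANK per process instead of appending a --local-rank argument, so a script reads it roughly like this (a sketch of the pattern, not the run_clm.py code):

import os

# Each worker process sees its own LOCAL_RANK; -1 means non-distributed.
local_rank = int(os.environ.get("LOCAL_RANK", -1))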
sgugger commented 1 year ago

Also note that torch.distributed.launch is deprecated and torchrun is preferred in PyTorch 2.0.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

TXacs commented 1 year ago

Has anyone solved this problem? I get the same problem when using torchrun or torch.distributed.launch: self.local_rank is -1. My env is pytorch==2.0.0 and transformers==4.30.1.

vejvarm commented 1 year ago

You might try migrating to torchrun, i.e.:

torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
--model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200

for reference on migrating: https://pytorch.org/docs/stable/elastic/run.html

LiuZhihhxx commented 1 year ago

Have you solved your problem? I ran into the same error when using DeepSpeed. The solutions provided above didn't work at all. :(

PhenixZhang commented 1 year ago

Also note that torch.distributed.launch is deprecated and torchrun is preferred in PyTorch 2.0.

Thanks for this tip.

HeGaoYuan commented 1 year ago

watching

ZhaoChuyang commented 10 months ago

Printing sys.argv gives:

['train.py', '--local-rank=0', '--model_name_or_path', './checkpoints/vicuna-7b-v1.5', ...]

Other arguments come as 'key', 'value' pairs, but the local rank does not: in the example above, '--local-rank=0' arrives as a single hyphenated token. This comes from torch.distributed.launch, which appends '--local-rank=0' to the argument list, while HfArgumentParser only registers the underscored '--local_rank' option, so the appended argument cannot be parsed.

So using torchrun, or passing --use-env so that the launcher sets the environment variable LOCAL_RANK instead of passing a --local_rank argument, is one possible solution.

A hacky fix is to add this before parse_args_into_dataclasses():

import sys

# torch.distributed.launch (PyTorch 2.0) appends the hyphenated form
# "--local-rank=N"; rewrite it into the "--local_rank N" form that
# HfArgumentParser registers. Iterate over a copy, since we mutate sys.argv.
for arg in list(sys.argv):
    if arg.startswith("--local-rank="):
        rank = arg.split("=", 1)[1]
        sys.argv.remove(arg)
        sys.argv.extend(["--local_rank", rank])
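
An alternative, assuming you are free to edit the parsing call: parse_args_into_dataclasses accepts return_remaining_strings=True, which returns unknown arguments instead of raising. Note the hyphenated flag is then simply ignored, so this only helps when the local rank is also available from the LOCAL_RANK environment variable.

model_args, data_args, training_args, remaining = parser.parse_args_into_dataclasses(
    return_remaining_strings=True
)
# remaining would be ['--local-rank=0'] instead of triggering the ValueError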
bai-pei-wjsn commented 10 months ago

I have this problem:

ValueError: Some specified arguments are not used by the HfArgumentParser: ['-f', '/root/.local/share/jupyter/runtime/kernel-8d0db21b-3ec1-4b17-987c-be497d81b3c5.json']
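
The '-f .../kernel-....json' argument here is injected by Jupyter itself rather than by a launcher, so the torchrun workarounds above don't apply. One way around it, as a sketch: pass the arguments explicitly so that HfArgumentParser never reads sys.argv.

# Assuming `parser` is an HfArgumentParser built as in run_clm.py;
# the option values below are illustrative.
model_args, data_args, training_args = parser.parse_args_into_dataclasses(
    args=["--model_name_or_path", "gpt2", "--output_dir", "/tmp/test-clm", "--do_train"]
)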

sqnian commented 10 months ago

You might try migrating to torchrun? i.e.:

torchrun --nproc_per_node 2 examples/pytorch/language-modeling/run_clm.py \
--model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
--do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --max_steps 200

for reference on migrating: https://pytorch.org/docs/stable/elastic/run.html

Thanks, it works for me.

bai-pei-wjsn commented 10 months ago

Can it run on Colab? I can't get it to work there.

riyajatar37003 commented 5 months ago

ValueError: Some specified arguments are not used by the HfArgumentParser: ['--only_optimize_lora']
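
This variant of the error usually means the flag belongs to a custom option that no dataclass declares; HfArgumentParser only accepts fields defined on the dataclasses it was given. A sketch of registering such a flag (the field name mirrors the error and is otherwise illustrative):

from dataclasses import dataclass, field
from transformers import HfArgumentParser, TrainingArguments

@dataclass
class ExtraArgs:
    only_optimize_lora: bool = field(default=False)  # registers --only_optimize_lora

parser = HfArgumentParser((TrainingArguments, ExtraArgs))
training_args, extra_args = parser.parse_args_into_dataclasses()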

jiqibuaixuexi commented 2 weeks ago

I can run the following command in CMD without issues:

python run_show.py --output_dir output20241021 --model_name_or_path show_model/model001 --train_type use_lora --data_path data/AS_2022_train+test --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --num_train_epochs 5

However, when I try to debug in the VSCODE IDE, I encounter the following error:

ValueError: Some specified arguments are not used by the HfArgumentParser: ['model_name_or_path', 'show_model/model001', 'train_type', 'use_lora', 'data_path', 'data/AS_2022_train+test', 'per_device_train_batch_size', '1', 'per_device_eval_batch_size', '1', 'num_train_epochs', '5']

My JSON settings are as follows:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387 
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": false,
            "args": [
                "--output_dir", "output20241021",
                "model_name_or_path", "show_model/model001",
                "train_type", "use_lora",
                "data_path", "data/AS_2022_train+test",
                "per_device_train_batch_size", "1",
                "per_device_eval_batch_size", "1",
                "num_train_epochs", "5"
            ]
        }
    ]
}
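
Judging by the error message, the arguments reach the parser without their leading dashes: in the launch.json above, every option except "--output_dir" is missing its "--" prefix, so argparse treats tokens like "model_name_or_path" as unrecognized positionals. Prefixing each option name (e.g. "--model_name_or_path", "--train_type") should make the debug configuration match the working CMD invocation.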