chen-lee-li closed this issue 1 year ago
If you didn't change the training code, this command doesn't raise the issue:
python -m torch.distributed.launch \
--nproc_per_node number_of_gpus train.py \
--model_path="bigcode/santacoder" \
--dataset_name="bigcode/the-stack-dedup" \
--subset="data/shell" \
--data_column "content" \
--split="train" \
--seq_length 2048 \
--max_steps 30000 \
--batch_size 2 \
--gradient_accumulation_steps 8 \
--learning_rate 5e-5 \
--num_warmup_steps 500 \
--eval_freq 3000 \
--save_freq 3000 \
--log_freq 1 \
--num_workers="$(nproc)"
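Since torch.distributed.launch is deprecated (see the warning below), the same run can also be started with torchrun. A minimal sketch, assuming train.py reads the local rank from os.environ['LOCAL_RANK'] rather than expecting a --local-rank argument:

torchrun --nproc_per_node number_of_gpus train.py \
--model_path="bigcode/santacoder" \
--dataset_name="bigcode/the-stack-dedup" \
--subset="data/shell" \
--data_column "content" \
--split="train" \
--seq_length 2048 \
--max_steps 30000 \
--batch_size 2 \
--gradient_accumulation_steps 8 \
--learning_rate 5e-5 \
--num_warmup_steps 500 \
--eval_freq 3000 \
--save_freq 3000 \
--log_freq 1 \
--num_workers="$(nproc)"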
If the issue persists, can you provide details about your command and library versions?
lib/python3.10/site-packages/torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please change it to read from
`os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
  warnings.warn(
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
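For completeness, here is a minimal sketch of the change the FutureWarning asks for, assuming the training script currently takes the local rank as a command-line argument (the argument handling below is hypothetical and not taken from train.py):

import argparse
import os

parser = argparse.ArgumentParser()
# Old style: torch.distributed.launch passed the local rank on the command line.
parser.add_argument("--local_rank", "--local-rank", dest="local_rank", type=int, default=-1)
args, _ = parser.parse_known_args()

# New style: torchrun (and launch with --use-env) exports LOCAL_RANK instead,
# so read it from the environment and fall back to the CLI flag if present.
local_rank = int(os.environ.get("LOCAL_RANK", args.local_rank))
print(f"running on local rank {local_rank}")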