Using distributed or parallel set-up in script?: yes
Using GPU in script?: yes
GPU type: NVIDIA A100 80GB PCIe
Who can help?
Hi, @muellerzr @SunMarc
Hi,
I have been trying to fine-tune the OPT 1.3B model on a subset of the allenai/ai2_arc dataset ('./data/10_low_p3-opt-125M.arrow' in the code) using 4 GPUs. The code works fine if I use the complete dataset (train_dataset = load_dataset("allenai/ai2_arc", "ARC-Challenge")[split]), but when I try to train on the subset (train_dataset = load_from_disk('./data/10_low_p3-opt-125M.arrow')), the optimizer step raises the following error:
```
  File line 169, in <module>
    trainer.train()
  File "python39/lib/python3.9/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
  File "python39/lib/python3.9/site-packages/transformers/trainer.py", line 2341, in _inner_training_loop
    self.optimizer.step()
  File "python39/lib/python3.9/site-packages/accelerate/optimizer.py", line 172, in step
    self.optimizer.step(closure)
  File "python39/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "python39/lib/python3.9/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "python39/lib/python3.9/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
  File "python39/lib/python3.9/site-packages/torch/optim/adamw.py", line 184, in step
    adamw(
  File "/net/scratch/lcpandia/python39/lib/python3.9/site-packages/torch/optim/adamw.py", line 335, in adamw
    func(
  File "python39/lib/python3.9/site-packages/torch/optim/adamw.py", line 509, in _multi_tensor_adamw
    grouped_tensors = Optimizer._group_tensors_by_device_and_dtype([
  File "python39/lib/python3.9/site-packages/torch/optim/optimizer.py", line 397, in _group_tensors_by_device_and_dtype
    return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
  File "python39/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "python39/lib/python3.9/site-packages/torch/utils/_foreach_utils.py", line 42, in _group_tensors_by_device_and_dtype
    torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices).items()
RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32 notwithstanding
```
My minimal code to reproduce the issue is attached as a zip file: testTrainerDeviceIssue.zip
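For reference, the only difference between the working and the failing run is the data-loading call. A minimal sketch of the two variants (assuming split = "train", and assuming the subset was written with Dataset.save_to_disk(), since that is the format load_from_disk reads):

```python
from datasets import load_dataset, load_from_disk

# Works: the full ARC-Challenge split fetched from the Hub.
train_dataset = load_dataset("allenai/ai2_arc", "ARC-Challenge")["train"]

# Fails at optimizer.step(): the same training pipeline, but with a
# subset of the data previously saved to disk.
train_dataset = load_from_disk("./data/10_low_p3-opt-125M.arrow")
```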
System Info
transformers version: 4.44.2
Information
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
testTrainerDeviceIssue.zip (attached as a zip file). Launch command:

```
torchrun --nproc_per_node=4 --master_port=<> testTrainerDeviceIssue.py \
    --model_name_or_path facebook/opt-1.3b \
    --data_path 10_low_p0-opt-125M.arrow \
    --bf16 True \
    --output_dir / \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'OPTDecoderLayer' \
    --tf32 True \
    --seed 42 \
    --gradient_checkpointing True
```
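To help pinpoint which tensor ends up on the wrong device or dtype, here is a small diagnostic helper that can be run from a debugger breakpoint just before the failing step (dump_optimizer_devices is an illustrative name, not part of the attached script):

```python
import torch

def dump_optimizer_devices(optimizer: torch.optim.Optimizer) -> None:
    # Print device/dtype for every parameter and optimizer-state tensor,
    # to spot the mismatch _group_tensors_by_device_and_dtype complains about.
    for gi, group in enumerate(optimizer.param_groups):
        for pi, param in enumerate(group["params"]):
            print(f"group {gi} param {pi}: {param.device} {param.dtype}")
            for name, value in optimizer.state.get(param, {}).items():
                if torch.is_tensor(value):
                    print(f"    state['{name}']: {value.device} {value.dtype}")
```

Calling it as dump_optimizer_devices(trainer.optimizer) right before the crashing optimizer.step() should show whether a parameter or a state tensor (exp_avg, exp_avg_sq, step) is the one stranded on the wrong device.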
Expected behavior
Training should complete successfully, just as it does when the allenai/ai2_arc ARC-Challenge dataset is loaded directly with load_dataset.
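As a possible probe (untested, and not a fix for the underlying device mismatch): the crash happens inside _multi_tensor_adamw, i.e. AdamW's foreach fast path, so forcing the single-tensor path would at least avoid the grouping call that raises. A sketch of one way to do that through Trainer; SingleTensorAdamWTrainer is a made-up name:

```python
from transformers import Trainer

class SingleTensorAdamWTrainer(Trainer):
    # Hypothetical subclass: disable the foreach fast path on every param
    # group so AdamW falls back to _single_tensor_adamw, which never calls
    # _group_tensors_by_device_and_dtype (the site of the RuntimeError).
    def create_optimizer(self):
        optimizer = super().create_optimizer()
        for group in optimizer.param_groups:
            if "foreach" in group:
                group["foreach"] = False
        return optimizer
```

I have not verified whether this makes training succeed or just moves the error, but it may help isolate whether the problem is specific to the tensor-grouping step.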