huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Different behavior of Trainer class when using the allenai/ai2_arc dataset vs. own data that is a subset of allenai/ai2_arc #33788

Open LalchandPandia opened 1 week ago

LalchandPandia commented 1 week ago

System Info

Who can help?

Hi @muellerzr @SunMarc, I have been trying to fine-tune OPT-1.3b on a subset of the allenai/ai2_arc dataset ('./data/10_low_p3-opt-125M.arrow' in the code) using 4 GPUs. The code works fine if I use the complete dataset (train_dataset = load_dataset("allenai/ai2_arc", "ARC-Challenge")[split]), but when I try to train on a subset (train_dataset = load_from_disk('./data/10_low_p3-opt-125M.arrow')), the optimizer step gives the following error:

File line 169, in trainer.train()
File "python39/lib/python3.9/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
File "python39/lib/python3.9/site-packages/transformers/trainer.py", line 2341, in _inner_training_loop
    self.optimizer.step()
File "python39/lib/python3.9/site-packages/accelerate/optimizer.py", line 172, in step
    self.optimizer.step(closure)
File "python39/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
File "python39/lib/python3.9/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
File "python39/lib/python3.9/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
File "python39/lib/python3.9/site-packages/torch/optim/adamw.py", line 184, in step
    adamw(
File "/net/scratch/lcpandia/python39/lib/python3.9/site-packages/torch/optim/adamw.py", line 335, in adamw
    func(
File "python39/lib/python3.9/site-packages/torch/optim/adamw.py", line 509, in _multi_tensor_adamw
    grouped_tensors = Optimizer._group_tensors_by_device_and_dtype([
File "python39/lib/python3.9/site-packages/torch/optim/optimizer.py", line 397, in _group_tensors_by_device_and_dtype
    return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
File "python39/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
File "python39/lib/python3.9/site-packages/torch/utils/_foreach_utils.py", line 42, in _group_tensors_by_device_and_dtype
    torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices).items()
RuntimeError: Tensors of the same index must be on the same device and the same dtype except step tensors that can be CPU and float32 notwithstanding

My minimal code to reproduce the issue is attached as a zip file:

testTrainerDeviceIssue.zip
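
For readers without the attachment, here is a minimal sketch of the two loading paths described above; the tokenizer/model wiring, the split name, and the TrainingArguments are assumptions, not the attached script, and preprocessing/collation is omitted:

```python
from datasets import load_dataset, load_from_disk
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Model and tokenizer (the report fine-tunes facebook/opt-1.3b).
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")

# Path 1: full dataset from the Hub -- training reportedly works.
train_dataset = load_dataset("allenai/ai2_arc", "ARC-Challenge")["train"]

# Path 2: local subset saved as an Arrow dataset -- optimizer.step() reportedly fails.
# train_dataset = load_from_disk("./data/10_low_p3-opt-125M.arrow")

# Hypothetical minimal Trainer setup; real run uses torchrun + FSDP (see Reproduction).
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", bf16=True),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```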

Information

Tasks

Reproduction

testTrainerDeviceIssue.zip (attached as a zip file)

torchrun --nproc_per_node=4 --master_port=<> testTrainerDeviceIssue.py \
    --model_name_or_path facebook/opt-1.3b \
    --data_path 10_low_p0-opt-125M.arrow \
    --bf16 True \
    --output_dir / \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'OPTDecoderLayer' \
    --tf32 True \
    --seed 42 \
    --gradient_checkpointing True
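
Not part of the original report, but a quick way to check whether the local subset differs structurally from the Hub dataset (path as in the command above) is to compare their features and formats:

```python
from datasets import load_dataset, load_from_disk

# Full dataset from the Hub vs. the locally saved subset.
full = load_dataset("allenai/ai2_arc", "ARC-Challenge")["train"]
subset = load_from_disk("./data/10_low_p0-opt-125M.arrow")

# If columns, feature types, or a set_format() call differ between the two,
# the batches fed to the model (and hence the optimizer state) may differ too.
print(full.features)
print(subset.features)
print(full.format)
print(subset.format)
```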

Expected behavior

Model training should work just as it does when directly using the allenai/ai2_arc ARC-Challenge dataset.

LalchandPandia commented 1 week ago

10_low_p0-opt-125M.arrow.zip Attaching the data file.
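
For context, a subset like the attached file could be produced roughly as follows; this is a hypothetical sketch, since the report does not state how the subset was selected:

```python
from datasets import load_dataset

full = load_dataset("allenai/ai2_arc", "ARC-Challenge")["train"]
subset = full.select(range(100))  # placeholder selection criterion
subset.save_to_disk("./data/10_low_p0-opt-125M.arrow")
```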