huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Data type error while fine-tuning Deberta v3 Large using code provided #14375

Closed NIKHILDUGAR closed 2 years ago

NIKHILDUGAR commented 2 years ago

Environment info

Who can help

@LysandreJik

Information

Model I am using (Bert, XLNet ...): microsoft/deberta-v3-large

The problem arises when using:

The tasks I am working on are:

To reproduce

Steps to reproduce the behavior:

  1. Go to transformers/examples/pytorch/text-classification/
  2. Run: python3 run_glue.py --model_name_or_path microsoft/deberta-v3-large --task_name mnli --do_train --do_eval --evaluation_strategy steps --max_seq_length 256 --warmup_steps 50 --learning_rate 6e-5 --num_train_epochs 3 --output_dir outputv3 --overwrite_output_dir --logging_steps 10000 --logging_dir outputv3/, or run the script given in the model card: https://huggingface.co/microsoft/deberta-v3-large#fine-tuning-with-hf-transformers

Expected behavior

Training of microsoft/deberta-v3-large on the mnli dataset.

The error I am getting:

Traceback (most recent call last):
  File "run_glue.py", line 568, in <module>
    main()
  File "run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1316, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1867, in training_step
    loss.backward()
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/_tensor.py", line 352, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 114, in backward
    inputGrad = _softmax_backward_data(grad_output, output, self.dim, output)
TypeError: _softmax_backward_data(): argument 'input_dtype' (position 4) must be torch.dtype, not Tensor

I am also getting the same error when trying to train Deberta-v2

NIKHILDUGAR commented 2 years ago

The main issue is that run_glue.py is not usable with DeBERTa models, probably because they require torch arrays rather than tensors, although I am not sure where the tensors are coming from.

NielsRogge commented 2 years ago

cc'ing @BigBird01

LysandreJik commented 2 years ago

Hello @NIKHILDUGAR, thanks for opening an issue! I'm trying to reproduce the error you describe, but I can't: the training runs correctly.

I wonder if it isn't because you're on the bleeding edge with a PyTorch dev version? We recommend using a PyTorch stable release as those are heavily tested in our CI. Do you get the same error when using PyTorch 1.10?

NIKHILDUGAR commented 2 years ago

I can't test that at the moment as I am facing a few CUDA issues on my system, but I think you are right.

LysandreJik commented 2 years ago

Okay, please let us know if we can help further.

amathews-amd commented 2 years ago

Fourth argument of _softmax_backward_data is now torch.dtype.

https://github.com/pytorch/pytorch/blob/a34d2849cd3d39c2ce912402bfd90aea75162d1f/tools/autograd/derivatives.yaml#L1852

Changing inputGrad = _softmax_backward_data(grad_output, output, self.dim, output) to inputGrad = _softmax_backward_data(grad_output, output, self.dim, output.dtype) seems to work.
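
For reference, here is a minimal sketch of what the patched XSoftmax autograd Function in modeling_deberta_v2.py might look like with this change applied. The forward pass is paraphrased from the library source of that era; only the output.dtype argument in backward differs from the original:

import torch
from torch import _softmax_backward_data

class XSoftmax(torch.autograd.Function):
    # Masked softmax used by DeBERTa-v2/v3 attention (sketch).
    @staticmethod
    def forward(ctx, input, mask, dim):
        ctx.dim = dim
        rmask = ~(mask.to(torch.bool))
        # Mask out padded positions before the softmax
        output = input.masked_fill(rmask, torch.finfo(input.dtype).min)
        output = torch.softmax(output, ctx.dim)
        output.masked_fill_(rmask, 0)
        ctx.save_for_backward(output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        (output,) = ctx.saved_tensors
        # Recent PyTorch expects a torch.dtype, not a Tensor, as the fourth argument
        inputGrad = _softmax_backward_data(grad_output, output, ctx.dim, output.dtype)
        return inputGrad, None, None

A version check around that call (for example with packaging.version against torch.__version__) would keep the older Tensor-argument form working on earlier PyTorch releases.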

bestpredicts commented 2 years ago

Fourth argument of _softmax_backward_data is now torch.dtype.

https://github.com/pytorch/pytorch/blob/a34d2849cd3d39c2ce912402bfd90aea75162d1f/tools/autograd/derivatives.yaml#L1852

Changing inputGrad = _softmax_backward_data(grad_output, output, self.dim, output) to inputGrad = _softmax_backward_data(grad_output, output, self.dim, output.dtype) seems to work.

This solved my problem.

arvind-nd commented 2 years ago

I got the same error. How can I avoid it?

NIKHILDUGAR commented 2 years ago

python run_glue.py --model_name_or_path microsoft/deberta-v3-large --task_name mnli --train_file snli --do_train --do_eval --evaluation_strategy epoch --max_seq_length 256 --warmup_steps 50 --per_device_train_batch_size 8 --learning_rate 6e-6 --num_train_epochs 2 --output_dir tmp/mnlilearn --overwrite_output_dir --logging_steps 30000 --save_total_limit 3 --save_strategy epoch --logging_dir tmp/mnlilearn

This code worked for me. I would recommend trying it for your own dataset and models.

arvind-nd commented 2 years ago

I am using a Kaggle kernel, so do I need to run that command in the Kaggle kernel?

SparkJiao commented 2 years ago

@arvind-nd Hi, you can change the code in modeling_deberta_v2.py:

https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L120

by either overriding the DebertaSelfAttention module or copying the script and then changing it. This works for me.
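
If editing or vendoring the installed file isn't convenient (for example in a Kaggle kernel), a runtime patch along the following lines is another option. This is a minimal sketch, not taken from the thread: it assumes XSoftmax in modeling_deberta_v2.py still saves the softmax output and the softmax dimension on the context in forward(), and that _softmax_backward_data is importable from torch:

import torch
from torch import _softmax_backward_data
from transformers.models.deberta_v2 import modeling_deberta_v2

def _patched_xsoftmax_backward(ctx, grad_output):
    # Identical to the library's backward except that the fourth
    # argument is a dtype rather than a Tensor.
    (output,) = ctx.saved_tensors
    input_grad = _softmax_backward_data(grad_output, output, ctx.dim, output.dtype)
    return input_grad, None, None

# Swap in the patched backward before running training so every
# DeBERTa-v2/v3 attention layer picks it up.
modeling_deberta_v2.XSoftmax.backward = staticmethod(_patched_xsoftmax_backward)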