huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Data type error while fine-tuning Deberta v3 Large using code provided #14375

Closed NIKHILDUGAR closed 2 years ago

NIKHILDUGAR commented 2 years ago

Environment info

Who can help

@LysandreJik

Information

Model I am using (Bert, XLNet ...): microsoft/deberta-v3-large

The problem arises when using:

The tasks I am working on are:

To reproduce

Steps to reproduce the behavior:

  1. Go to transformers/examples/pytorch/text-classification/
  2. Run: python3 run_glue.py --model_name_or_path microsoft/deberta-v3-large --task_name mnli --do_train --do_eval --evaluation_strategy steps --max_seq_length 256 --warmup_steps 50 --learning_rate 6e-5 --num_train_epochs 3 --output_dir outputv3 --overwrite_output_dir --logging_steps 10000 --logging_dir outputv3/, or run the script given in the model card: https://huggingface.co/microsoft/deberta-v3-large#fine-tuning-with-hf-transformers

Expected behavior

Training of microsoft/deberta-v3-large on the mnli dataset.

The error I am getting:

Traceback (most recent call last):
  File "run_glue.py", line 568, in <module>
    main()
  File "run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1316, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1867, in training_step
    loss.backward()
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/_tensor.py", line 352, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 114, in backward
    inputGrad = _softmax_backward_data(grad_output, output, self.dim, output)
TypeError: _softmax_backward_data(): argument 'input_dtype' (position 4) must be torch.dtype, not Tensor

I am also getting the same error when trying to train Deberta-v2

NIKHILDUGAR commented 2 years ago

The main issue is that run_glue.py is not usable with DeBERTa models, probably because they require torch arrays rather than tensors, although I am not sure where the tensors are coming from.

NielsRogge commented 2 years ago

cc'ing @BigBird01

LysandreJik commented 2 years ago

Hello @NIKHILDUGAR, thanks for opening an issue! I'm trying to reproduce the error you describe, but I can't: the training runs correctly.

I wonder if it isn't because you're on the bleeding edge with a PyTorch dev version? We recommend using a PyTorch stable release as those are heavily tested in our CI. Do you get the same error when using PyTorch 1.10?

NIKHILDUGAR commented 2 years ago

I can't test that at the moment as I am facing a few CUDA issues on my system, but I think you are right.

LysandreJik commented 2 years ago

Okay, please let us know if we can help further.

amathews-amd commented 2 years ago

Fourth argument of _softmax_backward_data is now torch.dtype.

https://github.com/pytorch/pytorch/blob/a34d2849cd3d39c2ce912402bfd90aea75162d1f/tools/autograd/derivatives.yaml#L1852

Changing inputGrad = _softmax_backward_data(grad_output, output, self.dim, output) to inputGrad = _softmax_backward_data(grad_output, output, self.dim, output.dtype) seems to work.
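
For reference, here is a minimal sketch of what the patched XSoftmax autograd Function in modeling_deberta_v2.py might look like with this change applied. The forward pass is paraphrased from the library source of that era; only the output.dtype argument in backward differs from the original:

import torch
from torch import _softmax_backward_data

class XSoftmax(torch.autograd.Function):
    # Masked softmax used by DeBERTa-v2/v3 attention (sketch).
    @staticmethod
    def forward(ctx, input, mask, dim):
        ctx.dim = dim
        rmask = ~(mask.to(torch.bool))
        # Mask out padded positions before the softmax
        output = input.masked_fill(rmask, torch.finfo(input.dtype).min)
        output = torch.softmax(output, ctx.dim)
        output.masked_fill_(rmask, 0)
        ctx.save_for_backward(output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        (output,) = ctx.saved_tensors
        # Recent PyTorch expects a torch.dtype, not a Tensor, as the fourth argument
        inputGrad = _softmax_backward_data(grad_output, output, ctx.dim, output.dtype)
        return inputGrad, None, None

A version check around that call (for example with packaging.version against torch.__version__) would keep the older Tensor-argument form working on earlier PyTorch releases.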

bestpredicts commented 2 years ago

Fourth argument of _softmax_backward_data is now torch.dtype.

https://github.com/pytorch/pytorch/blob/a34d2849cd3d39c2ce912402bfd90aea75162d1f/tools/autograd/derivatives.yaml#L1852

Changing inputGrad = _softmax_backward_data(grad_output, output, self.dim, output) to inputGrad = _softmax_backward_data(grad_output, output, self.dim, output.dtype) seems to work.

This solved my problem.

arvind-nd commented 2 years ago

I got the same error. How can I avoid it?

NIKHILDUGAR commented 2 years ago

python run_glue.py --model_name_or_path microsoft/deberta-v3-large --task_name mnli --train_file snli --do_train --do_eval --evaluation_strategy epoch --max_seq_length 256 --warmup_steps 50 --per_device_train_batch_size 8 --learning_rate 6e-6 --num_train_epochs 2 --output_dir tmp/mnlilearn --overwrite_output_dir --logging_steps 30000 --save_total_limit 3 --save_strategy epoch --logging_dir tmp/mnlilearn

This code worked for me. I would recommend trying it for your own dataset and models.

arvind-nd commented 2 years ago

I am using a Kaggle kernel, so do I need to run that command in the Kaggle kernel?

SparkJiao commented 2 years ago

@arvind-nd Hi, you can change the code in modeling_deberta_v2.py:

https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L120

by either overriding the DebertaSelfAttention module or copying the script and then changing it. This works for me.
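
If editing or vendoring the installed file isn't convenient (for example in a Kaggle kernel), a runtime patch along the following lines is another option. This is a minimal sketch, not taken from the thread: it assumes XSoftmax in modeling_deberta_v2.py still saves the softmax output and the softmax dimension on the context in forward(), and that _softmax_backward_data is importable from torch:

import torch
from torch import _softmax_backward_data
from transformers.models.deberta_v2 import modeling_deberta_v2

def _patched_xsoftmax_backward(ctx, grad_output):
    # Identical to the library's backward except that the fourth
    # argument is a dtype rather than a Tensor.
    (output,) = ctx.saved_tensors
    input_grad = _softmax_backward_data(grad_output, output, ctx.dim, output.dtype)
    return input_grad, None, None

# Swap in the patched backward before running training so every
# DeBERTa-v2/v3 attention layer picks it up.
modeling_deberta_v2.XSoftmax.backward = staticmethod(_patched_xsoftmax_backward)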