huggingface / blog

Public repo for HF blog posts
https://hf.co/blog

Fine-Tune Wav2Vec2 for English ASR on GCP: RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor #255

Open sully90 opened 2 years ago

sully90 commented 2 years ago

When trying to run the notebook https://github.com/huggingface/blog/blob/main/notebooks/17_fine_tune_wav2vec2_for_english_asr.ipynb on a GCP notebook instance, I get the error below when calling trainer.train():

***** Running training *****
  Num examples = 4620
  Num Epochs = 30
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 4350
/opt/conda/lib/python3.7/site-packages/transformers/feature_extraction_utils.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:210.)
  tensor = as_tensor(value)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_15719/4032920361.py in <module>
----> 1 trainer.train()

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1314                         tr_loss_step = self.training_step(model, inputs)
   1315                 else:
-> 1316                     tr_loss_step = self.training_step(model, inputs)
   1317 
   1318                 if (

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in training_step(self, model, inputs)
   1845         if self.use_amp:
   1846             with autocast():
-> 1847                 loss = self.compute_loss(model, inputs)
   1848         else:
   1849             loss = self.compute_loss(model, inputs)

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs)
   1879         else:
   1880             labels = None
-> 1881         outputs = model(**inputs)
   1882         # Save past state if it exists
   1883         # TODO: this needs to be fixed and made cleaner later.

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, input_values, attention_mask, output_attentions, output_hidden_states, return_dict, labels)
   1497             output_attentions=output_attentions,
   1498             output_hidden_states=output_hidden_states,
-> 1499             return_dict=return_dict,
   1500         )
   1501 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, input_values, attention_mask, mask_time_indices, output_attentions, output_hidden_states, return_dict)
   1062         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
   1063 
-> 1064         extract_features = self.feature_extractor(input_values)
   1065         extract_features = extract_features.transpose(1, 2)
   1066 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, input_values)
    335         hidden_states = input_values[:, None]
    336         for conv_layer in self.conv_layers:
--> 337             hidden_states = conv_layer(hidden_states)
    338 
    339         return hidden_states

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, hidden_states)
    256 
    257     def forward(self, hidden_states):
--> 258         hidden_states = self.conv(hidden_states)
    259         hidden_states = self.layer_norm(hidden_states)
    260         hidden_states = self.activation(hidden_states)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
    300 
    301     def forward(self, input: Tensor) -> Tensor:
--> 302         return self._conv_forward(input, self.weight, self.bias)
    303 
    304 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    297                             _single(0), self.dilation, self.groups)
    298         return F.conv1d(input, weight, bias, self.stride,
--> 299                         self.padding, self.dilation, self.groups)
    300 
    301     def forward(self, input: Tensor) -> Tensor:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

CUDA is enabled and the model is successfully loaded onto the GPU:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   48C    P0    27W /  70W |   1370MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     15719      C   /opt/conda/bin/python            1367MiB |
+-----------------------------------------------------------------------------+

Appreciate any help!

osanseviero commented 2 years ago

cc @patrickvonplaten

patrickvonplaten commented 2 years ago

Hey @sully90,

The notebook works on Colab for me, but I haven't tested it on GCP.

From the error message, it looks like there is a problem with fp16. As a first step, could you try disabling fp16, e.g. by removing the fp16=True statement from the TrainingArguments?
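For reference, a minimal sketch of the relevant TrainingArguments with mixed precision disabled (the output_dir is just a placeholder; the other values follow the notebook's defaults and may differ in your setup):

from transformers import TrainingArguments

training_args = TrainingArguments(
  output_dir="./wav2vec2-base-timit-demo",  # placeholder directory
  group_by_length=True,
  per_device_train_batch_size=32,
  evaluation_strategy="steps",
  num_train_epochs=30,
  # fp16=True,  # disabled while debugging: keeps inputs and weights in float32
  save_steps=500,
  eval_steps=500,
  logging_steps=500,
  learning_rate=1e-4,
  weight_decay=0.005,
  warmup_steps=1000,
  save_total_limit=2,
)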

elites2k19 commented 2 years ago

I converted the notebook to a .py file and am facing the same issue. I tried removing fp16=True, but the issue persists. @patrickvonplaten, please help solve this issue.

elites2k19 commented 2 years ago

@sully90 Did you solve this issue?

patrickvonplaten commented 2 years ago

Could you guys share a Colab notebook so that I can reproduce the error? :-) This would be great!

ericjohansson91 commented 2 years ago

I have the same issue. It worked a couple of days ago with no changes to the code.

ghost commented 2 years ago

Getting the same issue on Colab without any changes to the notebook, i.e. the issue occurs with the original notebook. Sharing my notebook: https://colab.research.google.com/drive/18uGFjmoTVEKDI-2Nd9kgoSwQG4H-0Pzx?usp=sharing

Jesse-Parvess commented 2 years ago

Hey, getting the same issue when running the standard notebook:

https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb#scrollTo=_UEjJqGsQw24

Please assist

patrickvonplaten commented 2 years ago

I can reproduce now! Thanks for telling me!

patrickvonplaten commented 2 years ago

Not 100% sure what the error is for now - will take a look tomorrow!

patrickvonplaten commented 2 years ago

Should be fixed now: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb

Can you try it out ? :-)

ghost commented 2 years ago

Should be fixed now: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb

Can you try it out ? :-)

Works now; thanks so much @patrickvonplaten. Your contribution to the open-source ASR community is outstanding!

jovan3600 commented 2 years ago

Should be fixed now: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tuning_Wav2Vec2_for_English_ASR.ipynb

Can you try it out ? :-)

Hey, could you explain exactly what the problem was? I suddenly received the same error with no changes to the code, so I am wondering if the same fix would apply to my wav2vec project as well. I'm not sure if this was caused by a recent update.

patrickvonplaten commented 2 years ago

The problem was that the Transformers version that was used was too old. Didn't dive super deep into it though. Maybe updating your Transformers version should do the trick @jovan3600 ?
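For anyone hitting this in a notebook, upgrading is just a matter of running something like the following in a cell and restarting the runtime (illustrative only; no specific versions are pinned here):

!pip install --upgrade transformers datasets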

jovan3600 commented 2 years ago

The problem was that the Transformers version that was used was too old. Didn't dive super deep into it though. Maybe updating your Transformers version should do the trick @jovan3600 ?

Yeah, I tried that, but unfortunately it didn't change anything. I'm not sure what else could be the problem. All the notebooks I made that use wav2vec have the same error now :(

patrickvonplaten commented 2 years ago

Hmmm, not really sure what the problem could be. In newer Transformers versions (> 4.17), whenever the runtime is set to GPU (which can be checked with torch.cuda.is_available()), the Trainer should automatically put both the inputs and the model on the GPU. Could you maybe add torch.cuda.is_available() checks before the failing call and see what they return?
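A minimal diagnostic sketch along those lines (model refers to the object created in the notebook; this is only an illustration, not part of the official example):

import torch

print(torch.cuda.is_available())         # should be True on a GPU runtime
print(next(model.parameters()).device)   # where the model weights live
print(next(model.parameters()).dtype)    # half-precision weights with float32 inputs reproduce this error

# If the model somehow ended up on CPU, it can be moved explicitly:
# model.to("cuda")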

sofidipace commented 1 year ago

Hi Patrick. I have the same problem. I tried updating both the transformers and datasets libraries to the latest versions and tried adding an "if torch.cuda.is_available()" statement, but I still receive the same error (CUDA is available, so the guarded call still runs). Are there other ways to solve the problem?

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Is there a way to move the "input" to CUDA?

patrickvonplaten commented 1 year ago

Gently ping @sanchit-gandhi

sanchit-gandhi commented 1 year ago

Hey @sofidipace! Could you please share:

  1. Your transformers + datasets version (run !transformers-cli env from a Colab cell)
  2. A reproducible notebook / code snippet (if possible!)

sofidipace commented 1 year ago

Hey @sanchit-gandhi, of course! Versions that I installed (the latest ones, but I got the same error with previous versions as well): transformers 4.26.0, datasets 2.8.0. Here is the code: https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/speech_recognition.ipynb#scrollTo=tborvC9hx88e

Here is the output of !transformers-cli env:

[Screenshot: !transformers-cli env output]

Here is the snippet:

import torch

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

@dataclass
class DataCollatorCTCWithPadding:
    processor: Wav2Vec2Processor
    padding: Union[bool, str] = True
    max_length: Optional[int] = None
    max_length_labels: Optional[int] = None
    pad_to_multiple_of: Optional[int] = None
    pad_to_multiple_of_labels: Optional[int] = None

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        # split inputs and labels since they have to be of different lengths and need
        # different padding methods
        input_features = [{"input_values": feature["input_values"]} for feature in features]  # list of dicts

        label_features = [{"input_ids": feature["labels"]} for feature in features]

        batch = self.processor.pad(
            input_features,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            return_tensors="pt",
        )
        with self.processor.as_target_processor():
            labels_batch = self.processor.pad(
                label_features,
                padding=self.padding,
                max_length=self.max_length_labels,
                pad_to_multiple_of=self.pad_to_multiple_of_labels,
                return_tensors="pt",
            )

        # replace padding with -100 to ignore loss correctly
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)

        batch["labels"] = labels

        return batch

data_collator = DataCollatorCTCWithPadding(processor=processor, padding=True)
wer_metric = load_metric("wer")

def compute_metrics(pred):
    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)

    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids)
    # we do not want to group tokens when computing the metrics
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)

    wer = wer_metric.compute(predictions=pred_str, references=label_str)

    return {"wer": wer}

from transformers import AutoModelForCTC

model = AutoModelForCTC.from_pretrained(
    model_checkpoint, 
    ctc_loss_reduction="mean", 
    pad_token_id=processor.tokenizer.pad_token_id,
)

from transformers import TrainingArguments

training_args = TrainingArguments(
  output_dir=repo_name,
  group_by_length=True,
  per_device_train_batch_size=32,
  evaluation_strategy="steps",
  num_train_epochs=30,
  fp16=True,
  gradient_checkpointing=True,
  save_steps=500,
  eval_steps=500,
  logging_steps=500,
  learning_rate=1e-4,
  weight_decay=0.005,
  warmup_steps=1000,
  save_total_limit=2,
  push_to_hub=True,
)

from transformers import Trainer

trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=timit["train"],
    eval_dataset=timit["test"],
    tokenizer=processor.feature_extractor,
)

if torch.cuda.is_available():
  trainer.train()  # <------ ERROR raised here
[Screenshot: the RuntimeError traceback]

sofidipace commented 1 year ago

I just got it to run. I commented out the pinned versions (!pip install datasets==1.14 and !pip install transformers==4.11.3) and got myself a Hugging Face token with the write role.
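For reference, the change to the install cell would look roughly like this (illustrative only; with no versions pinned, pip simply pulls whatever releases are current):

# !pip install datasets==1.14        # pinned version from the original notebook, commented out
# !pip install transformers==4.11.3  # pinned version from the original notebook, commented out
!pip install datasets transformers   # install the latest releases instead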

sofidipace commented 1 year ago

And FYI, you now have to download TIMIT manually ;)

[Screenshot]

sanchit-gandhi commented 1 year ago

Hey @sofidipace - thanks for sharing your code! Just to confirm: you were able to run the notebook by commenting out the pinned transformers/datasets versions?

sofidipace commented 1 year ago

Well, I ran into the problem with downloading TIMIT. That's why I switched to https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLS_R_on_Common_Voice.ipynb#scrollTo=9fRr9TG5pGBl

sanchit-gandhi commented 1 year ago

Cool! There are over 150 datasets on the Hub you can use for ASR: https://huggingface.co/datasets?task_categories=task_categories:automatic-speech-recognition&sort=downloads

You can just change the dataset id in the load_dataset function to whichever dataset you prefer 🚀

I would personally recommend Common Voice 11: it builds on the original Common Voice corpus with more data and more speakers per language.

You just need to agree to the terms of use on the Hub: https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0

And add use_auth_token=True to load_dataset:

common_voice_train = load_dataset("mozilla-foundation/common_voice_11_0", "tr", split="train+validation", use_auth_token=True)
common_voice_test = load_dataset("mozilla-foundation/common_voice_11_0", "tr", split="test", use_auth_token=True)

sofidipace commented 1 year ago

Thank you very much @sanchit-gandhi

osanseviero commented 1 year ago

FYI

And add use_auth_token=True to load_dataset:

This is not required anymore; the token is retrieved automatically if you have logged in with huggingface-cli login
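In a notebook, that login step could look like this (a minimal sketch; the token prompt is interactive, and the token needs access to the gated dataset):

from huggingface_hub import notebook_login

notebook_login()  # or run `huggingface-cli login` in a terminal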