huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Issue related to dtype with F.conv1d in Whisper evaluation #30673

Closed moncefbenaicha closed 4 months ago

moncefbenaicha commented 6 months ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

I use the standard Whisper fine-tuning pipeline, similar to the one @sanchit-gandhi published at https://huggingface.co/blog/fine-tune-whisper.

The problem arises when I use the argument predict_with_generate=True. The script crashes during evaluation, specifically in the encoder forward pass, and raises this exception:

return F.conv1d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same

The error never shows up during training steps, only when evaluation starts or when trainer.evaluate() is called.

I did some debugging to check the dtypes and devices right before F.conv1d is called, and this is what I got:

No exception:

- weight.dtype: torch.bfloat16
- bias.dtype: torch.bfloat16
- input.device: device(type='cuda', index=0)
- weight.device: device(type='cuda', index=0)

Raises RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same:

- weight.dtype: torch.bfloat16
- bias.dtype: torch.bfloat16
- input.device: device(type='cuda', index=0)
- weight.device: device(type='cuda', index=0)
- bias.device: device(type='cuda', index=0)
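
For context, here is a minimal standalone sketch (assuming a CUDA device; the layer sizes are illustrative) showing that a bfloat16 Conv1d fed a float32 input raises exactly this error, and that casting the input resolves it:

import torch

# Minimal sketch: bf16 conv weights/bias + float32 input reproduces the error
conv = torch.nn.Conv1d(in_channels=80, out_channels=384, kernel_size=3, padding=1)
conv = conv.to(device="cuda", dtype=torch.bfloat16)

x = torch.randn(1, 80, 3000, device="cuda")  # float32 by default, like the processor output

try:
    conv(x)  # dispatches to F.conv1d(input, weight, bias, ...) internally
except RuntimeError as e:
    print(e)  # Input type (float) and bias type (c10::BFloat16) should be the same

out = conv(x.to(torch.bfloat16))  # casting the input to bf16 makes the call succeed
print(out.dtype)  # torch.bfloat16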

Expected behavior

Consistent behavior between the training and evaluation passes.

LysandreJik commented 6 months ago

cc @sanchit-gandhi @ylacombe

moncefbenaicha commented 6 months ago

Update

A temporary workaround is to cast the input features to bfloat16 and disable flash_attention:

batch = self.processor(
    audio=audio_arrays,
    sampling_rate=16000,
    padding="max_length",
    return_tensors="pt",
)
# The processor returns float32 features; cast them to match the bf16 model weights
batch["input_features"] = batch["input_features"].to(dtype=torch.bfloat16)
ylacombe commented 6 months ago

Hey @moncefbenaicha, your temporary solution is actually the right one, since the processor only outputs torch.float32 arrays!

However, I do believe it should work with Flash Attention. Did you get an error using bfloat16 and FA?
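
For what it's worth, a tiny sketch to verify the float32 claim above (the checkpoint name is a placeholder, not necessarily the one used in this issue):

import numpy as np
from transformers import WhisperProcessor

# The processor's input_features come back as float32 regardless of the model's dtype
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
audio = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
batch = processor(audio=audio, sampling_rate=16000, return_tensors="pt")
print(batch["input_features"].dtype)  # torch.float32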

sanchit-gandhi commented 5 months ago

Hey @moncefbenaicha - it would be great to see:

  1. How you're instantiating the model with .from_pretrained. Specifically, what arguments you're passing for attn_implementation and torch_dtype, and whether you're moving the model to a torch device manually
  2. The training args you're using. Specifically, what you set for fp16, bf16, fp16_full_eval and bf16_full_eval

Passing bf16_full_eval=True might be of interest to you if you're casting the model weights to bf16 manually.
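
For reference, a minimal sketch of the settings being asked about (the checkpoint name and chosen values are placeholders, not the reporter's actual configuration):

import torch
from transformers import WhisperForConditionalGeneration, Seq2SeqTrainingArguments

# Placeholder configuration illustrating the arguments mentioned above
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-small",                   # placeholder checkpoint
    torch_dtype=torch.bfloat16,               # load the weights directly in bf16
    attn_implementation="flash_attention_2",  # or "sdpa" / "eager"
)

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-finetuned",
    bf16=True,                  # bf16 mixed precision during training steps
    bf16_full_eval=True,        # run evaluation fully in bf16 instead of fp32
    predict_with_generate=True,
)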

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.