cc @sanchit-gandhi @ylacombe
A temporary solution is to force the input to bfloat16 and disable flash_attention.
batch = self.processor(
    audio=audio_arrays,
    sampling_rate=16000,
    padding="max_length",
    return_tensors="pt",
)
# The processor always returns float32 features, so cast them to the model dtype.
batch["input_features"] = batch["input_features"].to(dtype=torch.bfloat16)
Hey @moncefbenaicha, your temporary solution is actually the right one, since the processor only outputs torch.float32 arrays!
However, I do believe it should work with Flash Attention. Did you get an error using bfloat16 with FA?
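For context, here is a minimal sketch (the checkpoint name and setup are assumptions, not from the issue) of loading Whisper with bfloat16 weights and Flash Attention 2, which is the combination being asked about. Swapping attn_implementation to "sdpa" or "eager" corresponds to the "disable flash_attention" workaround above.

```python
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",                # example checkpoint, not from the issue
    torch_dtype=torch.bfloat16,               # keep weights in the same dtype as the inputs
    attn_implementation="flash_attention_2",  # or "sdpa" / "eager" to avoid Flash Attention
)
```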
Hey @moncefbenaicha - it would be great to see:
- your .from_pretrained call: specifically, what argument you're passing to attn_implementation, torch_dtype, and whether you're moving the model manually to a torch device
- which of the training arguments fp16, bf16, fp16_full_eval and bf16_full_eval you're using

Passing bf16_full_eval=True might be of interest to you if you're casting the model weights to bf16 manually yourself.
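As a minimal sketch of the flags mentioned above (output_dir and the chosen values are placeholders), bf16_full_eval=True makes the evaluation loop run the model in bfloat16 as well, matching bf16 training:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-finetuned",  # placeholder
    bf16=True,                         # train in bfloat16 mixed precision
    bf16_full_eval=True,               # also run evaluation in bfloat16
    predict_with_generate=True,
)
```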
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: 4.39.3

Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I use the standard Whisper fine-tuning pipeline, similar to what @sanchit-gandhi published at https://huggingface.co/blog/fine-tune-whisper.
The problem arises when I use the argument predict_with_generate=True. The script crashes during evaluation, specifically in the encoder forward pass, and raises the exception shown below. The error never shows up during training steps, only the moment evaluation starts or if you call trainer.evaluate().
I did some debugging to check the dtype and the device used right before F.conv1d is called, and this is what I got:
- Training step: no exception
- Evaluation step: RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same
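For anyone wanting to repeat this check, here is a hypothetical debugging helper (not part of the original report, checkpoint name assumed) that prints the dtypes seen right before the encoder's first convolution, which is where F.conv1d fails:

```python
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-small", torch_dtype=torch.bfloat16  # assumed checkpoint
)

def log_conv_dtypes(module, inputs):
    # inputs is the tuple of positional arguments passed to conv1
    (features,) = inputs
    print(f"input dtype: {features.dtype}, weight dtype: {module.weight.dtype}, device: {features.device}")

hook = model.model.encoder.conv1.register_forward_pre_hook(log_conv_dtypes)
# ... run a training step and trainer.evaluate(), then compare the printed dtypes ...
hook.remove()
```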
Expected behavior
The same behavior between the training and evaluation passes.