Closed: thomaschhh closed this issue 3 months ago
This might help: https://github.com/huggingface/distil-whisper/pull/76
Looks good for `--attn_type "flash_attn"`, but not for `--attn_type "flash_attn_2"`. In that case I still get the above-mentioned error.
I see: I've only used `"flash_attn"` and needed this PR.
Fixed in #101! You can now set `--attn_implementation` to any of `{"eager", "sdpa", "flash_attn_2"}`: https://github.com/huggingface/distil-whisper/blob/b948d0269c6f071708c55de4a1e4030cd7726f14/training/run_pseudo_labelling.py#L136-L139
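As a rough sketch of how such a flag can be restricted to the supported values (this uses plain `argparse` for illustration; it is not the actual distil-whisper parsing code, which the link above shows):

```python
import argparse

# Minimal sketch (not the repo's code): restrict --attn_implementation
# to the values the comment above lists, and reject anything else.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--attn_implementation",
    type=str,
    default="sdpa",
    choices=["eager", "sdpa", "flash_attn_2"],
    help="Attention implementation to use in the model.",
)

args = parser.parse_args(["--attn_implementation", "flash_attn_2"])
print(args.attn_implementation)  # flash_attn_2
```

With `choices` set, an unsupported value fails at parse time with a clear error instead of propagating into model loading.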
The README has been updated to reflect this change: https://github.com/huggingface/distil-whisper/tree/main/training#1-pseudo-labelling
Closing as resolved! Feel free to re-open if you continue to encounter issues
I just looked into it again, and it seems there is a mismatch between the help string (https://github.com/huggingface/distil-whisper/blob/b6400a3ff1b95e1125f9c2aecba25b97712f9465/training/run_distillation.py#L136) and the value expected at L142, which is `flash_attention_2`, not `flash_attn_2`.
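The kind of mismatch described above can be sketched as follows. All names here are illustrative, not the repo's actual code: the help string advertises one spelling while downstream validation accepts another, so following the help string fails.

```python
# Hypothetical sketch of a help-string / validation mismatch.
ADVERTISED = {"eager", "sdpa", "flash_attn_2"}       # what the help string says
EXPECTED = {"eager", "sdpa", "flash_attention_2"}    # what validation accepts

def validate(value: str) -> str:
    """Illustrative validator mirroring the accepted values."""
    if value not in EXPECTED:
        raise ValueError(
            f"attn_implementation must be one of {sorted(EXPECTED)}, got {value!r}"
        )
    return value

# Following the help string therefore raises:
try:
    validate("flash_attn_2")
except ValueError as e:
    print(e)
```

Aligning the help string with the validated set (or deriving both from one constant) removes this class of bug.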
I have been trying to replicate the training steps of distil-whisper as described in `training/README.md`. However, when running the pseudo-labelling step I run into the following error:

After that, I decided to set `--attn_type "flash_attn_2"`. However, this throws the following error:

Is this a known error, or have I been doing something wrong?