Token length limit - Githubissues

yeyupiaoling / Whisper-Finetune

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment

Apache License 2.0

867 stars 143 forks source link

Hello How are you? Thanks for contributing to this project. I am going to fine-tune Whisper model for Indian Telugu language on google/fleurs dataset.

torchrun --nproc_per_node=2 finetune.py --base_model=openai/whisper-large-v2 --language=None

But while training, I met the following issue.

File "/opt/conda/lib/python3.10/site-packages/transformers/models/whisper/modeling_whisper.py", line 1757, in forward raise ValueError( ValueError: Labels' sequence length 495 cannot exceed the maximum allowed length of 448 tokens.

What do you think about possible reasons?

yeyupiaoling / Whisper-Finetune

Token length limit #96