phineas-pta / fine-tune-whisper-vi

Jupyter notebooks to fine-tune Whisper models on Vietnamese using Colab and/or Kaggle and/or AWS EC2
Apache License 2.0

ValueError on w2v-bert-v2[train] #1

Open allandclive opened 8 months ago

allandclive commented 8 months ago

On Google Colab: `ValueError: Label values must be <= vocab_size: 29`

allandclive commented 8 months ago

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in <cell line: 1>()
----> 1 TRAINER.train()  # resume_from_checkpoint=True # only if resume

9 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py in forward(self, input_features, attention_mask, output_attentions, output_hidden_states, return_dict, labels)
   1245         if labels is not None:
   1246             if labels.max() >= self.config.vocab_size:
-> 1247                 raise ValueError(f"Label values must be <= vocab_size: {self.config.vocab_size}")
   1248
   1249             # retrieve loss input_lengths from attention_mask

ValueError: Label values must be <= vocab_size: 29
```
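
(For context: this check fires when the tokenizer emits label ids that the model's CTC head was never sized for. A quick way to confirm the mismatch is sketched below; the names `PROCESSOR`, `MODEL` and `TRAIN_DATASET` are guesses in the style of `TRAINER` above, not the notebook's actual variables.)

```python
# minimal sanity check, assuming the notebook exposes the processor, model and
# prepared training set under these (guessed) names -- adjust to the real ones
print(len(PROCESSOR.tokenizer))                        # number of labels the tokenizer can produce
print(MODEL.config.vocab_size)                         # size of the CTC head; here it is 29
print(max(max(x["labels"]) for x in TRAIN_DATASET))    # must stay < MODEL.config.vocab_size
```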

phineas-pta commented 8 months ago

seems like you tried to fine-tune for a Ugandan language

in that case I think you should follow the official guide: https://huggingface.co/blog/fine-tune-w2v2-bert

my use case (Vietnamese) is a bit more mainstream, so my script is simplified a lot compared to the official guide
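
For reference, a minimal sketch of the vocab/tokenizer/model setup that the linked guide walks through, sized so every label id stays below the CTC head's `vocab_size`. The `transcripts` list and the `vocab.json` path here are placeholders, not taken from this repo:

```python
# sketch following https://huggingface.co/blog/fine-tune-w2v2-bert
import json
from transformers import (Wav2Vec2CTCTokenizer, SeamlessM4TFeatureExtractor,
                          Wav2Vec2BertProcessor, Wav2Vec2BertForCTC)

transcripts = ["replace these strings", "with your training sentences"]  # placeholder data

# build a character-level vocab from the training text itself
vocab = {c: i for i, c in enumerate(sorted(set("".join(transcripts))))}
vocab["|"] = vocab.pop(" ")        # CTC convention: "|" marks word boundaries
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)
with open("vocab.json", "w", encoding="utf-8") as f:
    json.dump(vocab, f, ensure_ascii=False)

tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                 pad_token="[PAD]", word_delimiter_token="|")
feature_extractor = SeamlessM4TFeatureExtractor.from_pretrained("facebook/w2v-bert-2.0")
processor = Wav2Vec2BertProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# the key point for this issue: size the CTC head to the new vocab, otherwise
# labels.max() can exceed model.config.vocab_size and raise the ValueError above
model = Wav2Vec2BertForCTC.from_pretrained(
    "facebook/w2v-bert-2.0",
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
    vocab_size=len(tokenizer),
)
```

The rest of the training setup can stay as in the notebook; what matters for this error is that `vocab_size` passed to `from_pretrained` matches the tokenizer built for the new language.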