  File "/usr/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/usr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1702, in forward
    raise ValueError(f"Label values must be <= vocab_size: {self.config.vocab_size}")
ValueError: Label values must be <= vocab_size: 56
Hello, first of all thanks for the lovely code!
I'm trying to fine-tune XLSR-53 with some French data; the code is taken straight from the examples directory:
However, I get a training error:
Spaces are the problem: id 56 corresponds to whitespace. Here's an example sentence tokenized:
AFAICS from the code, special tokens and spaces are added by the Token set code. What am I doing wrong? :blush:
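For anyone trying to reproduce this, here is a rough sanity check I would run on the vocabulary before training. It is only a sketch: it assumes valid label ids run from 0 to vocab_size - 1 (so an id equal to vocab_size is already out of range), and both `find_out_of_range_ids` and the toy vocabulary are made up for illustration, not part of transformers.

```python
def find_out_of_range_ids(vocab, vocab_size):
    """Return the tokens whose id falls outside 0 .. vocab_size - 1.

    Hypothetical helper: `vocab` is a plain token -> id mapping and
    `vocab_size` is the value the model config was created with.
    """
    return {tok: idx for tok, idx in vocab.items() if not 0 <= idx < vocab_size}


# Toy stand-in for the real vocabulary: four tokens, but the "config"
# below only accounts for three of them, so the last id is too large.
toy_vocab = {"a": 0, "b": 1, "|": 2, " ": 3}

bad = find_out_of_range_ids(toy_vocab, vocab_size=3)
print(bad)
```

If the real setup looks like this, with whitespace sitting at id 56 while the config says vocab_size is 56, then the vocabulary would have one more entry than the config accounts for, which would explain the error above.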