First, thank you very much for this great project, it makes ASR very easy!
And your models are awesome! I ran some accuracy tests with the https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-portuguese model (https://github.com/sepinf-inc/IPED/issues/1214#issuecomment-1207470644), and it is comparable to Microsoft's and Google's pt-BR models, actually a bit better!
Now I'm trying to use a language model as described in the Readme.md. I'm using the LM from the language_model folder of the HuggingFace model linked above, but it prints some warnings to the console:
09/02/2022 12:10:19 - WARNING - pyctcdecode.alphabet - Found entries of length > 1 in alphabet. This is unusual unless style is BPE, but the alphabet was not recognized as BPE type. Is this correct?
09/02/2022 12:10:19 - WARNING - pyctcdecode.alphabet - Unigrams and labels don't seem to agree.
Accuracy (WER) also got a lot worse. Am I doing something wrong? Which language model is compatible with the Portuguese model above?
Thanks in advance
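
P.S. In case it helps narrow this down, here is a rough sketch of how the mismatch behind the second warning could be inspected. It is plain Python and not pyctcdecode's actual check. The label list and unigrams are made-up toy data (in practice they would come from the model's vocabulary and the LM's unigram list), and the casing difference is only an assumed example of a possible cause, not something I have confirmed.

```python
def uncovered_unigram_chars(unigrams, labels):
    """Return characters used by the LM unigrams that no single-character label covers."""
    # Ignore multi-character labels such as "<pad>" so they do not leak characters in.
    label_chars = {label for label in labels if len(label) == 1}
    unigram_chars = set("".join(unigrams))
    return unigram_chars - label_chars


# Hypothetical toy vocabulary of an acoustic model (lowercase characters plus special tokens).
labels = list("acosãç") + ["|", "<pad>", "<unk>"]

# Hypothetical LM unigrams stored in uppercase.
unigrams = ["CASA", "AÇÃO"]

missing = uncovered_unigram_chars(unigrams, labels)
print("uncovered characters:", sorted(missing))

# Normalizing the case of the unigrams removes the mismatch in this toy example.
missing_after_lowercasing = uncovered_unigram_chars([u.lower() for u in unigrams], labels)
print("uncovered after lowercasing:", sorted(missing_after_lowercasing))
```

If a check like this reports many uncovered characters on the real vocabulary and unigram list, the LM and the acoustic model were probably built with different text normalization.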