mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Alphabet size mismatch with model output shape | (alphabet.GetSize()+1) == (class_dim) #3801

Open huks0 opened 5 months ago

huks0 commented 5 months ago

When using DeepSpeech's CTC beam search decoder, I get the following error:

[ctc_beam_search_decoder.cpp:279] FATAL: "(alphabet.GetSize()+1) == (class_dim)" check failed. Number of output classes in acoustic model does not match number of labels in the alphabet file. Alphabet file must be the same one that was used to train the acoustic model.
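The failed check can be restated as a one-line invariant: the acoustic model must have exactly one more output class than the alphabet has labels. A minimal Python sketch, assuming (as in DeepSpeech's decoder) that the one extra class is the CTC blank; the function names are hypothetical and not part of DeepSpeech's API. It plugs in the sizes reported in this issue:

```python
def decoder_accepts(alphabet_size: int, class_dim: int) -> bool:
    """Mirror of the reported check: (alphabet.GetSize() + 1) == class_dim.
    The extra output class is assumed to be the CTC blank."""
    return alphabet_size + 1 == class_dim


def expected_alphabet_size(class_dim: int) -> int:
    """Number of alphabet labels implied by a model with `class_dim` outputs."""
    return class_dim - 1


# Sizes reported in this issue: parsed alphabet of 1023, model output of 1025.
print(decoder_accepts(1023, 1025))   # False: 1023 + 1 = 1024, not 1025
print(expected_alphabet_size(1025))  # 1024
```

In other words, a 1025-class model needs an alphabet file that the decoder parses into 1024 labels, so exactly one label is being lost when the file is loaded.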

I checked the alphabet: it has a size of 1023, even though I built it with 1024 characters. The model's output dimension is 1025. So the mismatch appears to be exactly one character. I thought of the blank or unk token, but I'm not sure whether that is the cause of the error.
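One way to locate the missing label is to parse the alphabet file the way the decoder is assumed to, and list every line that does not become a label. The sketch below assumes the format described in the comments of DeepSpeech's `data/alphabet.txt` (one label per line, lines beginning with `#` are comments, `\#` encodes a literal `#`) and additionally assumes that empty lines are ignored; check these rules against the DeepSpeech version in use. The helper names are hypothetical. Under these assumptions, an unescaped `#` among the 1024 source characters, for example, would be read as a comment and produce exactly one missing label.

```python
import sys
from collections import Counter
from pathlib import Path


def read_alphabet_lines(path):
    """Read the alphabet file, splitting on newline characters only."""
    text = Path(path).read_text(encoding="utf-8")
    # str.splitlines() would also split on characters such as U+0085 or
    # U+2028, which could themselves be valid labels, so split on "\n" only.
    # Lines keep any trailing "\r" (CRLF files), which would alter every label.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the file's final newline does not start a new line
    return lines


def parse_alphabet(lines):
    """Assumed parsing rules: '\\#' encodes a literal '#', other lines
    starting with '#' are comments, and empty lines are ignored.
    Returns (labels, skipped), where skipped holds (line_number, text)."""
    labels, skipped = [], []
    for lineno, line in enumerate(lines, start=1):
        if line == "\\#":
            labels.append("#")  # escaped hash becomes a real label
        elif line.startswith("#") or line == "":
            skipped.append((lineno, line))  # comment or empty: no label
        else:
            labels.append(line)
    return labels, skipped


def main(path):
    labels, skipped = parse_alphabet(read_alphabet_lines(path))
    print(f"{len(labels)} labels -> model needs {len(labels) + 1} output classes")
    for lineno, text in skipped:
        print(f"line {lineno} produced no label: {text!r}")
    # Duplicates are only flagged as something worth checking.
    duplicates = [label for label, n in Counter(labels).items() if n > 1]
    if duplicates:
        print(f"duplicate labels: {duplicates!r}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python check_alphabet.py alphabet.txt` (script name hypothetical). Any line reported as producing no label, if it was meant to be one of the 1024 characters, is a candidate for the missing entry.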

Have you encountered this before?

For support and discussions, please use our Discourse forums.

If you've found a bug, or have a feature request, then please create an issue with the following information:

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Please describe the problem clearly. Be sure to convey here why it's a bug or a feature request.

Include any logs or source code that would be helpful to diagnose the problem. For larger logs, link to a Gist, not a screenshot. If including tracebacks, please include the full traceback. Try to provide a reproducible test case.