srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

Query on LibriSpeech Character Error Rate #204

Closed: Pradeep-Rangan closed this issue 5 years ago

Pradeep-Rangan commented 5 years ago

Hello Sir,

I have trained the end-to-end ASR system on the LibriSpeech corpus using the examples provided in the repository. However, the RESULTS are provided only for the phoneme-based lexicon units at the CTC output layer. I tried to repeat the experiments with character-based CTC outputs by modifying the scripts (as given in asr_egs/wsj/local/wsj_prepare_char_dict.sh). However, the CER obtained is ~11%, which is greater than the WER (~8.6%). Is there anything I am missing in obtaining the actual CER? It would be of great help if the RESULTS section also reported the actual CER.

I hope you do the needful.

Thanking You

With regards,
Pradeep Rangan
PhD Scholar, Indian Institute of Technology Kharagpur

fmetze commented 5 years ago

The Character Error Rate (CER) is usually greater than the Word Error Rate (WER), and the character-based system will have higher error rates than the phone-based systems. I am not sure whether anyone has reference runs for these cases ready. If so, could they provide them for addition to the README?
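
For readers comparing the two metrics, here is a minimal sketch of how CER and WER are commonly computed: both are Levenshtein edit distances divided by the reference length, taken over characters and over whitespace-separated words respectively. This is not the scoring script used by the Eesen recipes; the function names (`edit_distance`, `error_rate`, `wer`, `cer`) are hypothetical, and counting spaces as characters in the CER is an assumption that differs between toolkits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (or strings)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edit errors divided by total reference tokens over a corpus."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize(ref)
        hyp_tokens = tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return errors / total


def wer(refs, hyps):
    # Words are whitespace-separated tokens.
    return error_rate(refs, hyps, str.split)


def cer(refs, hyps):
    # Assumption: every character counts, including spaces.
    return error_rate(refs, hyps, list)


if __name__ == "__main__":
    refs = ["the cat sat"]
    hyps = ["the cat sit"]
    print("WER:", wer(refs, hyps))
    print("CER:", cer(refs, hyps))
```

Because the two rates are normalized by different reference lengths (words versus characters), they are only meaningful when compared across systems scored with the same tokenization and the same treatment of spaces.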

Pradeep-Rangan commented 5 years ago

Thank you for the information, sir.