igormq / speech2text

MIT License

Speech2Text

Implementation of "An open-source end-to-end ASR system for Brazilian Portuguese using DNNs built from newly assembled corpora" by Igor Quintanilha, Luiz Wagner Pereira Biscainho, and Sergio Lima Netto (submitted for publication).

Requirements

Datasets

All datasets can be found here.

Acoustic models

AM             Trained on   Method       WER              Download
DeepSpeech 2   BRSD v2      Scratch      52.55% (2.42%)   Link
DeepSpeech 2   BRSD v2      Fine-tuned   47.41% (1.73%)   Link
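The WER column reports word error rate, i.e. (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-cost alignment of the hypothesis against a reference of N words. A minimal sketch of that computation follows. The function name `word_error_rate` is illustrative only; any text normalization the paper applies (casing, punctuation) is not reproduced, and the second value shown in parentheses in the table is not computed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of five reference words -> 0.2 (20%)
    print(word_error_rate("o gato subiu no telhado", "o gato subiu telhado"))
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.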

Language models

Language model*   RP   Size   LapsBM       BRTD
word 3-gram       25   1.9G   173.79       161.29
word 5-gram       42   7.8G   136.50       135.12
char 5-gram        5   41M    ≤2,334.48    ≤2,694.51
char 10-gram      10   4.7G   ≤271.86      ≤323.71
char 15-gram*     15   5.4G   ≤239.59      ≤198.49
char 20-gram*     20   8.8G   ≤227.84      ≤189.53

*All models were trained using KenLM. More details are given in the paper.
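The table mixes word-level and character-level n-gram models. To make the difference concrete, here is a minimal sketch of how the same text yields word tokens or character tokens, and which n-grams of a given order each produces. KenLM's `lmplz` reads whitespace-separated tokens, so one common way to train a character model is to write each corpus line as space-separated characters. All function names and the `_` word-boundary symbol are hypothetical choices for illustration; the preprocessing actually used in the paper may differ, and the counts below ignore the sentence-start and sentence-end markers that KenLM adds.

```python
from typing import List, Tuple


def word_tokens(text: str) -> List[str]:
    """Split on whitespace: each word is one token (word n-gram models)."""
    return text.split()


def char_tokens(text: str, boundary: str = "_") -> List[str]:
    """Hypothetical char tokenization: every character is a token, and
    spaces become a visible boundary symbol so word breaks are modeled."""
    normalized = " ".join(text.split())  # collapse repeated whitespace
    return list(normalized.replace(" ", boundary))


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of order n (empty if the input is too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_corpus_line(text: str) -> str:
    """Hypothetical helper: one line of a char-level training corpus,
    with tokens separated by spaces as lmplz expects."""
    return " ".join(char_tokens(text))


if __name__ == "__main__":
    sentence = "o gato subiu no telhado"
    print(ngrams(word_tokens(sentence), 3))      # 3 word 3-grams
    print(len(ngrams(char_tokens(sentence), 5)))  # char 5-grams
    print(char_corpus_line(sentence))
```

Since a character is a much smaller unit than a word, a char model needs a far higher order (10, 15, 20) to span a context comparable to a word 3-gram or 5-gram, which is consistent with the orders listed above.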