igormq / speech2text

MIT License



Speech2Text

Implementation of the paper "An open-source end-to-end ASR system for Brazilian Portuguese using DNNs built from newly assembled corpora" by Igor Quintanilha, Luiz Wagner Pereira Biscainho, and Sergio Lima Netto (submitted).

Requirements

pytorch >= 1.0.1
cudatoolkit >= 9.0
torchvision
torchaudio
ignite
pyyaml
wget
num2words
unidecode
editdistance
ctcdecode

Datasets

All datasets can be found here.

Acoustic models

AM	Trained on	Method	WER	Download
DeepSpeech 2	BRSD v2	Scratch	52.55% (2.42%)	Link
DeepSpeech 2	BRSD v2	Fine-tuned	47.41% (1.73%)	Link
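WER is the word-level edit distance (substitutions + deletions + insertions) between the reference transcripts and the recognizer output, divided by the number of reference words. The scoring script used for the table is not shown here, and the meaning of the parenthesized values is not stated. The sketch below is a minimal, self-contained illustration only. It assumes corpus-level WER (total errors over total reference words) and whitespace tokenization. It also uses a plain-Python Levenshtein in place of the `editdistance` package listed in the requirements. The function names are hypothetical.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over word tokens; every edit costs 1."""
    # prev is the previous DP row: distances from ref[:i-1] to each prefix hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Hypothetical corpus-level WER: total word edits / total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(refs, hyps):
        r, h = ref.split(), hyp.split()
        errors += word_edit_distance(r, h)
        words += len(r)
    return errors / words


# One deleted word out of five reference words gives a WER of 0.2 (20%).
print(corpus_wer(["o gato subiu no telhado"], ["o gato subiu telhado"]))
```

Pooling errors over the whole test set weights long utterances more heavily than averaging per-utterance WERs. If the reported numbers were computed differently, they would not be directly comparable with this sketch.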

Language models

Language model*	RP	Size	LapsBM	BRTD
word 3-gram	25	1.9G	173.79	161.29
word 5-gram	42	7.8G	136.50	135.12
char 5-gram	5	41M	<=2,334.48	<=2,694.51
char 10-gram	10	4.7G	<=271.86	<=323.71
char 15-gram*	15	5.4G	<=239.59	<=198.49
char 20-gram*	20	8.8G	<=227.84	<=189.53

*All models were trained using KenLM. More details are given in the paper.
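The table does not name the metric in the LapsBM and BRTD columns. The sketch below assumes they are perplexities on those corpora, which is typical for language-model tables. It shows how perplexity follows from per-token log10 probabilities, the base KenLM reports. The `<=` bounds on the character models are not explained here; see the paper. The function name is hypothetical, and the code only illustrates the formula, not the authors' evaluation script.

```python
from typing import Sequence


def perplexity_from_log10(log10_probs: Sequence[float]) -> float:
    """Perplexity from per-token log10 probabilities.

    PPL = 10 ** (-(1/N) * sum(log10 p_i)), where N counts every scored token.
    When scoring with KenLM, the end-of-sentence token </s> is one of them.
    """
    if not log10_probs:
        raise ValueError("need at least one scored token")
    return 10.0 ** (-sum(log10_probs) / len(log10_probs))


# Three tokens, each assigned probability 0.1 (log10 = -1), give a perplexity of 10.
print(perplexity_from_log10([-1.0, -1.0, -1.0]))
```

The python `kenlm` bindings compute the same quantity for a sentence. They take the total log10 score and divide by the number of words plus one for `</s>`.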