hirofumi0810 / neural_sp

End-to-end ASR/LM implementation with PyTorch
Apache License 2.0

NeuralSP: Neural network based Speech Processing

How to install

cd tools
make KALDI=/path/to/kaldi TOOL=/path/to/save/tools

Key features

Corpus

Front-end

Encoder

Connectionist Temporal Classification (CTC) decoder
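As a generic illustration of how a CTC output is turned into a label sequence, the sketch below applies best-path (greedy) decoding: merge consecutive repeats, then drop blanks. It is not taken from the neural_sp code; the function name `ctc_greedy_decode` and the choice of `0` as the blank index are assumptions made for the example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse a per-frame label sequence into an output sequence.

    frame_ids: per-frame argmax label indices (hypothetical input).
    blank: index of the CTC blank symbol (assumed to be 0 here).
    """
    # Step 1: merge runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(frame_ids)]
    # Step 2: remove blanks; a blank between two equal labels keeps both.
    return [label for label in collapsed if label != blank]


frames = [0, 3, 3, 0, 3, 5, 5, 0, 0]
decoded = ctc_greedy_decode(frames)
```

The blank between the two runs of `3` is what allows a repeated label to survive the collapse step.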

RNN-Transducer (RNN-T) decoder

Attention-based decoder

Language model (LM)

Output units

Multi-task learning (MTL)

Multi-task learning (MTL) with different output units is supported to alleviate data sparseness.
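One common way to train with several output units at once is to take a weighted sum of the per-task losses. The sketch below shows only that arithmetic; the task names, the weights, and the helper `combine_mtl_losses` are hypothetical and do not reflect neural_sp's actual options or API.

```python
def combine_mtl_losses(losses, weights):
    """Return the weighted sum of per-task losses.

    Both arguments are dicts keyed by a (hypothetical) task name.
    """
    if set(losses) != set(weights):
        raise ValueError("losses and weights must cover the same tasks")
    return sum(weights[task] * losses[task] for task in losses)


# Hypothetical setup: a word-piece main task plus a character-level auxiliary task.
losses = {"wordpiece": 2.0, "char": 1.0}
weights = {"wordpiece": 0.75, "char": 0.25}
total = combine_mtl_losses(losses, weights)
```

The auxiliary task with smaller units sees every character many times, which is the usual motivation for pairing it with a sparser main vocabulary.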

ASR Performance

AISHELL-1 (CER)

Model          dev  test
Conformer LAS  4.1  4.5
Transformer    5.0  5.4
Streaming MMA  5.5  6.1

AISHELL-2 (CER)

Model          test_android  test_ios  test_mic
Conformer LAS  6.1           5.5       5.9

CSJ (WER)

Model           eval1  eval2  eval3
Conformer LAS   5.7    4.4    4.9
BLSTM LAS       6.5    5.1    5.6
LC-BLSTM MoChA  7.4    5.6    6.4

Switchboard 300h (WER)

Model      SWB  CH
BLSTM LAS  9.1  18.8

Switchboard+Fisher 2000h (WER)

Model      SWB  CH
BLSTM LAS  7.8  13.8

LaboroTVSpeech (CER)

Model          dev_4k  dev   tedx-jp-10k
Conformer LAS  7.8     10.1  12.4

Librispeech (WER)

Model           dev-clean  dev-other  test-clean  test-other
Conformer LAS   1.9        4.6        2.1         4.9
Transformer     2.1        5.3        2.4         5.7
BLSTM LAS       2.5        7.2        2.6         7.5
BLSTM RNN-T     2.9        8.5        3.2         9.0
UniLSTM RNN-T   3.7        11.7       4.0         11.6
UniLSTM MoChA   4.1        11.0       4.2         11.2
LC-BLSTM RNN-T  3.3        9.8        3.5         10.2
LC-BLSTM MoChA  3.3        8.8        3.5         9.1
Streaming MMA   2.5        6.9        2.7         7.1

TEDLIUM2 (WER)

Model           dev   test
Conformer LAS   7.0   6.8
BLSTM LAS       8.1   7.5
LC-BLSTM RNN-T  8.0   7.7
LC-BLSTM MoChA  10.3  8.6
UniLSTM RNN-T   10.7  10.7
UniLSTM MoChA   13.5  11.6

WSJ (WER)

Model      test_dev93  test_eval92
BLSTM LAS  8.8         6.2
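For readers unfamiliar with the metrics in the tables above, WER and CER are both edit-distance error rates, computed over words and characters respectively and reported in percent. The following is a minimal sketch of that calculation for a single utterance. It is not the scoring script used to produce these numbers; the function names are illustrative, and real scoring tools typically also normalize text and accumulate errors over the whole test set before dividing.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Error rate in percent: edit distance divided by reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref, hyp):
    """Word error rate of one whitespace-tokenized utterance."""
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    """Character error rate, ignoring spaces."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


example_wer = wer("the cat sat on the mat", "the cat sit on mat")
example_cer = cer("abcd", "abed")
```

In the word-level example, one substitution and one deletion against a six-word reference give an error rate of one third.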

LM Performance

Penn Tree Bank (PPL)

Model        valid  test
RNNLM        87.99  86.06
+ cache=100  79.58  79.12
+ cache=500  77.36  76.94

WikiText2 (PPL)

Model         valid   test
RNNLM         104.53  98.73
+ cache=100   90.86   85.87
+ cache=2000  76.10   72.77
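Perplexity (PPL), the metric in the LM tables above, is the exponential of the average negative log-likelihood per token; lower is better. The sketch below shows the formula on made-up log-probabilities. The function name `perplexity` and the input values are assumptions for illustration, not output from any neural_sp model.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from the natural-log probability assigned to each token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that spreads probability uniformly over 50 choices at every step
# has a perplexity of about 50.
uniform = [math.log(1.0 / 50)] * 10
ppl = perplexity(uniform)
```

Read this way, a perplexity of N means the model is on average as uncertain as a uniform choice among N tokens.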

Reference

Dependency