hirofumi0810 / tensorflow_end2end_speech_recognition

End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)

MIT License

TensorFlow Implementation of End-to-End Speech Recognition

Requirements

TensorFlow >= 1.3.0
tqdm >= 4.14.0
python-Levenshtein >= 0.12.0
setproctitle >= 1.1.10
seaborn >= 0.7.1

Corpus

TIMIT

Phone (39, 48, 61 phones)
Character

LibriSpeech

Phone (under implementation)
Character
Word

CSJ (Corpus of Spontaneous Japanese)

Phone (under implementation)
Japanese kana characters (about 150 classes)
Japanese kanji characters (about 3000 classes)

The following corpora will be added in the future:

Switchboard
WSJ
AMI

This repository doesn't include pre-processing code; pre-processing is based on this repo. If you want to run pre-processing, please look at this repo.

Model

Encoder

BLSTM
LSTM
BGRU
GRU
VGG-BLSTM
VGG-LSTM
Multi-task BLSTM
- You can attach an additional CTC layer to an arbitrary layer.
Multi-task LSTM
VGG

Connectionist Temporal Classification (CTC) [Graves+ 2006]

Greedy decoder
Beam Search decoder
Beam Search decoder w/ CharLM (under implementation)
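
As a rough illustration of what a greedy (best-path) CTC decoder does, the sketch below takes the highest-scoring class per frame, merges consecutive repeats, and drops blanks. It is a minimal pure-Python sketch, not this repository's implementation; the function name `ctc_greedy_decode`, the toy posteriors, and the choice of index 0 as the blank label are all assumptions made for the example.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    `blank` is the index of the CTC blank label (assumed to be 0 here).
    """
    # Pick the highest-scoring class index for every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    decoded = []
    previous = None
    for label in best_path:
        # Emit a label only if it differs from the previous frame's label
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Toy posteriors over 3 classes for 6 frames (hypothetical values).
frames = [
    [0.10, 0.80, 0.10],  # class 1
    [0.10, 0.70, 0.20],  # class 1 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.70, 0.10],  # class 1 (kept: separated by a blank)
    [0.10, 0.20, 0.70],  # class 2
    [0.10, 0.20, 0.70],  # class 2 again (merged)
]
print(ctc_greedy_decode(frames))
```

The blank between the two runs of class 1 is what allows the same label to be emitted twice in a row; a beam search decoder would instead keep several candidate paths per frame rather than only the best one.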

Options

Frame-stacking [Sak+ 2015]
Multi-GPU training (synchronous)
Splicing
Down sampling (under implementation)
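
The difference between frame-stacking and splicing can be sketched as follows: stacking concatenates a few consecutive frames and then skips ahead, which lowers the frame rate, while splicing appends left and right context to every frame and keeps the frame rate. This is a minimal pure-Python sketch under stated assumptions, not the repository's code: the function names, the default of 3 stacked and 3 skipped frames, and the edge handling (repeating boundary frames) are choices made only for this example.

```python
def stack_frames(features, num_stack=3, num_skip=3):
    """Concatenate `num_stack` consecutive frames, advancing `num_skip` frames per step."""
    stacked = []
    for start in range(0, len(features), num_skip):
        window = features[start:start + num_stack]
        # Pad an incomplete final window by repeating the last frame
        # (an assumption of this sketch).
        while len(window) < num_stack:
            window = window + [features[-1]]
        stacked.append([value for frame in window for value in frame])
    return stacked


def splice_frames(features, context=1):
    """Append `context` frames of left and right context to every frame."""
    spliced = []
    last = len(features) - 1
    for t in range(len(features)):
        window = []
        for offset in range(-context, context + 1):
            # Clamp indices at the utterance boundaries (repeats edge frames).
            index = min(max(t + offset, 0), last)
            window.extend(features[index])
        spliced.append(window)
    return spliced


# Five toy frames with two feature dimensions each (hypothetical values).
features = [[float(t), float(t) + 0.5] for t in range(5)]

stacked = stack_frames(features)   # fewer frames, 3x wider
spliced = splice_frames(features)  # same number of frames, 3x wider
print(len(features), len(stacked), len(spliced))
```

With these toy inputs, stacking reduces five frames to two, whereas splicing keeps all five; both produce six-dimensional vectors.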

Attention Mechanism

Decoder

Greedy decoder
Beam search decoder (under implementation)

Attention type

Bahdanau's content-based attention
Bahdanau's normed content-based attention (under implementation)
Location-based attention
Hybrid attention
Luong's dot attention
Luong's scaled dot attention (under implementation)
Luong's general attention
Luong's concat attention
Baidu's attention (under implementation)
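
To make the attention types above more concrete, here is a small sketch of three of the scoring functions (Luong's dot, Luong's general, and Bahdanau's content-based form) followed by the usual softmax and weighted sum. It is a pure-Python illustration, not the repository's TensorFlow code; every name (`luong_dot_score`, `attend`, the weight matrices `W`, `W_q`, `W_k`, the vector `v`) and all numeric values are assumptions for the example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_score(query, key):
    # score = q . k
    return dot(query, key)


def luong_general_score(query, key, W):
    # score = q^T W k
    return dot(query, matvec(W, key))


def bahdanau_score(query, key, W_q, W_k, v):
    # score = v^T tanh(W_q q + W_k k)
    hidden = [math.tanh(q + k) for q, k in zip(matvec(W_q, query), matvec(W_k, key))]
    return dot(v, hidden)


def attend(query, keys, score_fn):
    """Return attention weights over `keys` and the resulting context vector."""
    weights = softmax([score_fn(query, key) for key in keys])
    dim = len(keys[0])
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
    return weights, context


# Toy encoder outputs and decoder state (hypothetical values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states, luong_dot_score)
print(weights, context)
```

Other scoring functions can be plugged in by passing a wrapper, for example `lambda q, k: luong_general_score(q, k, W)` for some matrix `W`. Location-based and hybrid attention additionally condition on the previous step's attention weights, which this sketch does not cover.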

Options

Sharpening
Temperature regularization in the softmax layer (Output posteriors)
Joint CTC-Attention [Kim+ 2016]
Coverage (under implementation)
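
Two of the options above reduce to short formulas, sketched here in pure Python. A softmax temperature divides the logits by T before normalization: T below 1 sharpens the distribution and T above 1 smooths it. Joint CTC-attention training interpolates the two losses as λ·L_CTC + (1 − λ)·L_attention. The function names, the default weight of 0.2, and the toy numbers are assumptions for illustration and are not taken from this repository.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature (T < 1 sharpens, T > 1 smooths)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def joint_ctc_attention_loss(ctc_loss, attention_loss, ctc_weight=0.2):
    """Interpolate the two losses; `ctc_weight` is lambda (0.2 is an assumed default)."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Toy logits and loss values (hypothetical).
logits = [2.0, 1.0, 0.0]
plain = softmax_with_temperature(logits, 1.0)
sharp = softmax_with_temperature(logits, 0.5)
smooth = softmax_with_temperature(logits, 2.0)
print(plain[0], sharp[0], smooth[0])

print(joint_ctc_attention_loss(10.0, 20.0, ctc_weight=0.5))
```

Setting `ctc_weight` to 0 recovers pure attention training and setting it to 1 recovers pure CTC training.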

Usage

Please refer to the docs for each corpus:

TIMIT
LibriSpeech
CSJ

License

MIT

Contact

hiro.mhbc@gmail.com