As Chinese students studying in the States, we noticed our speaking habits changing: English words and phrases slip easily into our Chinese sentences. We strongly feel the need for messaging apps that can handle multilingual speech-to-text. So in this project we set out to build that function: models based on deep learning architectures (DNN, CNN, LSTM) that correctly transcribe multilingual audio (Chinese and English mixed in the same sentence) into text.
- Scripts to build our system
- LDC2015S04: our dataset description
- Our study notes on Kaldi-related recipes, including timit and librispeech
| | filename | pattern | format | path | source |
|---|---|---|---|---|---|
| acoustic data | spk2gender | `<speakerID> <gender>` | | /data/train, /data/test | handmade |
| | utt2spk | `<uttID> <speakerID>` | | /data/train, /data/test | handmade |
| | wav.scp | `<uttID> <full_path_to_audio_file>` | .scp | /data/train, /data/test | handmade |
| | text | `<uttID> <transcription>` | | /data/train, /data/test | exists |
| language data | lexicon.txt | `<word> <phone 1> <phone 2> ...` | | data/local/dict | egs/voxforge |
| | nonsilence_phones.txt | `<phone>` | | data/local/dict | unknown |
| | silence_phones.txt | `<phone>` | | data/local/dict | unknown |
| | optional_silence.txt | `<phone>` | | data/local/dict | unknown |
| tools | utils | / | | kaldi/egs/wsj/s5 | |
| | steps | / | | kaldi/egs/wsj/s5 | |
| | score.sh | / | | kaldi/egs/voxforge/s5/local | |

(.scp: kaldi script file; .ark: kaldi archive file)
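For a concrete picture, below is a minimal sketch of what the handmade acoustic-data files contain; the speaker/utterance IDs, file path, and transcription are made up for illustration:

# data/train/spk2gender: <speakerID> <gender>
spk001 m
# data/train/utt2spk: <uttID> <speakerID>
spk001_utt01 spk001
# data/train/wav.scp: <uttID> <full_path_to_audio_file>
spk001_utt01 /home/ubuntu/corpus/train/spk001_utt01.wav
# data/train/text: <uttID> <transcription>
spk001_utt01 我 明天 有 一个 meeting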
Our language model:
3-grams trained from the transcripts of THCHS30 + LDC2015S04
directory structure taken from egs/timit/s5:
/data
/local
/nist_lm
/lm_phone_bg.arpa.gz
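The recipe itself does not show how the .arpa.gz file is produced; as a sketch, a 3-gram LM can be trained from the combined transcripts with SRILM's ngram-count (this assumes SRILM is installed, and the transcript file names are hypothetical):

# combine transcripts and train a 3-gram LM (hypothetical file names)
cat thchs30_transcripts.txt ldc2015s04_transcripts.txt > all_transcripts.txt
ngram-count -order 3 -text all_transcripts.txt -lm lm_phone_bg.arpa
gzip lm_phone_bg.arpa   # produces lm_phone_bg.arpa.gz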
How to build a language model:
Kaldi script utils/prepare_lang.sh
usage: utils/prepare_lang.sh <dict-src-dir> <oov-dict-entry> <tmp-dir> <lang-dir>
e.g.: utils/prepare_lang.sh data/local/dict <SPOKEN_NOISE> data/local/lang data/lang
options:
--num-sil-states <number of states> # default: 5, #states in silence models.
--num-nonsil-states <number of states> # default: 3, #states in non-silence models.
--position-dependent-phones (true|false) # default: true; if true, use _B, _E, _S & _I
# markers on phones to indicate word-internal positions.
--share-silence-phones (true|false) # default: false; if true, share pdfs of
# all non-silence phones.
--sil-prob <probability of silence> # default: 0.5 [must have 0 < silprob < 1]
Turning the --share-silence-phones option to TRUE was extremely helpful for the Cantonese data of IARPA's BABEL project, where the data is very messy and has long untranscribed portions that the Kaldi developers try to align to a special phone designated for that purpose. The --sil-prob option might be another potentially important one.
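For example, a sketch of an invocation that sets the two options discussed above (the values here are illustrative, not recommendations):

utils/prepare_lang.sh --share-silence-phones true --sil-prob 0.5 \
  data/local/dict "<SPOKEN_NOISE>" data/local/lang data/lang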
echo
echo "===== FEATURES EXTRACTION ====="
echo
# Making feats.scp files
mfccdir=mfcc
# Uncomment and modify arguments in scripts below if you have any problems with data sorting
# utils/validate_data_dir.sh data/train # script for checking prepared data - here: for data/train directory
# utils/fix_data_dir.sh data/train # tool for data proper sorting if needed - here: for data/train directory
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/train exp/make_mfcc/train $mfccdir
steps/make_mfcc.sh --nj $nj --cmd "$train_cmd" data/test exp/make_mfcc/test $mfccdir
# Making cmvn.scp files
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train $mfccdir
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test $mfccdir
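As a quick sanity check after extraction, you can dump a few feature vectors as text with Kaldi's copy-feats (a sketch; paths assume the data layout above):

# print the extracted MFCC features in text form (interrupt after a few rows)
copy-feats scp:data/train/feats.scp ark,t:- | head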
MFCC-related documents
Denote $a_{ij}$ as the transition probability from state $i$ to state $j$, and $b_j(X)$ as the emission probability of observation $X$ from state $j$. The GMM provides the emission probabilities; the forward-backward algorithm fine-tunes the model parameters. The HMM solves the following three problems: likelihood evaluation (the forward algorithm), decoding the most likely state sequence (the Viterbi algorithm), and parameter estimation (Baum-Welch, i.e. forward-backward).
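For concreteness, the likelihood evaluation uses the forward recursion over these quantities (standard HMM notation, consistent with $a_{ij}$ and $b_j$ above):

$$\alpha_t(j) = \Big[\sum_{i} \alpha_{t-1}(i)\, a_{ij}\Big]\, b_j(x_t), \qquad P(X) = \sum_{j} \alpha_T(j)$$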
In order to train a CNN, we need to extract MFSC features from the acoustic data instead of MFCC features, because the Discrete Cosine Transform (DCT) step in MFCC destroys locality. MFSC features are also called filter banks (fbanks). In Kaldi, the scripts look something like the following:
steps/make_fbank.sh --nj 3 $trainDir/train_clean_fbank exp/make_fbank/train_clean_fbank feat/fbank/ || exit 1;
steps/compute_cmvn_stats.sh $trainDir/train_clean_fbank exp/make_fbank/train_clean_fbank feat/fbank/ || exit 1;
Notice that fbanks don't work well with GMMs: fbank features are highly correlated, while GMMs modelled with diagonal covariance matrices assume the feature streams are independent. fbanks/MFSC are fine with DNNs and best for CNNs.
Why MFSC+GMM produces a high WER: see Kaldi discussion
Why DCT destroys locality: see post
tensorflow == 1.1.0
theano == 0.9.0.dev-c697eeab84e5b8a74908da654b66ec9eca4f1291
keras == 1.2
This doesn't require Sun GridEngine. Simply download the [CUDA toolkit](https://developer.nvidia.com/cuda-downloads), install it with
sudo sh cuda_8.0.61_375.26_linux.run
and then, under kaldi/src, execute
./configure
to check whether it detects CUDA; you should also find CUDA = true
in kaldi/src/kaldi.mk
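A quick way to confirm the flag (a sketch, run from the kaldi/src directory):

grep 'CUDA = true' kaldi.mk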
then recompile Kaldi with
make depend -j 8 # 8 for an 8-core cpu
make -j 8 # 8 for an 8-core cpu
Note that GMM-based training and decoding are not supported on the GPU; only the nnet training is. source
If you are using AWS g2.2xlarge and launched the instance before 2017-04-18 (when this note was written), its NVIDIA GPU may need a legacy 367.x driver; the default (latest) driver that comes with CUDA 8, cuda_8.0.61_375.26_linux.run, will fail.
To check the current version of the driver installed on the instance, type
apt-cache search nvidia | grep -P '^nvidia-[0-9]+\s'
to install a version of your choice from the list, type
sudo apt-get install nvidia-367
You can also download a specific version from the web, for example NVIDIA-Linux-x86_64-367.18.run. Install it with
sudo sh NVIDIA-Linux-x86_64-367.18.run
Then, when installing cuda_8.0.61_375.26_linux.run, the installer will ask whether to install the NVIDIA 375 driver; make sure you choose no.
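After installing the driver, you can confirm the version with nvidia-smi (the reported version should be 367.x):

nvidia-smi | head -n 3 # the header lines include the driver version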
To verify that TensorFlow sees the GPU, run the following test:
# make sure you are out of the tensorflow git repo
python
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
A working tensorflow will output:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:04.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0
I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0
1. During testing, if you run into an error like:
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: /usr/local/cuda/lib64 I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO
in the writer's experience, this means you didn't set the right `LD_LIBRARY_PATH` in the `~/.profile` file. You need to find where `libcudnn.so.5` is located and move it to the desired location, most likely `/usr/local/cuda`. Also make sure you run `source ~/.profile` to activate the change after you modify the file.
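For reference, a typical `~/.profile` entry looks like this (a sketch; adjust the path to wherever `libcudnn.so.5` actually lives on your machine):

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH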
2. If you are testing it in a python shell and you meet the following error:
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
very likely you are inside the actual `tensorflow` git repo ([source](https://github.com/tensorflow/tensorflow/issues/8107)); make sure you move out of it before testing.
### Install <a name="tn-gpu"></a> Theano GPU
Keras-kaldi's LSTM training script breaks under the current tensorflow (tensorflow went through a series of API changes over the previous months), so we need to install Theano with GPU support and switch to the theano backend before running `run_kt_LSTM.sh`.
After installing Theano-gpu using [miniconda](http://deeplearning.net/software/theano/install_ubuntu.html),
you can override the `theano.config` defaults by creating `.theanorc` with the following command:
echo -e "\n[global]\nfloatX=float32\n" >> ~/.theanorc
and add `device=gpu` to this file.
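The resulting `~/.theanorc` should then look like this minimal sketch:

[global]
floatX=float32
device=gpu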
If theano can't detect NVCC, giving you the following error:
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
(but you are sure that you installed CUDA), you can solve it by adding the following lines to `~/.profile`:
export PATH=/usr/local/cuda-8.0/bin/:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
Don't forget to `source ~/.profile` to enable the change.
To change the keras backend from tensorflow to theano, edit:
vim $HOME/.keras/keras.json
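For reference, a minimal `keras.json` for Keras 1.x with the theano backend looks roughly like this (the field set is assumed from the Keras 1.x defaults):

{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}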
To test whether theano is indeed using the gpu, execute the following script:
from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
### Kaldi script to train nnet
1. 3-4 hours to train, 3 hours to decode on GPU:
[local/online/run_nnet2_baseline.sh](https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/local/online/run_nnet2.sh)
### Chinese CER (Character Error Rate)
1. [egs/hkust/s5/local/ext/score.sh](https://github.com/kaldi-asr/kaldi/blob/master/egs/hkust/s5/local/ext/score.sh)
### <a name="keras-kaldi"></a> Keras-Kaldi
[dspavankumar/keras-kaldi github repo](https://github.com/dspavankumar/keras-kaldi)
Up to the time that we ran his code, the environment was still Keras 1.2.0.
Make sure that the Keras version is the same across the machines.
To downgrade Keras from 2.0.3 to an older version, type
$ sudo pip3 install keras==1.2
or
$ conda install keras==1.2.2 # if you are using conda
If there is a version inconsistency (e.g., the model was trained with 1.2.0 but decoded with 2.0.3), you will run into a problem when loading the existing model:
File "steps_kt/nnet-forward.py", line 33, in
[source](https://github.com/fchollet/keras/issues/4044)