Open zhangjiulong opened 8 years ago
Hi,
my guess is that you will need to reduce the number of parameters in the model - l=5 and c=320 are good settings for Switchboard and TEDLIUM, with hundreds of hours of training data, but not for TIMIT, with just a few. The difference in the ark files shows this (somewhat). The TIMIT speakers are much shorter than the TEDLIUM speakers, and therefore the sum and sum-of-squares of the data in the speaker is much smaller (which is what I think you’re showing). Finally, during decoding, you can see that the network likes to output “outmoded” and “journalese” for some reason. Presumably you are still using the TEDLIUM language model?
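To illustrate the point about speaker length: the per-speaker statistics stored in the ark files quoted below are unnormalized sums (and sums of squares) over frames, so they grow linearly with the amount of speech per speaker. A toy sketch with synthetic data, not real features (the frame counts loosely mirror the ark files quoted later in this thread):

```python
import numpy as np

rng = np.random.default_rng(0)

def speaker_stats(feats):
    """CMVN-style per-speaker stats: feature sum, sum of squares, frame count."""
    return feats.sum(axis=0), (feats ** 2).sum(axis=0), feats.shape[0]

# A "TEDLIUM-like" speaker with many frames vs. a "TIMIT-like" one with few.
long_spk = rng.normal(loc=5.0, scale=1.0, size=(39020, 40))   # ~39k frames
short_spk = rng.normal(loc=5.0, scale=1.0, size=(372, 40))    # ~370 frames

sum_l, sq_l, n_l = speaker_stats(long_spk)
sum_s, sq_s, n_s = speaker_stats(short_spk)

# The raw sums differ by roughly 100x, but the per-frame means are comparable;
# this is why the ark values look so different while the data itself is fine.
print(sum_l[0] / n_l, sum_s[0] / n_s)
```

The takeaway: a large gap in the raw ark numbers between two corpora is expected whenever the speakers differ in length, and by itself indicates nothing wrong with the features.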
Do you know someone who is familiar with the Kaldi TIMIT recipe? I think you need to adapt the Eesen recipe a bit more for it to give good results; the Kaldi TIMIT recipe would probably be a good starting point to see what is being done.
Florian
On Jun 6, 2016, at 5:36 AM, zhangjiulong notifications@github.com wrote:
Hi, I tested TIMIT data using Eesen, but the result is not good, as follows:
training process
EPOCH 11 RUNNING ... ENDS [2016-Jun-6 17:02:47]: lrate 4e-05, TRAIN ACCURACY 23.4300%, VALID ACCURACY 17.3147%
EPOCH 12 RUNNING ... ENDS [2016-Jun-6 17:07:02]: lrate 4e-05, TRAIN ACCURACY 25.2924%, VALID ACCURACY 16.1223%
EPOCH 13 RUNNING ... ENDS [2016-Jun-6 17:11:18]: lrate 4e-05, TRAIN ACCURACY 26.1150%, VALID ACCURACY 18.4033%
EPOCH 14 RUNNING ... ENDS [2016-Jun-6 17:15:33]: lrate 4e-05, TRAIN ACCURACY 26.6806%, VALID ACCURACY 19.5179%
EPOCH 15 RUNNING ... ENDS [2016-Jun-6 17:19:51]: lrate 4e-05, TRAIN ACCURACY 27.1350%, VALID ACCURACY 18.6625%
EPOCH 16 RUNNING ... ENDS [2016-Jun-6 17:24:07]: lrate 2e-05, TRAIN ACCURACY 27.4092%, VALID ACCURACY 20.1400%
EPOCH 17 RUNNING ... ENDS [2016-Jun-6 17:28:23]: lrate 1e-05, TRAIN ACCURACY 27.5363%, VALID ACCURACY 20.2177%
finished, too small rel. improvement .0777
Training succeeded. The final model exp/train_phn_l5_c320/final.nnet
Removing features tmpdir exp/train_phn_l5_c320/ptrXL @ pingan-nlp-001
cv.ark train.ark

testing process
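For context, the halving learning rate and the "too small rel. improvement" message in a log like the one above come from a newbob-style schedule: the learning rate is halved once the per-epoch validation gain drops below one threshold, and training stops once it falls below a second. A rough sketch of that logic (the threshold values are illustrative, not necessarily Eesen's exact defaults):

```python
def newbob(valid_accs, lrate=4e-5, start_halving=0.5, end_halving=0.1):
    """valid_accs: validation accuracies (%) per epoch.
    Halve lrate once the epoch-to-epoch gain drops below start_halving;
    stop once it also drops below end_halving."""
    halving = False
    for epoch in range(1, len(valid_accs)):
        gain = valid_accs[epoch] - valid_accs[epoch - 1]
        if halving:
            lrate /= 2.0          # keep halving once triggered
        if gain < start_halving:
            halving = True
        if halving and gain < end_halving:
            return lrate, gain    # "finished, too small rel. improvement"
    return lrate, None

# Last few validation accuracies from the log above:
lrate, gain = newbob([18.6625, 20.1400, 20.2177])
print(lrate, round(gain, 4))  # -> 4e-05 0.0777
```

Note the final gain, 0.0777, is exactly the ".0777" printed in the training log.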
mrjb1_sx64-0000000-0000248 out-moded
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjb1_sx64-0000000-0000248 is 0.454562 over 246 frames.
mrjh0_sa1-0000000-0000385 she had your dark suit in greasy wash water all
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_sa1-0000000-0000385 is 0.577131 over 383 frames.
mrjh0_sa2-0000000-0000317 how ask me to carry an oily rag like
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_sa2-0000000-0000317 is 0.483511 over 315 frames.
mrjh0_si1145-0000000-0000487 how unauthentic
LOG (latgen-faster:RebuildRepository():determinize-lattice-pruned.cc:294) Rebuilding repository.
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si1145-0000000-0000487 is 0.258022 over 485 frames.
mrjh0_si1775-0000000-0000306 how unauthentic
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si1775-0000000-0000306 is 0.384129 over 304 frames.
mrjh0_si515-0000000-0000296 out-moded
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si515-0000000-0000296 is 0.429838 over 294 frames.
mrjh0_sx155-0000000-0000394 how unauthentic

I checked the ark files of the TIMIT and TEDLIUM data and found some differences, but I do not know where the differences come from. TEDLIUM's ark file looks like this:
AlGore_2009 [ 510340.6 586395.1 608272.1 621239.9 642546.4 653072.2 651401.9 651305.8 653922.6 659371.4 654681.1 652654.5 646230.6 645681.9 650887.6 655483.5 666377.6 671666.1 672115.6 669366.7 669373.2 681050.7 703447.4 715073.2 709013.8 702928.3 713154.4 718430.6 711170 688705.3 658752.9 641324.2 630078.5 628411.7 623944.6 627934.9 639849.6 641777.4 643522.4 627100.5 39020 6946354 9087419 9763794 1.018412e+07 1.091917e+07 1.127568e+07 1.123372e+07 1.124698e+07 1.134869e+07 1.154412e+07 1.137156e+07 1.1279e+07 1.104712e+07 1.103819e+07 1.121266e+07 1.137482e+07 1.174376e+07 1.193318e+07 1.195034e+07 1.184173e+07 1.18378e+07 1.224044e+07 1.303755e+07 1.345864e+07 1.321482e+07 1.298168e+07 1.337751e+07 1.358918e+07 1.332344e+07 1.25133e+07 1.146994e+07 1.091216e+07 1.056395e+07 1.05361e+07 1.041939e+07 1.053006e+07 1.088328e+07 1.093435e+07 1.097803e+07 1.044419e+07 0 ]
And TIMIT's looks like this:
fadg0_sa1 [ 3077.437 3576.837 3893.808 4497.17 4646.433 4888.595 5084.933 5245.375 5266.312 5316.513 5304.906 5279.905 5159.947 5092.513 5093.656 5096.891 5198.106 5342.096 5525.816 5622.102 5590.077 5587.714 5621.955 5658.111 5640.733 5684.978 5922.412 6028.531 5843.909 5494.285 5123.665 4873.254 4768.456 4619.075 4454.212 4446.68 4533.783 4809.863 5073.438 5097.519 372 28369.65 38061.98 44509.9 59787.96 63547.87 70383.9 75846.95 80695.33 82071.57 83632.43 82730.72 81498.48 78174.86 76341.12 75977.55 75682.39 78059.57 82118.61 87383.28 90191.3 89340.34 89230.34 90614.35 91722.68 90768.06 91814.1 99787.2 103876.6 97762.07 85880.71 74550.43 67565.18 64682.43 60528.35 56254.4 56227.67 58352.52 65413.38 72421.16 72856.04 0 ]

But the scripts are the same as TEDLIUM's (I just modified existing code); the diff is:
91c91
< || exit 208;
---
> || exit 1;
106c106
< || exit 209;
---
> || exit 1;
And the script is like this:
#!/bin/bash

# Copyright 2012  Karel Vesely  Johns Hopkins University (Author: Daniel Povey)
# Apache 2.0
# To be run from .. (one directory up from here)
# see ../run.sh for example

# Begin configuration section.
nj=4
cmd=run.pl
fbank_config=conf/fbank.conf
compress=true
# End configuration section.

echo "$0 $@"  # Print the command line for logging

if [ -f path.sh ]; then . ./path.sh; fi
. parse_options.sh || exit 1;

if [ $# != 3 ]; then
  echo "usage: make_fbank.sh [options] <data-dir> <log-dir> <path-to-fbankdir>";
  echo "options: "
  echo "  --fbank-config <config-file>                     # config passed to compute-fbank-feats "
  echo "  --nj <nj>                                        # number of parallel jobs"
  echo "  --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
  exit 1;
fi

data=$1
logdir=$2
fbankdir=$3

# make $fbankdir an absolute pathname.
fbankdir=`perl -e '($dir,$pwd)= @ARGV; if($dir!~m:^/:) { $dir = "$pwd/$dir"; } print $dir; ' $fbankdir ${PWD}`

# use "name" as part of name of the archive.
name=`basename $data`

mkdir -p $fbankdir || exit 1;
mkdir -p $logdir || exit 1;

if [ -f $data/feats.scp ]; then
  mkdir -p $data/.backup
  echo "$0: moving $data/feats.scp to $data/.backup"
  mv $data/feats.scp $data/.backup
fi

scp=$data/wav.scp

required="$scp $fbank_config"

for f in $required; do
  if [ ! -f $f ]; then
    echo "make_fbank.sh: no such file $f"
    exit 1;
  fi
done

utils/validate_data_dir.sh --no-text --no-feats $data || exit 1;

if [ -f $data/spk2warp ]; then
  echo "$0 [info]: using VTLN warp factors from $data/spk2warp"
  vtln_opts="--vtln-map=ark:$data/spk2warp --utt2spk=ark:$data/utt2spk"
elif [ -f $data/utt2warp ]; then
  echo "$0 [info]: using VTLN warp factors from $data/utt2warp"
  vtln_opts="--vtln-map=ark:$data/utt2warp"
fi

for n in $(seq $nj); do
  # the next command does nothing unless $fbankdir/storage/ exists, see
  # utils/create_data_link.pl for more info.
  utils/create_data_link.pl $fbankdir/raw_fbank_$name.$n.ark
done

if [ -f $data/segments ]; then
  echo "$0 [info]: segments file exists: using that."
  split_segments=""
  for n in $(seq $nj); do
    split_segments="$split_segments $logdir/segments.$n"
  done

  utils/split_scp.pl $data/segments $split_segments || exit 1;
  rm $logdir/.error 2>/dev/null

  $cmd JOB=1:$nj $logdir/make_fbank_${name}.JOB.log \
    extract-segments scp,p:$scp $logdir/segments.JOB ark:- \| \
    compute-fbank-feats $vtln_opts --verbose=2 --config=$fbank_config ark:- ark:- \| \
    copy-feats --compress=$compress ark:- \
      ark,scp:$fbankdir/raw_fbank_$name.JOB.ark,$fbankdir/raw_fbank_$name.JOB.scp \
    || exit 208;

else
  echo "$0: [info]: no segments file exists: assuming wav.scp indexed by utterance."
  split_scps=""
  for n in $(seq $nj); do
    split_scps="$split_scps $logdir/wav.$n.scp"
  done

  utils/split_scp.pl $scp $split_scps || exit 1;

  $cmd JOB=1:$nj $logdir/make_fbank_${name}.JOB.log \
    compute-fbank-feats $vtln_opts --verbose=2 --config=$fbank_config scp,p:$logdir/wav.JOB.scp ark:- \| \
    copy-feats --compress=$compress ark:- \
      ark,scp:$fbankdir/raw_fbank_$name.JOB.ark,$fbankdir/raw_fbank_$name.JOB.scp \
    || exit 209;
fi

if [ -f $logdir/.error.$name ]; then
  echo "Error producing fbank features for $name:"
  tail $logdir/make_fbank_${name}.1.log
  exit 1;
fi

# concatenate the .scp files together.
for n in $(seq $nj); do
  cat $fbankdir/raw_fbank_$name.$n.scp || exit 1;
done > $data/feats.scp

rm $logdir/wav.*.scp $logdir/segments.* 2>/dev/null

nf=`cat $data/feats.scp | wc -l`
nu=`cat $data/utt2spk | wc -l`
if [ $nf -ne $nu ]; then
  echo "It seems not all of the feature files were successfully processed ($nf != $nu);"
  echo "consider using utils/fix_data_dir.sh $data"
fi

echo "Succeeded creating filterbank features for $name"

Is there something wrong? And what do "out-moded" and "journalese" mean?
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mbns0_sx340-0000000-0000242 is 0.466467 over 240 frames.
mbns0_sx430-0000000-0000343 out-moded
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mbns0_sx430-0000000-0000343 is 0.430763 over 341 frames.
mbns0_sx70-0000000-0000119 journalese
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/srvk/eesen/issues/59
Florian Metze http://www.cs.cmu.edu/directory/florian-metze Associate Research Professor Carnegie Mellon University
Hi @fmetze, thanks for your suggestion, I will try it. But the language model I used is built from the TIMIT text, and the result is still very strange.
A word-based language model built on TIMIT is relatively weak. I recommend composing a phone-level language model instead. You plug in a fake dictionary which simply maps each phone to itself: A A, B B, ....
@yajiemiao do you mean testing the phones Eesen recognized, not the words?
Yep. My very first verification of EESEN was done on TIMIT. I was able to get reasonable (if not state-of-the-art) phone error rates.
@yajiemiao OK, thanks very much.
@fmetze Hi, fmetze. With EESEN, did you run any experiments based on uni-directional LSTMs? My uni-LSTM results are terrible.
We have not run such experiments. I think there is some work on how to build uni-directional LSTMs that work for speech (mainly stacking future frames rather than relying on the RNN to learn them), or on decomposing the sentence-level BiLSTM into a series of shorter BiLSTMs that can be evaluated quickly, but we have not implemented any of this in Eesen. It would be a great feature, though ;-)
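The "stacking future frames" idea mentioned above can be sketched as simple frame splicing: each input frame is concatenated with the next k frames, so a uni-directional LSTM sees some right context without needing a backward pass. A toy numpy version (the window size k is illustrative; real recipes tune it):

```python
import numpy as np

def splice_future(feats, k=2):
    """Concatenate each frame with the next k frames (the final frames
    repeat the last frame as padding). Input (T, D) -> output (T, D*(k+1))."""
    T, D = feats.shape
    padded = np.concatenate([feats, np.repeat(feats[-1:], k, axis=0)], axis=0)
    return np.concatenate([padded[i:i + T] for i in range(k + 1)], axis=1)

x = np.arange(12, dtype=float).reshape(4, 3)   # 4 frames, 3 dims
y = splice_future(x, k=2)
print(y.shape)  # -> (4, 9)
```

This trades a small fixed lookahead (k frames of latency) for some of the right-context information that a BiLSTM would otherwise provide.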
In general, CTC highly depends on BiLSTM for reasonable performance. If you refer to http://www.cs.cmu.edu/~ymiao/pub/icassp2016_ctc.pdf, on Switchboard, Uni-directional models perform >15% worse than Bi-directional models, with the same number of model parameters.
@yajiemiao @zhangjiulong can you please share the example you tested with the TIMIT dataset?
I just converted the TIMIT format to STM format and ran it using the TEDLIUM scripts.
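The conversion described here can be sketched roughly as follows: each TIMIT utterance becomes one STM line of the form `<wavname> <channel> <speaker> <start> <end> <transcript>`, as the TEDLIUM recipe expects. This is a hypothetical illustration, not the exact script used; the demo utterance, channel, and timings are made up:

```python
# Hypothetical TIMIT -> STM sketch: one STM line per utterance,
# "<wavname> <channel> <speaker> <start> <end> <transcript>".
def timit_to_stm(utts):
    """utts: iterable of (wav_name, speaker, start_sec, end_sec, transcript)."""
    return [f"{wav} 1 {spk} {start:.2f} {end:.2f} {text.lower()}"
            for wav, spk, start, end, text in utts]

demo = [("fadg0_sa1", "fadg0", 0.0, 3.72,
         "She had your dark suit in greasy wash water all year")]
print(timit_to_stm(demo)[0])
# -> fadg0_sa1 1 fadg0 0.00 3.72 she had your dark suit in greasy wash water all year
```

In practice the start/end times come from the TIMIT wav lengths (or the .phn files), and speaker IDs from the directory structure.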
@Aasimrafique, were you able to convert the TIMIT format to STM format as instructed by @zhangjiulong? If so, could you please share exactly how you did it? @yajiemiao, @fmetze, it would be very helpful if you could share the TIMIT dataset test.
Thanks.
it would be very helpful if you could share TIMIT dataset test.
Unfortunately, as mentioned in Wikipedia:
TIMIT and NTIMIT are not freely available — either membership of the Linguistic Data Consortium, or a monetary payment, is required for access to the dataset.
We are not permitted to distribute TIMIT data.
@riebling I forgot to add "scripts" at the end. I do have access to the TIMIT dataset; what I meant to ask was whether the TIMIT test scripts could be shared.
Oops, my misunderstanding. My best guess is that, at least here at CMU, there is no TIMIT Eesen experiment to share. The only person who seems to have tried this (aside from Yajie, who is no longer with us) is @zhangjiulong.
Florian suggests people try adapting the Kaldi TIMIT experiment. This does not imply he has done so ("We have not run such experiments.") or, therefore, that he has any scripts to share.
@riebling Okay, I see. But I did create a new issue here https://github.com/srvk/eesen/issues/128, describing what I've done and the issues I am facing. Could you please suggest how I could move forward?