srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

When can we enjoy the support for thchs30 corpus? #108

Closed colin1988 closed 7 years ago

colin1988 commented 7 years ago

In the current release, asr_egs includes support for HKUST. However, acquiring the HKUST corpus is not easy, whereas the THCHS-30 corpus is much more convenient to obtain.

Is support for THCHS-30 planned? What modifications would I need to make to support it?

Thanks. Looking forward to your reply

fmetze commented 7 years ago

We have no plans to release a recipe for THCHS30.

However, it should be straightforward to adapt the existing Kaldi recipe for it. Try comparing the Kaldi and Eesen HKUST recipes, or the Switchboard recipes - there are only a few differences in key places between the two toolkits.
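For example, something along these lines gives a quick overview of where the recipes actually differ (the checkout paths and recipe sub-directories below are assumptions; adjust them to your own trees):

```bash
# Sketch only: side-by-side comparison of the Kaldi and EESEN HKUST recipes.
# The checkout locations and recipe sub-directories are assumptions; adjust
# them to wherever your Kaldi and EESEN trees live.
KALDI_HKUST=$HOME/kaldi/egs/hkust/s5
EESEN_HKUST=$HOME/eesen/asr_egs/hkust/v1

# Data preparation is largely shared between the two toolkits.
diff -ru "$KALDI_HKUST/local" "$EESEN_HKUST/local" | less

# The top-level run scripts show where acoustic-model training diverges.
diff -u "$KALDI_HKUST/run.sh" "$EESEN_HKUST/run_ctc_char.sh" | less
```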

We'll be happy to include a THCHS-30 recipe with Eesen if someone creates one.

riebling commented 7 years ago

This looks promising, given the availability of THCHS-30 in a format that works with Kaldi-based systems like EESEN (http://www.openslr.org/18/), as well as a Kaldi recipe that can probably be modified for use with EESEN: https://github.com/wangdong99/kaldi

While we don't have immediate plans to support THCHS-30, adapting the Kaldi experiment and using the data above seems like a path to success!

colin1988 commented 7 years ago

@fmetze @riebling Thank you for your reply and the good suggestions. I am trying to get it working; epoch 1 is running now. Hopefully everything goes well!

Sundy1219 commented 7 years ago

I am now trying to write a recipe for THCHS-30 based on the HKUST recipes in EESEN. The challenge is how to prepare the data. What are the segments and reco2file_and_channel files used for? Can they simply be removed, and if they are, will anything go wrong during decoding? @fmetze @riebling @colin1988 Looking forward to your reply.

colin1988 commented 7 years ago

@Sundy1219 In steps/make_fbank.sh, different branches are taken depending on whether the segments file exists.
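The logic is roughly the following (a simplified sketch, not the verbatim script; $data, $scp, $fbank_config, $fbankdir and $name are placeholders):

```bash
# Simplified sketch of the branch in steps/make_fbank.sh: with a segments
# file, features are extracted per segment via extract-segments; otherwise
# wav.scp is read directly.
if [ -f $data/segments ]; then
  echo "$0: segments file exists: using that."
  extract-segments scp,p:$scp $data/segments ark:- | \
    compute-fbank-feats --verbose=2 --config=$fbank_config ark:- ark:- | \
    copy-feats --compress=true ark:- \
      ark,scp:$fbankdir/raw_fbank_$name.ark,$fbankdir/raw_fbank_$name.scp
else
  echo "$0: no segments file: using wav.scp directly."
  compute-fbank-feats --verbose=2 --config=$fbank_config scp,p:$scp ark:- | \
    copy-feats --compress=true ark:- \
      ark,scp:$fbankdir/raw_fbank_$name.ark,$fbankdir/raw_fbank_$name.scp
fi
```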

Assuming you are training a character-based model, the Kaldi script for preparing THCHS-30 data can be used in place of local/hkust_data_prep.sh. THCHS-30 already provides a lexicon and a language model, located in {THCHS-30}/lm_word, so you can remove the call to local/hkust_prepare_dict.sh in run_ctc_char.sh and drop the language-model training done by local/hkust_train_lms.sh; instead, copy the corresponding files to the target directory and rename them.
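Concretely, the data-preparation part of run_ctc_char.sh ends up looking roughly like this (a sketch only; the script name and the file names inside the THCHS-30 release are from memory, so double-check them against your copy):

```bash
# Sketch of the modified data-preparation stage; paths and file names are assumptions.
thchs=/path/to/thchs30            # root of the THCHS-30 release

# 1. Prepare the data dirs with the Kaldi THCHS-30 data-prep script instead of
#    local/hkust_data_prep.sh (its exact name and arguments may differ).
local/thchs-30_data_prep.sh $thchs

# 2. Skip local/hkust_prepare_dict.sh and local/hkust_train_lms.sh; instead copy
#    the lexicon and language model shipped under $thchs/lm_word into place and
#    rename them to what the later stages expect.
mkdir -p data/local/dict data/local/lm
cp $thchs/lm_word/lexicon.txt   data/local/dict/lexicon.txt
cp $thchs/lm_word/word.3gram.lm data/local/lm/lm.arpa

# 3. From here on, the usual EESEN steps (token FST, decoding graph, training) apply.
```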

thchs.diff.txt

I have attached a diff file, and hope it is useful to you.

Sundy1219 commented 7 years ago

apply-cmvn --norm-vars=true --utt2spk=ark:data/dev/utt2spk scp:data/dev/cmvn.scp scp:exp/train_phn_l5_c320/cv.scp ark:-
LOG (apply-cmvn:main():apply-cmvn.cc:129) Applied cepstral mean and variance normalization to 893 utterances, errors on 0
LOG (copy-feats:main():copy-feats.cc:100) Copied 893 feature matrices.
TRAINING STARTS [2016-Nov-23 18:03:02]
[NOTE] TOKEN_ACCURACY refers to token accuracy, i.e., (1.0 - token_error_rate).
EPOCH 1 RUNNING ... ENDS [2016-Nov-23 19:22:22]: lrate 4e-05, TRAIN ACCURACY 1.1707%, VALID ACCURACY 0.0000%
EPOCH 2 RUNNING ... ENDS [2016-Nov-23 20:44:48]: lrate 4e-05, TRAIN ACCURACY 0.0000%, VALID ACCURACY 0.0000%
EPOCH 3 RUNNING ... ENDS [2016-Nov-23 22:07:10]: lrate 4e-05, TRAIN ACCURACY 0.0000%, VALID ACCURACY 0.0000%
EPOCH 4 RUNNING ... ENDS [2016-Nov-23 23:29:05]: lrate 4e-05, TRAIN ACCURACY 0.0000%, VALID ACCURACY 0.0000%
EPOCH 5 RUNNING ... ENDS [2016-Nov-24 00:51:25]: lrate 4e-05, TRAIN ACCURACY 0.0000%, VALID ACCURACY 0.0000%
EPOCH 6 RUNNING ... ENDS [2016-Nov-24 02:13:33]: lrate 4e-05, TRAIN ACCURACY 0.0000%, VALID ACCURACY 0.0000%
EPOCH 7 RUNNING ... ENDS [2016-Nov-24 03:35:43]: lrate 4e-05, TRAIN ACCURACY 0.0000%, VALID ACCURACY 0.0000%
EPOCH 8 RUNNING ... ENDS [2016-Nov-24 04:57:44]: lrate 4e-05, TRAIN ACCURACY 0.0000%, VALID ACCURACY 0.0000%
EPOCH 9 RUNNING ... ENDS [2016-Nov-24 06:19:37]: lrate 4e-05, TRAIN ACCURACY 0.0364%, VALID ACCURACY 0.8987%
EPOCH 10 RUNNING ... ENDS [2016-Nov-24 07:42:05]: lrate 4e-05, TRAIN ACCURACY 0.6945%, VALID ACCURACY 0.8987%
EPOCH 11 RUNNING ... ENDS [2016-Nov-24 09:03:43]: lrate 4e-05, TRAIN ACCURACY 0.8803%, VALID ACCURACY 0.8987%
EPOCH 12 RUNNING ... ENDS [2016-Nov-24 10:25:03]: lrate 4e-05, TRAIN ACCURACY 0.9201%, VALID ACCURACY 0.8987%
EPOCH 13 RUNNING ... ENDS [2016-Nov-24 11:46:16]: lrate 4e-05, TRAIN ACCURACY 0.9262%, VALID ACCURACY 0.8987%
EPOCH 14 RUNNING ...

Thank you for your reply. I have run 13 epochs, as shown above. Is there anything wrong with these results? A TRAIN ACCURACY of 0.9262% is far too small; I would expect something like 0.9262 (i.e. 92.62%), not 0.9262%. I did not modify anything in steps/train_ctc_parallel.sh. Is the problem in my recipe or in the original steps/train_ctc_parallel.sh? @colin1988 Looking forward to your reply.

colin1988 commented 7 years ago

@Sundy1219 Did you build EESEN with the --use-cuda switch on or off?

It depends: when we only want to do decoding/speech-to-text and don't have GPU hardware, we leave the CUDA setting off.
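For reference, the choice is made at configure time, roughly as follows (flag names from memory; check ./configure --help in your src directory):

```bash
# Sketch: GPU build for training vs. CPU-only build for decoding.
# Exact flag names may differ between versions; see ./configure --help.
cd eesen/src

# GPU build (needed for reasonable CTC training speed):
./configure --use-cuda=yes --cudatk-dir=/usr/local/cuda
make -j 8

# CPU-only build (fine for decoding boxes without a GPU):
./configure --use-cuda=no
make -j 8
```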

Sundy1219 commented 7 years ago

yes, of course @colin1988

colin1988 commented 7 years ago

@Sundy1219 Perhaps the training log (exp/train_****/log/tr.iter{n}.log) can give you some insight; in particular, look at the gradient statistics.
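For example, something like this pulls them out of the per-iteration logs (the experiment directory name below is just the one from your earlier paste):

```bash
# Sketch: skim the per-iteration training logs for the reported gradient
# statistics and the running token accuracy.
expdir=exp/train_phn_l5_c320
grep -h "Gradient stats" $expdir/log/tr.iter*.log | less
grep -h "TokenAcc" $expdir/log/tr.iter*.log | tail -n 20
```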

Sundy1219 commented 7 years ago

All entries below are from VLOG[1] (train-ctc-parallel:EvalParallel():ctc-loss.cc:182):

After 230 sequences (0.384881Hr): Obj(log[Pzx]) = -290.591 TokenAcc = 3%
After 240 sequences (0.402889Hr): Obj(log[Pzx]) = -237.216 TokenAcc = 0%
After 250 sequences (0.421011Hr): Obj(log[Pzx]) = -289.111 TokenAcc = 1.46443%
After 260 sequences (0.439178Hr): Obj(log[Pzx]) = -256.131 TokenAcc = 2.5685%
After 270 sequences (0.457369Hr): Obj(log[Pzx]) = -265.045 TokenAcc = 2.4%
After 280 sequences (0.475672Hr): Obj(log[Pzx]) = -255.11 TokenAcc = 0%
After 290 sequences (0.494033Hr): Obj(log[Pzx]) = -269.895 TokenAcc = 0%
After 300 sequences (0.512394Hr): Obj(log[Pzx]) = -273.641 TokenAcc = 3.48163%
After 310 sequences (0.530758Hr): Obj(log[Pzx]) = -278.039 TokenAcc = 0.189751%
After 320 sequences (0.549211Hr): Obj(log[Pzx]) = -290.843 TokenAcc = 0.555557%
After 330 sequences (0.567739Hr): Obj(log[Pzx]) = -255.215 TokenAcc = 2.83688%
After 340 sequences (0.586267Hr): Obj(log[Pzx]) = -297.401 TokenAcc = 2.50965%
After 350 sequences (0.604864Hr): Obj(log[Pzx]) = -256.359 TokenAcc = 1.60142%
After 360 sequences (0.623547Hr): Obj(log[Pzx]) = -294.577 TokenAcc = 1.73745%
After 370 sequences (0.642242Hr): Obj(log[Pzx]) = -246.394 TokenAcc = 3.76344%
After 380 sequences (0.660936Hr): Obj(log[Pzx]) = -289.206 TokenAcc = 2.14844%
After 390 sequences (0.679661Hr): Obj(log[Pzx]) = -257.829 TokenAcc = 2.32975%
After 400 sequences (0.698486Hr): Obj(log[Pzx]) = -275.381 TokenAcc = 3.35821%
After 410 sequences (0.717347Hr): Obj(log[Pzx]) = -257.618 TokenAcc = 4.38932%
After 420 sequences (0.736239Hr): Obj(log[Pzx]) = -280.503 TokenAcc = 4.85075%
After 430 sequences (0.755231Hr): Obj(log[Pzx]) = -266.665 TokenAcc = 2.55474%
After 440 sequences (0.774286Hr): Obj(log[Pzx]) = -252.469 TokenAcc = 2.27273%
After 450 sequences (0.793342Hr): Obj(log[Pzx]) = -302.907 TokenAcc = 4.78927%
After 460 sequences (0.812453Hr): Obj(log[Pzx]) = -244.66 TokenAcc = 6.9395%
After 470 sequences (0.831647Hr): Obj(log[Pzx]) = -282.1 TokenAcc = 7.02811%
After 480 sequences (0.850869Hr): Obj(log[Pzx]) = -253.725 TokenAcc = 2.8169%
After 490 sequences (0.870092Hr): Obj(log[Pzx]) = -270.738 TokenAcc = 3.16206%
After 500 sequences (0.889314Hr): Obj(log[Pzx]) = -276.496 TokenAcc = 4.04412%
After 510 sequences (0.908553Hr): Obj(log[Pzx]) = -280.258 TokenAcc = 7.04225%
After 520 sequences (0.927878Hr): Obj(log[Pzx]) = -287.953 TokenAcc = 6.73759%
After 530 sequences (0.947264Hr): Obj(log[Pzx]) = -281.174 TokenAcc = 5.53691%
After 540 sequences (0.966653Hr): Obj(log[Pzx]) = -263.671 TokenAcc = 4.03509%
After 550 sequences (0.986042Hr): Obj(log[Pzx]) = -289.073 TokenAcc = 5.16605%
After 560 sequences (1.00544Hr): Obj(log[Pzx]) = -284.301 TokenAcc = 4.39189%
After 570 sequences (1.02492Hr): Obj(log[Pzx]) = -278.614 TokenAcc = 4.91228%
After 580 sequences (1.04447Hr): Obj(log[Pzx]) = -283.056 TokenAcc = 3.46021%
After 590 sequences (1.06402Hr): Obj(log[Pzx]) = -266.055 TokenAcc = 3.87205%
After 600 sequences (1.08358Hr): Obj(log[Pzx]) = -281.553 TokenAcc = 2.35081%
After 610 sequences (1.10314Hr): Obj(log[Pzx]) = -288.719 TokenAcc = 1.79211%
After 620 sequences (1.12272Hr): Obj(log[Pzx]) = -272.143 TokenAcc = 4.1806%
After 630 sequences (1.14236Hr): Obj(log[Pzx]) = -273.027 TokenAcc = 3.2197%
After 640 sequences (1.1621Hr): Obj(log[Pzx]) = -270.419 TokenAcc = 3.83212%
After 650 sequences (1.18185Hr): Obj(log[Pzx]) = -315.186 TokenAcc = 1.77305%
After 660 sequences (1.2016Hr): Obj(log[Pzx]) = -264.406 TokenAcc = 1.51007%
After 670 sequences (1.22135Hr): Obj(log[Pzx]) = -272.667 TokenAcc = 2.5641%
After 680 sequences (1.2411Hr): Obj(log[Pzx]) = -278.625 TokenAcc = 4.56141%
After 690 sequences (1.26093Hr): Obj(log[Pzx]) = -270.236 TokenAcc = 4.54546%
After 700 sequences (1.28082Hr): Obj(log[Pzx]) = -286.018 TokenAcc = 3.33919%
After 710 sequences (1.30073Hr): Obj(log[Pzx]) = -284.685 TokenAcc = 1.90972%
After 720 sequences (1.32065Hr): Obj(log[Pzx]) = -281.513 TokenAcc = 1.55172%
After 730 sequences (1.34057Hr): Obj(log[Pzx]) = -277.483 TokenAcc = 3.63322%
After 740 sequences (1.36049Hr): Obj(log[Pzx]) = -276.258 TokenAcc = 3.41297%
After 750 sequences (1.38047Hr): Obj(log[Pzx]) = -266.187 TokenAcc = 3.65854%
After 760 sequences (1.40053Hr): Obj(log[Pzx]) = -279.188 TokenAcc = 3.57143%
After 770 sequences (1.42061Hr): Obj(log[Pzx]) = -286.087 TokenAcc = 1.70068%
After 780 sequences (1.44069Hr): Obj(log[Pzx]) = -251.748 TokenAcc = 1.66113%
After 790 sequences (1.46077Hr): Obj(log[Pzx]) = -274.425 TokenAcc = 2.69231%
After 800 sequences (1.48087Hr): Obj(log[Pzx]) = -284.711 TokenAcc = 2.4055%
After 810 sequences (1.50102Hr): Obj(log[Pzx]) = -258.777 TokenAcc = 1.85185%
After 820 sequences (1.52123Hr): Obj(log[Pzx]) = -283.693 TokenAcc = 2.02952%
After 830 sequences (1.54147Hr): Obj(log[Pzx]) = -279.457 TokenAcc = 1.48026%
After 840 sequences (1.56172Hr): Obj(log[Pzx]) = -266.759 TokenAcc = 2.22602%
After 850 sequences (1.58197Hr): Obj(log[Pzx]) = -265.816 TokenAcc = 4.4484%
After 860 sequences (1.60222Hr): Obj(log[Pzx]) = -280.719 TokenAcc = 2.15054%
After 870 sequences (1.62247Hr): Obj(log[Pzx]) = -305.402 TokenAcc = 1.88356%
After 880 sequences (1.64272Hr): Obj(log[Pzx]) = -281.389 TokenAcc = 1.5674%

Hi, this is the information from that log. I did not change any of the training parameters in train_ctc_parallel.sh, and training does not seem to be converging. I have uploaded my dict and lang directories; could you give me some advice? Thank you, and looking forward to your reply. dict.tar.gz lang.tar.gz

Sundy1219 commented 7 years ago

@colin1988 @fmetze @riebling @standy66 @naxingyu

colin1988 commented 7 years ago

@Sundy1219 LOG (train-ctc-parallel:main():train-ctc-parallel.cc:239) ### Gradient stats :

Layer 1:
wei_gifo_x_fwcorr ( min -24.3853, max 30.8273, mean 0.00492909, variance 0.374423, skewness 13.9542, kurtosis 1136.99 )
wei_gifo_m_fwcorr ( min -23.6504, max 24.5528, mean 0.00036014, variance 0.476923, skewness 0.578661, kurtosis 515.212 )
bias_fwcorr ( min -24.7345, max 9.28247, mean -0.0791711, variance 1.86607, skewness -12.1068, kurtosis 210.684 )
phole_i_c_fwcorr ( min -44.6839, max 20.0623, mean -0.388176, variance 13.1936, skewness -7.21848, kurtosis 82.1387 )
phole_f_c_fwcorr ( min -44.5129, max 29.1527, mean -0.893499, variance 50.4342, skewness -3.43761, kurtosis 19.3123 )
phole_o_c_fwcorr ( min -26.1664, max 42.7029, mean 0.0889722, variance 28.9105, skewness 1.13295, kurtosis 18.0783 )
wei_gifo_x_bwcorr ( min -3.44368, max 6.58303, mean 0.00155111, variance 0.0321193, skewness 7.87553, kurtosis 274.836 )
wei_gifo_m_bwcorr ( min -6.1077, max 5.46042, mean 1.9947e-05, variance 0.0330534, skewness -0.176696, kurtosis 137.344 )
bias_bwcorr ( min -4.80724, max 2.52975, mean -0.0184321, variance 0.112111, skewness -1.7063, kurtosis 46.3016 )
phole_i_c_bwcorr ( min -9.95112, max 3.74967, mean -0.0486192, variance 0.728587, skewness -6.63942, kurtosis 72.1495 )
phole_f_c_bwcorr ( min -13.5383, max 15.0034, mean -0.146901, variance 4.47642, skewness -1.34477, kurtosis 24.0015 )
phole_o_c_bwcorr ( min -9.18674, max 9.68164, mean -0.0719903, variance 1.92647, skewness -0.049077, kurtosis 16.723 )

Layer 2:
wei_gifo_x_fwcorr ( min -10.1749, max 10.3298, mean 9.1813e-05, variance 0.0831392, skewness -0.154484, kurtosis 300.393 )
wei_gifo_m_fwcorr ( min -10.1741, max 10.153, mean -0.000123511, variance 0.0979502, skewness 1.05869, kurtosis 308.457 )
bias_fwcorr ( min -4.15387, max 10.1037, mean -0.00266422, variance 0.330429, skewness 7.60756, kurtosis 123.455 )
phole_i_c_fwcorr ( min -35.021, max 5.32412, mean -0.264882, variance 7.94007, skewness -10.0109, kurtosis 113.282 )
phole_f_c_fwcorr ( min -35.3808, max 50, mean -0.0975996, variance 38.3044, skewness 1.65148, kurtosis 30.7067 )
phole_o_c_fwcorr ( min -50, max 46.3394, mean -0.141416, variance 22.8823, skewness -1.15694, kurtosis 64.463 )
wei_gifo_x_bwcorr ( min -8.21069, max 7.96576, mean 0.000139954, variance 0.0628187, skewness 0.0630777, kurtosis 51.2825 )
wei_gifo_m_bwcorr ( min -6.99753, max 7.44292, mean -0.000479073, variance 0.0556428, skewness -0.246594, kurtosis 64.4153 )
bias_bwcorr ( min -4.38619, max 4.06402, mean -0.00323093, variance 0.206514, skewness -0.288128, kurtosis 21.1459 )

This is what the gradient statistics look like.

Sundy1219 commented 7 years ago

Thank you for your reply. I have obtained a good decoding result: after 15 epochs, the token accuracy goes up to around 92%, and the WER at the decoding stage is around 20%. @colin1988

riebling commented 7 years ago

That's pretty good!

Sundy1219 commented 7 years ago

Thank you for this open-source project; it is very good. I think the speed of training and decoding could still be improved, though. Do you have a plan to use Baidu's warp-ctc for training and decoding? @riebling @fmetze

riebling commented 7 years ago

Great idea. It's not in our plans, but anyone is welcome to contribute such an implementation.

fmetze commented 7 years ago

If you look at the test results, warp-ctc is not faster than Eesen for a small-ish number of tokens.

We did look at it ourselves and did not find a significant speed-up. If you want to speed up training, increase the --num-sequence and --frame-num-limit parameters; we will release updated recipes with better settings soon.
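For example, something along these lines when invoking the training script (the values and the positional arguments are only illustrative; how far you can raise the limits depends on GPU memory):

```bash
# Sketch: larger batching for faster CTC training.
# --num-sequence    = number of utterances processed in parallel per update
# --frame-num-limit = cap on the total number of frames in one batch
# Values and data/exp directories below are illustrative, not recommendations.
steps/train_ctc_parallel.sh \
  --num-sequence 20 --frame-num-limit 25000 \
  data/train data/dev exp/train_phn_l5_c320
```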

Sundy1219 commented 7 years ago

I have read your paper "EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding" many times, but I still don't understand why posterior normalization is needed during decoding. Question 1: can you explain it in detail? Question 2: isn't the softmax probability what the trained network produces when the wav features are fed in, and if so, why is the label-count file (dir/label count) needed? Question 3: is the input to latgen-faster the softmax probability? Looking forward to your reply. @fmetze @riebling
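To make Question 2 concrete, my current (possibly wrong) understanding of the normalization is:

$$\tilde{p}(k \mid x_t) \;=\; \frac{p(k \mid x_t)}{p(k)}$$

where $p(k \mid x_t)$ is the softmax output for label $k$ at frame $t$ and $p(k)$ is a prior estimated from the label counts. Is it this scaled quantity, rather than the raw softmax output, that is passed to latgen-faster? Please correct me if this is wrong.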

riebling commented 7 years ago

These questions probably belong in a new issue, which we look forward to answering. Closing this one because the THCHS-30 corpus was verified to work:

Thank you for your reply. I have obtained a good decoding result: after 15 epochs, the token accuracy goes up to around 92%, and the WER at the decoding stage is around 20%. @colin1988

It might be nice to contribute the resulting working code as asr_egs/thchs30 - this way EESEN could have a nice, open source Chinese example!