Training Error when run tedlium recipe

yuexianghubit commented 5 years ago

Hi all,

I have already installed Eesen and am trying to run the tedlium recipe, but I get the following error:

train-ctc-parallel --report-step=1000 --num-sequence=20 --frame-limit=25000 --learn-rate=0.00004 --momentum=0.9 --verbose=1 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/train_tr95/utt2spk scp:data/train_tr95/cmvn.scp scp:exp/train_phn_l5_c320/train.scp ark:- | add-deltas ark:- ark:- |' 'ark:gunzip -c exp/train_phn_l5_c320/labels.tr.gz|' exp/train_phn_l5_c320/nnet/nnet.iter0 exp/train_phn_l5_c320/nnet/nnet.iter1
WARNING (train-ctc-parallel:SelectGpuId():cuda-device.cc:150) Suggestion: use 'nvidia-smi -c 1' to set compute exclusive mode
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:262) Selecting from 4 GPUs
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(0): GeForce GTX 1080 Ti free:189M, used:10985M, total:11175M, free/total:0.0169513
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(1): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(2): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(3): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:310) Selected device: 1 (automatically)
LOG (train-ctc-parallel:FinalizeActiveGpu():cuda-device.cc:194) The active GPU is [1]: GeForce GTX 1080 Ti free:10983M, used:195M, total:11178M, free/total:0.982556 version 6.1
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 0 bytes.
LOG (train-ctc-parallel:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.

ERROR (train-ctc-parallel:ExpectToken():io-funcs.cc:197) Expected token "<ForwardDropoutFactor>", got instead "<DropFactor>".
ERROR (train-ctc-parallel:ExpectToken():io-funcs.cc:197) Expected token "<ForwardDropoutFactor>", got instead "<DropFactor>".

[stack trace: ]
eesen::KaldiGetStackTrace[abi:cxx11]()
eesen::KaldiErrorMessage::~KaldiErrorMessage()
eesen::ExpectToken(std::istream&, bool, char const*)
eesen::BiLstm::ReadData(std::istream&, bool)
eesen::Layer::Read(std::istream&, bool, bool)
. . .
eesen::Net::Read(std::istream&, bool)
eesen::Net::Read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
train-ctc-parallel(main+0xbb1) [0x434345]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc1fec83830]
train-ctc-parallel(_start+0x29) [0x4320f9]

How can I fix this? Thank you!

fmetze commented 5 years ago

The error is probably caused by an inconsistency between your conf/*.proto file and the actual model. It seems that the prototype file has been generated with a prototype in the librispeech recipe (or https://github.com/jb1999/eesen), while the actual code is srvk's Eesen?

Eesen's standard acoustic model does not contain the `<ForwardDropoutFactor>` token, but jb1999's Eesen does.
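One way to check which side of such a mismatch a given file is on is to list the `<Token>` markers it contains. The following is a minimal sketch, not part of Eesen: the function names and the three labels it returns are made up for illustration, and it assumes that token names are stored as plain ASCII between angle brackets in the file, as they appear in the error message above.

```python
import re
from collections import Counter

# Matches markers such as <BiLstm> or <DropFactor> in raw bytes.
TOKEN_RE = re.compile(rb"<([A-Za-z][A-Za-z0-9_]*)>")


def count_tokens(data: bytes) -> Counter:
    """Count every <Token> marker found in the raw bytes of a model or proto file."""
    return Counter(m.group(1).decode("ascii") for m in TOKEN_RE.finditer(data))


def dropout_style(data: bytes) -> str:
    """Classify a file by the dropout-related token it contains.

    The labels are hypothetical names for this sketch only:
    'forward-dropout' if <ForwardDropoutFactor> is present,
    'drop-factor'     if only <DropFactor> is present,
    'no-dropout'      if neither token appears.
    """
    tokens = count_tokens(data)
    if "ForwardDropoutFactor" in tokens:
        return "forward-dropout"
    if "DropFactor" in tokens:
        return "drop-factor"
    return "no-dropout"
```

For example, `dropout_style(open("exp/train_phn_l5_c320/nnet/nnet.iter0", "rb").read())` would report which token the initial model carries. If that differs from the token the `train-ctc-parallel` binary expects, the proto and the binary likely come from different Eesen variants.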

yuexianghubit commented 5 years ago

I installed srvk's Eesen, not jb1999's Eesen. Today I tried the librispeech recipe, and acoustic model training runs normally. The nnet proto is below:

    360 640 0.1 1.0 50.0 1.0 0.2 T 0.2 T T T
    640 640 0.1 1.0 50.0 1.0 0.2 T 0.2 T T T
    640 640 0.1 1.0 50.0 1.0 0.2 T 0.2 T T T
    640 640 0.1 1.0 50.0 1.0 0.2 T 0.2 T T T
    640 44 0.1
    44 44

But when I ran the tedlium recipe, acoustic model training failed with the error I posted above. The nnet proto is now:

    120 640 0.1 1.0 50.0 1.0
    640 640 0.1 1.0 50.0 1.0
    640 640 0.1 1.0 50.0 1.0
    640 640 0.1 1.0 50.0 1.0
    640 640 0.1 1.0 50.0 1.0
    640 78 0.1
    78 78
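The difference between the two protos can be made explicit by counting the values on each layer line. This is only an illustrative sketch: the helper `field_counts` and the two constants are hypothetical names, the listings are copied from the values pasted above, and the meaning of the extra columns is not stated in this thread, so the code only counts them.

```python
from typing import List


def field_counts(listing: str) -> List[int]:
    """Return the number of whitespace-separated values on each non-empty line."""
    return [len(line.split()) for line in listing.splitlines() if line.strip()]


# Values as pasted for the librispeech recipe (training runs normally).
LIBRISPEECH_PROTO = """\
360 640 0.1 1.0 50.0 1.0 0.2 T 0.2 T T T
640 640 0.1 1.0 50.0 1.0 0.2 T 0.2 T T T
640 640 0.1 1.0 50.0 1.0 0.2 T 0.2 T T T
640 640 0.1 1.0 50.0 1.0 0.2 T 0.2 T T T
640 44 0.1
44 44
"""

# Values as pasted for the tedlium recipe (training fails).
TEDLIUM_PROTO = """\
120 640 0.1 1.0 50.0 1.0
640 640 0.1 1.0 50.0 1.0
640 640 0.1 1.0 50.0 1.0
640 640 0.1 1.0 50.0 1.0
640 640 0.1 1.0 50.0 1.0
640 78 0.1
78 78
"""

print(field_counts(LIBRISPEECH_PROTO))  # [12, 12, 12, 12, 3, 2]
print(field_counts(TEDLIUM_PROTO))      # [6, 6, 6, 6, 6, 3, 2]
```

Each recurrent layer in the librispeech proto carries six more values than its tedlium counterpart, which fits the idea that the two recipes were written for different layer formats.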

srvk / eesen

Training Error when run tedlium recipe #213