jasonTuZq closed this issue 7 years ago
Yes, it is.
Then why can't I run tedlium/v1/run_ctc_phn.sh successfully? The data preparation and feature extraction steps seem fine, but training failed without finishing epoch 1 and just exited. It printed the following:

LOG (apply-cmvn:main():apply-cmvn.cc:129) Applied cepstral mean and variance normalization to 2859 utterances, errors on 0
LOG (copy-feats:main():copy-feats.cc:100) Copied 2859 feature matrices.
TRAINING STARTS [2017-Jun-20 21:58:27]
[NOTE] TOKEN_ACCURACY refers to token accuracy, i.e., (1.0 - token_error_rate).
EPOCH 1 RUNNING ...
Removing features tmpdir exp/train_phn_l5_c320/yZwIs @ 311Ubuntu
cv.ark train.ark
There should be another log file in the exp/train_phn_l5_c320/log folder that tells you what is going on in iteration 1. What does it say?
It basically says "Memory allocation failure". I still don't know how to get it to work. Here is part of the log:

train-ctc-parallel --report-step=1000 --num-sequence=20 --frame-limit=25000 --learn-rate=0.00004 --momentum=0.9 --verbose=1 'ark,s,cs:copy-feats scp:exp/train_phn_l5_c320/train_local.scp ark:- | add-deltas ark:- ark:- |' 'ark:gunzip -c exp/train_phn_l5_c320/labels.tr.gz|' exp/train_phn_l5_c320/nnet/nnet.iter0 exp/train_phn_l5_c320/nnet/nnet.iter1
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:262) Selecting from 1 GPUs
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(0): GeForce GTX TITAN Black free:194M, used:5882M, total:6076M, free/total:0.0319767
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:310) Selected device: 0 (automatically)
LOG (train-ctc-parallel:FinalizeActiveGpu():cuda-device.cc:194) The active GPU is [0]: GeForce GTX TITAN Black free:177M, used:5899M, total:6076M, free/total:0.0291791 version 3.5
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 0 bytes.
LOG (train-ctc-parallel:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.
copy-feats scp:exp/train_phn_l5_c320/train_local.scp ark:-
add-deltas ark:- ark:-
LOG (train-ctc-parallel:main():train-ctc-parallel.cc:121) TRAINING STARTED
WARNING (train-ctc-parallel:MallocInternal():cuda-device.cc:658) Allocation of 400 rows, each of size 2560 bytes failed, releasing cached memory and retrying.
WARNING (train-ctc-parallel:MallocInternal():cuda-device.cc:665) Allocation failed for the second time. Printing device memory usage and exiting
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 185073664 bytes.
ERROR (train-ctc-parallel:MallocInternal():cuda-device.cc:668) Memory allocation failure
WARNING (train-ctc-parallel:Close():kaldi-io.cc:446) Pipe copy-feats scp:exp/train_phn_l5_c320/train_local.scp ark:- | add-deltas ark:- ark:- | had nonzero return status 36096
ERROR (train-ctc-parallel:MallocInternal():cuda-device.cc:668) Memory allocation failure
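For what it's worth, the log already points at the cause: the GPU had only 194M of its 6076M free before training even started, so another process was holding most of the card. A rough back-of-envelope (my own assumption about the dominant term, not Eesen's exact allocator math) shows that just the single-precision activations of a 5-layer bidirectional LSTM with 320 cells over a 25000-frame minibatch would not fit in what was free:

```shell
# Rough activation-memory estimate (illustrative assumption, not Eesen's actual accounting):
# frames per minibatch x layers x cells x 2 directions x 4 bytes per float.
frames=25000; layers=5; cells=320; dirs=2; bytes=4
echo "$(( frames * layers * cells * dirs * bytes / 1024 / 1024 )) MB"
```

That comes to roughly 305 MB, already more than the 177M reported free at startup, before counting weights, gradients, and the CTC forward-backward buffers.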
Try training a smaller model: fewer layers and fewer nodes per layer. Also, try reducing --frame-limit to 10000 or so; this will reduce the memory requirement of the training.
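Concretely, that means shrinking the model-size variables near the top of tedlium/v1/run_ctc_phn.sh and lowering the frame budget passed to the trainer. A sketch of the kind of change (the variable names below are illustrative guesses based on the exp/train_phn_l5_c320 directory name; check your copy of the script for the actual names):

```shell
# Hypothetical excerpt from run_ctc_phn.sh -- variable names may differ in your script.
lstm_layer_num=4     # e.g. down from 5 layers
lstm_cell_dim=240    # e.g. down from 320 cells per layer

# Then pass a smaller per-minibatch frame budget to the trainer, e.g.:
#   train-ctc-parallel ... --frame-limit=10000 ...
```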
I reduced the parameters as you suggested and it worked. Thanks for your help. ^_^
Hi, when I tried to install ATLAS via install_srilm.sh, it prompted the following:

ERROR: enum fam=3, chip=2, model=62, mach=0
make[3]: [atlas_run] Error 44
make[2]: [IRunArchInfo_x86] Error 2
CPU Throttling apparently enabled!
It appears you have cpu throttling enabled, which makes timings unreliable and an ATLAS install nonsensical. Aborting.
See ATLAS/INSTALL.txt for further information
xconfig exited with 1
I have tried several ways to turn off CPU throttling, but none of them worked. I wonder if it's OK to install ATLAS via sudo apt-get install libatlas-dev instead?
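For reference, the two usual workarounds, sketched for an Ubuntu-style system (sysfs paths and package names can vary by release):

```shell
# Option 1: pin the CPU frequency governor to "performance" so ATLAS's timings
# are stable. The sysfs path below is standard on Linux; requires root.
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance | sudo tee "$gov" > /dev/null
done

# Option 2: skip the source build and install the packaged ATLAS instead
# (libatlas-base-dev includes the shared libraries; libatlas-dev the headers).
sudo apt-get install libatlas-base-dev
```

A packaged ATLAS is generic (not tuned to your exact CPU), so it can be slower than a source build, but it avoids the throttling check entirely.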