srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

Install failure on Ubuntu 16.04 #137

Closed: jasonTuZq closed this issue 7 years ago

jasonTuZq commented 7 years ago

Hi, when I tried to install ATLAS via install_srilm.sh, it printed the following:

```
ERROR: enum fam=3, chip=2, model=62, mach=0
make[3]: [atlas_run] Error 44
make[2]: [IRunArchInfo_x86] Error 2
CPU Throttling apparently enabled!
It appears you have cpu throttling enabled, which makes timings
unreliable and an ATLAS install nonsensical. Aborting.
See ATLAS/INSTALL.txt for further information
xconfig exited with 1
```

I have tried several ways to turn off CPU throttling, but none of them worked. I wonder if it's OK to install ATLAS via `sudo apt-get install libatlas-dev` instead?
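For anyone else who hits this: the usual way to silence the throttling complaint on Ubuntu is to pin every core to the `performance` cpufreq governor. A minimal sketch, assuming the standard sysfs cpufreq layout (driver-dependent) and the `cpufrequtils` package:

```bash
# Pin all cores to the "performance" governor so clock speeds stay fixed,
# which is what ATLAS's autotuning timings require:
sudo apt-get install cpufrequtils
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance | sudo tee "$gov"
done
```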

fmetze commented 7 years ago

Yes, it is.
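A minimal sketch of that route, assuming Eesen's `configure` (inherited from Kaldi) picks up the system ATLAS from the standard Ubuntu location `/usr/lib/atlas-base`:

```bash
# Install the prebuilt ATLAS packages instead of building from source:
sudo apt-get install libatlas-dev libatlas-base-dev
# Rebuild Eesen against the system library:
cd eesen/src
./configure    # should detect ATLAS under /usr/lib/atlas-base
make -j 4
```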

jasonTuZq commented 7 years ago

Then why can't I run tedlium/v1/run_ctc_phn.sh successfully? The data preparation and feature extraction steps seem fine, but training fails without finishing epoch 1 and just exits. It printed the following:

```
LOG (apply-cmvn:main():apply-cmvn.cc:129) Applied cepstral mean and variance normalization to 2859 utterances, errors on 0
LOG (copy-feats:main():copy-feats.cc:100) Copied 2859 feature matrices.
TRAINING STARTS [2017-Jun-20 21:58:27]
[NOTE] TOKEN_ACCURACY refers to token accuracy, i.e., (1.0 - token_error_rate).
EPOCH 1 RUNNING ...
Removing features tmpdir exp/train_phn_l5_c320/yZwIs @ 311Ubuntu
cv.ark train.ark
```

fmetze commented 7 years ago

There should be another log file in the exp/train_phn_l5_c320/log folder that tells you what is going on in iteration 1. What does it say?
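A quick way to scan those logs (the exact file names depend on the recipe, so the glob below is an assumption):

```bash
# List the per-iteration training logs and search them for failures:
ls exp/train_phn_l5_c320/log/
grep -i -E 'error|fail' exp/train_phn_l5_c320/log/*
```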

jasonTuZq commented 7 years ago

It basically says "Memory allocation failure". I still don't know how to get it to work. Here is part of the log:

```
train-ctc-parallel --report-step=1000 --num-sequence=20 --frame-limit=25000 --learn-rate=0.00004 --momentum=0.9 --verbose=1 'ark,s,cs:copy-feats scp:exp/train_phn_l5_c320/train_local.scp ark:- | add-deltas ark:- ark:- |' 'ark:gunzip -c exp/train_phn_l5_c320/labels.tr.gz|' exp/train_phn_l5_c320/nnet/nnet.iter0 exp/train_phn_l5_c320/nnet/nnet.iter1
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:262) Selecting from 1 GPUs
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(0): GeForce GTX TITAN Black  free:194M, used:5882M, total:6076M, free/total:0.0319767
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:310) Selected device: 0 (automatically)
LOG (train-ctc-parallel:FinalizeActiveGpu():cuda-device.cc:194) The active GPU is [0]: GeForce GTX TITAN Black  free:177M, used:5899M, total:6076M, free/total:0.0291791 version 3.5
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 0 bytes.
LOG (train-ctc-parallel:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.
copy-feats scp:exp/train_phn_l5_c320/train_local.scp ark:-
add-deltas ark:- ark:-
LOG (train-ctc-parallel:main():train-ctc-parallel.cc:121) TRAINING STARTED
WARNING (train-ctc-parallel:MallocInternal():cuda-device.cc:658) Allocation of 400 rows, each of size 2560 bytes failed, releasing cached memory and retrying.
WARNING (train-ctc-parallel:MallocInternal():cuda-device.cc:665) Allocation failed for the second time. Printing device memory usage and exiting
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 185073664 bytes.
ERROR (train-ctc-parallel:MallocInternal():cuda-device.cc:668) Memory allocation failure
WARNING (train-ctc-parallel:Close():kaldi-io.cc:446) Pipe copy-feats scp:exp/train_phn_l5_c320/train_local.scp ark:- | add-deltas ark:- ark:- | had nonzero return status 36096
ERROR (train-ctc-parallel:MallocInternal():cuda-device.cc:668) Memory allocation failure
```
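Note that the log shows the TITAN Black starting with only 194M of 6076M free, so most of the GPU memory was already in use before training began. A standard check (`nvidia-smi` ships with the NVIDIA driver):

```bash
# Show which processes are currently holding GPU memory:
nvidia-smi
```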

fmetze commented 7 years ago

Try training a smaller model: fewer layers and fewer nodes per layer. Also, try reducing the --frame-limit to 10000 or so. This will reduce the memory requirements of the training.
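A sketch of those changes. The variable names are assumptions based on the Eesen TEDLIUM recipe (the experiment directory name train_phn_l5_c320 suggests 5 layers of 320 cells); check your local run_ctc_phn.sh for the actual names and for where --frame-limit is set:

```bash
# In tedlium/v1/run_ctc_phn.sh (variable names assumed, not verified):
lstm_layer_num=3     # was 5: fewer LSTM layers
lstm_cell_dim=240    # was 320: fewer cells per layer

# And wherever the training options are assembled, lower the frame limit
# that ends up on the train-ctc-parallel command line:
#   --frame-limit=10000    (was 25000 in the failing log above)
```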

jasonTuZq commented 7 years ago

I reduced the parameters as you suggested and it worked. Thanks for your help! ^_^