gooofy / zamia-speech

Open tools and data for cloudless automatic speech recognition
GNU Lesser General Public License v3.0

How much GPU RAM is needed for training? #38

Closed svenha closed 6 years ago

svenha commented 6 years ago

I retrained a German model, but it runs out of GPU RAM in this step:

nnet3-chain-train --use-gpu=wait --apply-deriv-weights=False \
  --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 \
  --read-cache=exp/nnet3_chain/tdnn_250/cache.1 \
  --write-cache=exp/nnet3_chain/tdnn_250/cache.2 \
  --xent-regularize=0.1 --print-interval=10 --momentum=0.0 \
  --max-param-change=2.0 --backstitch-training-scale=0.0 \
  --backstitch-training-interval=1 --l2-regularize-factor=1.0 --srand=1 \
  "nnet3-am-copy --raw=true --learning-rate=0.000998730064726 --scale=0.980025398705 exp/nnet3_chain/tdnn_250/1.mdl - |" \
  exp/nnet3_chain/tdnn_250/den.fst \
  "ark,bg:nnet3-chain-copy-egs --frame-shift=2 ark:exp/nnet3_chain/tdnn_250/egs/cegs.2.ark ark:- | nnet3-chain-shuffle-egs --buffer-size=5000 --srand=1 ark:- ark:- | nnet3-chain-merge-egs --minibatch-size=512 ark:- ark:- |" \
  exp/nnet3_chain/tdnn_250/2.1.raw

I have 4 GB of GPU RAM, and the GPU runs in exclusive mode, so Kaldi is the only process using it. This was enough back in June. What are your experiences or recommendations?

The error message contains this:

ERROR (nnet3-chain-train[5.5.95~1-4bdb]:AllocateNewRegion():cu-allocator.cc:513) Failed to allocate a memory region of 356515840 bytes. Possibly smaller minibatch size would help. Memory info: free:190M, used:3844M, total:4035M, free/total:0.0473123
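To put the numbers in that error message side by side, here is a minimal Python sketch that parses the log line and compares the requested region with the free memory. It assumes that the "M" suffix in Kaldi's memory info means MiB (1024 * 1024 bytes); the helper name parse_alloc_failure is made up for this illustration and is not part of Kaldi.

```python
import re

# Error text copied from the log line quoted above.
LOG = (
    "Failed to allocate a memory region of 356515840 bytes. "
    "Possibly smaller minibatch size would help. "
    "Memory info: free:190M, used:3844M, total:4035M, free/total:0.0473123"
)


def parse_alloc_failure(log):
    """Return (requested_mib, free_mib, total_mib) parsed from the log text.

    Hypothetical helper; assumes 'M' in the memory info means MiB.
    """
    requested_bytes = int(re.search(r"region of (\d+) bytes", log).group(1))
    free_mib = int(re.search(r"free:(\d+)M", log).group(1))
    total_mib = int(re.search(r"total:(\d+)M", log).group(1))
    return requested_bytes / 2**20, free_mib, total_mib


requested_mib, free_mib, total_mib = parse_alloc_failure(LOG)
shortfall_mib = requested_mib - free_mib

print(f"requested: {requested_mib:.0f} MiB")
print(f"free:      {free_mib} MiB of {total_mib} MiB")
print(f"shortfall: {shortfall_mib:.0f} MiB")
```

Under that assumption the failed request is a single 340 MiB region against 190 MiB free, so the step is short by roughly 150 MiB rather than by a few bytes, which is consistent with the log's hint that a smaller minibatch may help.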

So, should I retry with --minibatch-size=384 (instead of 512)? With this value the step completes, but I probably have to rerun all steps to be sure.

svenha commented 6 years ago

I also had to decrease num-chunk-per-minibatch from 288 to 96 in run-chain.sh (144 was still too big). With these two changes, run-chain.sh finished after 10 days (tdnn_f alone took 6 of them). But it's worth it:

%WER 11.67 [ 20564 / 176256, 3497 ins, 2716 del, 14351 sub ] tdnn_sp/decode_test/wer_9_0.5
%WER 13.27 [ 23387 / 176256, 3544 ins, 3313 del, 16530 sub ] tdnn_250/decode_test/wer_8_1.0
%WER 8.60 [ 15166 / 176256, 2617 ins, 2104 del, 10445 sub ] tdnn_f/decode_test/wer_10_1.0
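For readers unfamiliar with the bracketed counts, the WER figures above follow from (insertions + deletions + substitutions) / reference words. A small Python sketch that recomputes them from the counts reported in this comment; the function name wer_percent and the RESULTS table are only illustrative and are not taken from Kaldi's scoring scripts.

```python
# (model, insertions, deletions, substitutions, reference words),
# copied from the results reported above.
RESULTS = [
    ("tdnn_sp", 3497, 2716, 14351, 176256),
    ("tdnn_250", 3544, 3313, 16530, 176256),
    ("tdnn_f", 2617, 2104, 10445, 176256),
]


def wer_percent(ins, dels, subs, ref_words):
    """Word error rate in percent: all errors divided by reference words."""
    return 100.0 * (ins + dels + subs) / ref_words


for model, ins, dels, subs, ref_words in RESULTS:
    errors = ins + dels + subs
    print(f"{model}: {errors} / {ref_words} = {wer_percent(ins, dels, subs, ref_words):.2f} %WER")
```

The recomputed values match the reported 11.67, 13.27 and 8.60, and the three error counts in each line sum to the total given before the slash.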

gooofy commented 6 years ago

I am using an NVIDIA GTX 1080 Ti GPU for my model builds:

02:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)