kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

RNNLM training error when not using sparse_feature #2208

Closed · yanglin187 closed this issue 6 years ago

yanglin187 commented 6 years ago

Hi, when training an RNNLM on the PTB corpus I didn't use sparse_feature (I commented out the choose_features.py line in run_tdnn.sh), but I got the following error:

```
# rnnlm-train --rnnlm.max-param-change=0.5 --rnnlm.l2_regularize_factor=1 --embedding.max-param-change=0.5 --embedding.learning-rate=0.0001 --embedding.l2_regularize=0.005 --use-gpu=yes "--read-rnnlm=nnet3-copy --learning-rate=0.001 exp/rnnlm_tdnn_a/0.raw -|" --write-rnnlm=exp/rnnlm_tdnn_a/1.raw --read-embedding=exp/rnnlm_tdnn_a/word_embedding.0.mat --write-embedding=exp/rnnlm_tdnn_a/word_embedding.1.mat "ark,bg:cat exp/rnnlm_tdnn_a/text/1.txt exp/rnnlm_tdnn_a/text/1.txt exp/rnnlm_tdnn_a/text/1.txt exp/rnnlm_tdnn_a/text/1.txt exp/rnnlm_tdnn_a/text/1.txt | rnnlm-get-egs --srand=0 --vocab-size=10003 --num-samples=512 --sample-group-size=2 --num-threads=10 exp/rnnlm_tdnn_a/sampling.lm - ark:- |"
# Started at Fri Jan 26 17:13:59 CST 2018
#
rnnlm-train --rnnlm.max-param-change=0.5 --rnnlm.l2_regularize_factor=1 --embedding.max-param-change=0.5 --embedding.learning-rate=0.0001 --embedding.l2_regularize=0.005 --use-gpu=yes '--read-rnnlm=nnet3-copy --learning-rate=0.001 exp/rnnlm_tdnn_a/0.raw -|' --write-rnnlm=exp/rnnlm_tdnn_a/1.raw --read-embedding=exp/rnnlm_tdnn_a/word_embedding.0.mat --write-embedding=exp/rnnlm_tdnn_a/word_embedding.1.mat 'ark,bg:cat exp/rnnlm_tdnn_a/text/1.txt exp/rnnlm_tdnn_a/text/1.txt exp/rnnlm_tdnn_a/text/1.txt exp/rnnlm_tdnn_a/text/1.txt exp/rnnlm_tdnn_a/text/1.txt | rnnlm-get-egs --srand=0 --vocab-size=10003 --num-samples=512 --sample-group-size=2 --num-threads=10 exp/rnnlm_tdnn_a/sampling.lm - ark:- |'
LOG (rnnlm-train[5.3]:SelectGpuId():cu-device.cc:178) CUDA setup operating under Compute Exclusive Mode.
LOG (rnnlm-train[5.3]:FinalizeActiveGpu():cu-device.cc:234) The active GPU is [0]: Tesla P100-PCIE-16GB free:15917M, used:359M, total:16276M, free/total:0.977939 version 6.0
nnet3-copy --learning-rate=0.001 exp/rnnlm_tdnn_a/0.raw -
LOG (nnet3-copy[5.3]:main():nnet3-copy.cc:114) Copied raw neural net from exp/rnnlm_tdnn_a/0.raw to -
rnnlm-get-egs --srand=0 --vocab-size=10003 --num-samples=512 --sample-group-size=2 --num-threads=10 exp/rnnlm_tdnn_a/sampling.lm - ark:-
LOG (rnnlm-train[5.3]:UpdateNnetWithMaxChange():nnet-utils.cc:1812) Per-component max-change active on 1 / 4 Updatable Components.(smallest factor=0.00109136 on output.affine with max-change=1.5). Global max-change factor was 0.333333 with max-change=0.5.
ASSERTION_FAILED (rnnlm-train[5.3]:AddMat():cu-matrix.cc:947) : 'A.NumRows() == num_rows_ && A.NumCols() == num_cols_'

[ Stack-Trace: ]
rnnlm-train() [0x993c64]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::CuMatrixBase::AddMat(float, kaldi::CuMatrixBase const&, kaldi::MatrixTransposeType)
kaldi::rnnlm::RnnlmEmbeddingTrainer::Train(kaldi::CuArrayBase const&, kaldi::CuMatrixBase*)
kaldi::rnnlm::RnnlmTrainer::TrainWordEmbedding(kaldi::CuMatrixBase*)
kaldi::rnnlm::RnnlmTrainer::TrainInternal()
kaldi::rnnlm::RnnlmTrainer::Train(kaldi::rnnlm::RnnlmExample*)
main
__libc_start_main
rnnlm-train() [0x6c53f9]
```

Maybe there is something wrong at this point in the code: [screenshot of the relevant code]

The size of embedding_mat is vocab_size * 600, but the size of embedding_deriv is active_words.Dim() * 600. Of course, when using sparse_feature everything is fine, and the error can also be avoided by setting sampling=false. How can I train an RNNLM without using sparse_feature? Thanks a lot!
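For illustration, here is a minimal, Kaldi-free sketch of that shape mismatch. `Matrix`, `AddMat`, and `ScatterAddRows` below are illustrative stand-ins, not Kaldi's actual classes or the fix that was eventually applied: a whole-matrix add requires identical shapes (that is the assertion in the log), whereas a derivative with one row per active word has to be scattered into the full vocab-sized embedding matrix by row index.

```cpp
// Minimal sketch of the dimension mismatch, using only the standard library.
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix: rows x cols floats.
struct Matrix {
  std::size_t rows, cols;
  std::vector<float> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
  float *Row(std::size_t r) { return &data[r * cols]; }
  const float *Row(std::size_t r) const { return &data[r * cols]; }
};

// Same contract as the AddMat in the log above: both matrices must have
// identical shapes, otherwise the assertion fires.
void AddMat(float alpha, const Matrix &src, Matrix *dst) {
  assert(src.rows == dst->rows && src.cols == dst->cols);
  for (std::size_t i = 0; i < src.data.size(); ++i)
    dst->data[i] += alpha * src.data[i];
}

// With sampling, the derivative has one row per *active* word, so the update
// must scatter those rows into the full embedding matrix by word index
// instead of doing a whole-matrix add.
void ScatterAddRows(float alpha, const Matrix &deriv,
                    const std::vector<std::size_t> &active_words,
                    Matrix *embedding) {
  assert(deriv.rows == active_words.size() && deriv.cols == embedding->cols);
  for (std::size_t r = 0; r < deriv.rows; ++r) {
    const float *src = deriv.Row(r);
    float *dst = embedding->Row(active_words[r]);
    for (std::size_t c = 0; c < deriv.cols; ++c)
      dst[c] += alpha * src[c];
  }
}

int main() {
  const std::size_t vocab_size = 10003, dim = 600;
  Matrix embedding(vocab_size, dim);           // vocab_size x 600
  std::vector<std::size_t> active_words = {5, 42, 9999};
  Matrix deriv(active_words.size(), dim);      // active_words.Dim() x 600

  // AddMat(1.0f, deriv, &embedding);  // shapes differ -> assertion failure
  ScatterAddRows(1.0f, deriv, active_words, &embedding);  // shape-consistent
  return 0;
}
```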

danpovey commented 6 years ago

Resolved in #2210