kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

batched-threaded-nnet3-cuda-online-pipeline with --gpu-feature-extract=true hits "Assertion failed: (k == k1)" when the feature dimension after the LDA transform is not equal to the original feature dim #4345

Open yuandcc opened 3 years ago

yuandcc commented 3 years ago

In kaldi/src/cudafeat/feature-online-batched-ivector-cuda.cc (the relevant lines, including line 145):

// lines 55 - 59: resize buffers depending on the config
feats_stash_.Resize(num_channels_ * (left + right), feat_dim_, kUndefined);
norm_feats_stash_.Resize(num_channels_ * (left + right), feat_dim_, kUndefined);
spliced_feats_.Resize(num_lanes * chunk_size, feat_dim_ * size, kUndefined);
tmp_feats_.Resize(num_lanes * chunk_size, feat_dim_, kUndefined);

// line 145: feat_dim_ is taken from the ivector extractor
feat_dim_ = ie_M[0].NumRows();

// lines 176 - 187: extract the ivector
// cmvn feats and store in tmp_feats_
cmvn_->ComputeFeaturesBatched(num_lanes, lanes, feats, &tmp_feats_);
// splice normalized feats
SpliceFeats(tmp_feats_, norm_feats_stash_, &spliced_feats_, lanes, num_lanes);
// Stash feats
StashFeats(tmp_feats_, &norm_feats_stash_, lanes, num_lanes);
// LDA transform spliced feats back into tmp_feats
LDATransform(spliced_feats_, &tmp_feats_, lanes, num_lanes);

When I train the ivector extractor (my MFCC dim is 80, so 560 after splicing), the pipeline I use is:

1) Train LDA using steps/train_lda_mllt.sh; the LDA matrix dim is 560 * 40 (dim = 40 is the default in steps/train_lda_mllt.sh).
2) Train the UBM; the GMM means have dim 40 because the features are passed through transform-feats with the LDA matrix.
3) Train the ivector extractor; M[0].NumRows() is 40.

When I decode with batched-threaded-nnet3-cuda-online-pipeline and set --gpu-feature-extract=true, in feature-online-batched-ivector-cuda.cc:

1) feat_dim_ = ie_M[0].NumRows(), so feat_dim_ is 40 rather than 80 (line 145).
2) Some member variables (such as tmp_feats_) are resized based on feat_dim_, i.e. with 40 rather than 80 (lines 55 - 59).
3) The features after CMVN therefore have dim 40 instead of 80, and after splicing the dim is 280 (40 * 7) rather than 560 (80 * 7). LDATransform then fails with "Assertion failed: (k == k1)" because of the mismatch between 280 (spliced feature dim) and 560 (LDA matrix rows), as the small check below illustrates.
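For reference, here is the dimension arithmetic behind the failure as a tiny standalone check. The numbers 80 and 40 come from this report, the ±3 splice context is inferred from 560 / 80 = 7 frames, and none of the variable names below are actual Kaldi members:

#include <cstdio>

int main() {
  // Numbers reported in this issue.
  const int raw_feat_dim   = 80;   // MFCC dim before LDA
  const int lda_output_dim = 40;   // ie_M[0].NumRows(): dim after the LDA transform
  const int left = 3, right = 3;   // splice context, 7 frames in total (inferred)
  const int lda_mat_rows   = 560;  // rows of the LDA matrix (80 * 7)

  // The pipeline currently sizes its buffers with feat_dim_ = lda_output_dim,
  // so the spliced features end up with 40 * 7 = 280 columns ...
  const int spliced_dim = lda_output_dim * (left + right + 1);

  // ... while the LDA transform still expects 80 * 7 = 560 inputs.
  std::printf("spliced dim = %d, LDA expects %d -> %s\n",
              spliced_dim, lda_mat_rows,
              spliced_dim == lda_mat_rows ? "ok" : "mismatch, (k == k1) fails");
  return 0;
}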

I think that if we allow the original feature dim and the dim after the LDA transform to differ, feature-online-batched-ivector-cuda.cc needs to keep two separate sets of variables, one for the original features and one for the LDA-transformed features (e.g. feat_dim_ and lda_feat_dim_, tmp_feats_ and tmp_lda_feats_), and resize each buffer with the matching dim, as sketched below.
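A rough sketch of what that separation could look like (illustrative only: lda_feat_dim_ and tmp_lda_feats_ are hypothetical names, and plain std::vector buffers stand in for the CuMatrix<BaseFloat> members of the real class):

#include <cstddef>
#include <vector>

// Sketch only: keep the pre-LDA and post-LDA dims as separate members and
// size every buffer with the dim of the data it actually holds.
struct BatchedIvectorBuffersSketch {
  int feat_dim_;       // raw feature dim (e.g. 80), used before the LDA transform
  int lda_feat_dim_;   // ie_M[0].NumRows() (e.g. 40), used after the LDA transform

  // std::vector<float> stands in for CuMatrix<BaseFloat> (rows * cols flattened).
  std::vector<float> tmp_feats_;         // CMVN output: raw dim
  std::vector<float> spliced_feats_;     // spliced raw feats: feat_dim_ * (left + right + 1)
  std::vector<float> norm_feats_stash_;  // stashed raw feats for the next chunk
  std::vector<float> tmp_lda_feats_;     // LDA output: lda_feat_dim_

  void Resize(int num_lanes, int chunk_size, int num_channels, int left, int right) {
    const int splice_size = left + right + 1;
    // Buffers holding features *before* LDA are sized with feat_dim_.
    tmp_feats_.resize(static_cast<std::size_t>(num_lanes) * chunk_size * feat_dim_);
    spliced_feats_.resize(static_cast<std::size_t>(num_lanes) * chunk_size *
                          feat_dim_ * splice_size);
    norm_feats_stash_.resize(static_cast<std::size_t>(num_channels) * (left + right) *
                             feat_dim_);
    // Buffers holding features *after* LDA are sized with lda_feat_dim_.
    tmp_lda_feats_.resize(static_cast<std::size_t>(num_lanes) * chunk_size * lda_feat_dim_);
  }
};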

dgxlsir commented 3 years ago

In my script, when I run

batched-wav-nnet3-cuda2 $srcdir/${iter}.mdl $graphdir/HCLG.fst "$wav_rspecifier" "$lat_wspecifier" || exit 1;

it prints the following log:

batched-wav-nnet3-cuda2 exp/chain/final.mdl exp/chain/tdnn_1a_sp/graph/HCLG.fst 'ark,s,cs:wav-copy scp,p:data/test_chain_hires/split1/1/wav.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain/tdnn_1a_sp//lat.1.gz'
LOG (batched-wav-nnet3-cuda2[5.5]:SelectGpuId():cu-device.cc:223) CUDA setup operating under Compute Exclusive Mode.
LOG (batched-wav-nnet3-cuda2[5.5]:FinalizeActiveGpu():cu-device.cc:308) The active GPU is [0]: GeForce RTX 2080 Ti free:10668M, used:351M, total:11019M, free/total:0.968125 version 7.5
LOG (batched-wav-nnet3-cuda2[5.5]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (batched-wav-nnet3-cuda2[5.5]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (batched-wav-nnet3-cuda2[5.5]:Collapse():nnet-utils.cc:1488) Added 1 components, removed 2
lattice-scale --acoustic-scale=10.0 ark:- ark:-
LOG (batched-wav-nnet3-cuda2[5.5]:ExplainWhyAllOutputsNotComputable():nnet-computation-graph.cc:351) 100 output cindexes out of 100 were not computable.
LOG (batched-wav-nnet3-cuda2[5.5]:ExplainWhyAllOutputsNotComputable():nnet-computation-graph.cc:355) Computation request was:
# Computation request:
input-0: name=input, has-deriv=false, indexes=[(0,-12:61), (1,-12:61)]
output-0: name=output, has-deriv=false, indexes=[(0,0:49), (1,0:49)]
need-model-derivative: false
store-component-stats: false

LOG (batched-wav-nnet3-cuda2[5.5]:ExplainWhyAllOutputsNotComputable():nnet-computation-graph.cc:357) Printing the reasons for 10 of these.
LOG (batched-wav-nnet3-cuda2[5.5]:ExplainWhyNotComputable():nnet-computation-graph.cc:172) cindex output(0, 0, 0) is not computable for the following reason:
output(0, 0, 0) is kNotComputable, dependencies: output.affine(0, 0, 0)[kNotComputable],
output.affine(0, 0, 0) is kNotComputable, dependencies: output.affine_input(0, 0, 0)[kNotComputable],
output.affine_input(0, 0, 0) is kNotComputable, dependencies: prefinal-chain.batchnorm(0, 0, 0)[kNotComputable],
prefinal-chain.batchnorm(0, 0, 0) is kNotComputable, dependencies: prefinal-chain.batchnorm_input(0, 0, 0)[kNotComputable], ...

dgxlsir commented 3 years ago

WARNING (wav-copy[5.5.561~1457-666b8]:Open():util/kaldi-table-inl.h:106) Failed to open script file data/test_chain_hires/split1/1/wav.scp
ERROR (wav-copy[5.5.561~1457-666b8]:SequentialTableReader():util/kaldi-table-inl.h:860) Error constructing TableReader: rspecifier is scp,p:data/test_chain_hires/split1/1/wav.scp

danpovey commented 3 years ago

That command line looks a bit short; I think the usage mirrors that of online2-wav-nnet3-latgen-faster. See example scripts that invoke prepare_online_decoding.sh and then steps/online/nnet3/decode.sh, and look in their decode.xx.log files for the usage. At least there should be a config file referred to, which would have details regarding an ivector extractor.

dgxlsir commented 3 years ago

Thanks for your reply! It's a very good idea!

dgxlsir commented 3 years ago

Thanks for your great suggestion; my problem is gone. But there is a new problem: when I run my script, the tail of the audio is not recognized. I tried adjusting '--max-count', but it made no difference. Best regards!


stale[bot] commented 3 years ago

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.