Closed: danpovey closed this issue 4 years ago
I will help with this; the results and tuning scripts will be made into a PR against kaldi_52.
@GaofengCheng, I just realized that I made a really stupid mistake when I implemented the xconfig layers for convolution. I forgot to include any nonlinearity! There is no ReLU. I'll be modifying the xconfig code and the associated example. It should be done before you get to the CNN stage of these experiments.
@danpovey I have not yet begun the CNN exps. After your modification, I'll take this and help check
OK, I have fixed the CNN layer, adding ReLU's. I'm running the (now-modified) WSJ example and it does lead to better objective functions.
@danpovey Got it, I'll begin the CNN experiments today.
BTW, something to keep an eye on is whether 'test-mode' batch-norm behaves the same as training mode in terms of log-prob. (We use test mode in the compute_prob logs, in the combination stage, and in actual testing.) In the current egs/wsj/s5/local/chain/tuning/run_cnn_tdnn_1a.sh, after training for a while, the training and validation log-probs become not that good, and when I tried nnet3-chain-compute-prob with --test-mode=false they got better (by almost a factor of 2). So test-mode of the BatchNorm component is not working as we expect it to.
In that configuration, though, I had converted all the layers to batch-norm, even the non-convolutional layers, and in those cases the batch-norm has to estimate many more individual scaling factors. So it's plausible that it's somehow leaking information about the current minibatch. I also suspect some kind of bug in the batch-norm code, but I've looked and can't see one.
I figured out the problem with the batch-norm. It's that during a single iteration the model was moving too far, so the batch-norm stats accumulated over the entire iteration became non-representative. Part of the issue was a bug in the reading and writing code, whereby the learning-rate and max-change were not being read or written (and in the xconfig I had also forgotten to set the max-change to 0.75, which is why I had not detected that issue). If the problem persists once the learning-rate and max-change code is fixed, I'll change how the stats for batch-norm are collected.
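The discrepancy between training-mode and test-mode batch-norm can be illustrated with a toy sketch (plain NumPy, a single dimension; this is not Kaldi's actual BatchNormComponent): if the model drifts during an iteration, the accumulated stats no longer match the activations the final model produces, so test-mode normalization is off even though training-mode normalization (using the current minibatch's own stats) looks fine.

```python
import numpy as np

def batchnorm(x, mean, var, eps=1e-3):
    """Normalize x with the given mean/variance (one dimension for simplicity)."""
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Stats accumulated early in the iteration, before the model drifted.
early = rng.normal(loc=0.0, scale=1.0, size=10000)
stored_mean, stored_var = early.mean(), early.var()

# Activations at the end of the iteration: the model has moved, so the
# distribution feeding this layer has shifted.
late = rng.normal(loc=2.0, scale=1.5, size=10000)

train_mode = batchnorm(late, late.mean(), late.var())  # this minibatch's stats
test_mode = batchnorm(late, stored_mean, stored_var)   # stale accumulated stats

# Train mode is properly normalized (mean ~0, std ~1); test mode is not.
print(round(train_mode.mean(), 2), round(train_mode.std(), 2))
print(round(test_mode.mean(), 2), round(test_mode.std(), 2))
```

The worse log-probs with --test-mode=true are exactly this effect: the stored stats describe an earlier model, not the one being evaluated.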
@danpovey
How should I change cepstral-lifter if I use 64-dim features?
idct-layer name=idct input=input dim=40 cepstral-lifter=22 affine-transform-file=$dir/configs/idct.mat
leave it the same
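To see why the same value works: the liftering window scales cepstral coefficient n by 1 + (Q/2)·sin(πn/Q) with Q = cepstral-lifter. Below is a sketch of that standard HTK-style formula (I'm assuming here that this matches what the idct.mat generation undoes); the coefficient for index n depends only on n and Q, not on the total number of coefficients, so going from 40 to 64 dims just extends the same window.

```python
import math

def lifter_coeffs(num_ceps, q=22.0):
    """HTK-style cepstral liftering window: w[n] = 1 + (Q/2)*sin(pi*n/Q)."""
    return [1.0 + 0.5 * q * math.sin(math.pi * n / q) for n in range(num_ceps)]

w40 = lifter_coeffs(40)
w64 = lifter_coeffs(64)

# The first 40 coefficients are identical whether you ask for 40 or 64 dims,
# so cepstral-lifter=22 does not need to change with the feature dimension.
assert w40 == w64[:40]
print(w40[0])  # 1.0 -- c0 is not scaled
```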
@danpovey I did not find the learning-rate-factor related to "And remove the learning-rate-factor=2.0 on the ivector branch, I realized it's inappropriate, and that branch is training too fast" in final.config when running cnn_tdnn_lstm or cnn_tdnn; there is just learning-rate-factor=5.0 on the xent_output.
It's OK, I may already have done that.
You may notice that in egs/wsj/s5/local/chain/tuning/run_cnn_tdnn_1b.sh, in the first convolutional layer I added:
learning-rate-factor=0.333 max-change=0.25
to make it train more slowly, because I noticed the first convolutional layer was training too fast. You might want to do that, or at least keep an eye on how fast it's training (grep for 'Relative' in progress.*.log to see).
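For readers unfamiliar with those logs: the 'Relative' figures report, per component, roughly the norm of the parameter change on an iteration divided by the norm of the parameters. A toy version of that diagnostic (a hypothetical helper, not Kaldi code):

```python
import numpy as np

def relative_change(old_params, new_params):
    """Roughly what the 'Relative parameter change' diagnostic reports:
    ||new - old|| / ||old||, i.e. how far the component moved this iteration."""
    old = np.asarray(old_params, dtype=float)
    delta = np.asarray(new_params, dtype=float) - old
    return np.linalg.norm(delta) / np.linalg.norm(old)

old = np.ones(100)
fast = old + 0.5   # a layer that moved a lot this iteration
slow = old + 0.01  # a layer training at a gentler pace

print(relative_change(old, fast))  # 0.5
print(relative_change(old, slow))  # ~0.01
```

A persistently large value for one layer (like the first convolutional layer here) is the signal that a smaller learning-rate-factor or max-change is worth trying.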
Yes, I'm running a comparison experiment on AMI IHM to figure out whether this will help.
@danpovey Setting --trainer.deriv-truncate-margin 8 in training causes the cnn_tdnn_lstm to fail (I adapted it directly from egs/ami/s5b/tuning/run_tdnn_lstm_1i.sh). If I remove this option, will it affect the results much?
That would be a bug in the code. I need to look into that. Can you please show me the error message, and if you think it would help, get me a stack trace?
@danpovey
nnet3-am-copy --raw=true --learning-rate=0.002 exp/ihm/chain_cleaned/tdnn_lstm1i_cnn_64_fbank_2cnn_sp_bi_ld5/0.mdl -
LOG (nnet3-am-copy[5.1]:main():nnet3-am-copy.cc:140) Copied neural net from exp/ihm/chain_cleaned/tdnn_lstm1i_cnn_64_fbank_2cnn_sp_bi_ld5/0.mdl to raw format as -
nnet3-chain-shuffle-egs --buffer-size=5000 --srand=0 ark:- ark:-
nnet3-chain-merge-egs --minibatch-size=32 ark:- ark:-
nnet3-chain-copy-egs --left-context=57 --right-context=27 --frame-shift=2 ark:/nobackup/datapool/project/proa/chenggaofeng/ami/s5b_ASRU2017/exp/ihm/chain_cleaned/tdnn_lstm1i_cnn_64_fbank_2cnn_sp_bi_ld5/egs/cegs.2.ark ark:-
ASSERTION_FAILED (nnet3-chain-train[5.1]:FindNumLeadingAndTrailingIdenticals():nnet-optimize-utils.cc:2209) : 'ptr != end && "Vector consists entirely of -1's."'
[ Stack-Trace: ]
nnet3-chain-train() [0x11829a6]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
nnet3-chain-train() [0xe069c8]
nnet3-chain-train() [0xe06b0b]
kaldi::nnet3::SnipRowOps(kaldi::nnet3::NnetComputation*)
kaldi::nnet3::Optimize(kaldi::nnet3::NnetOptimizeOptions const&, kaldi::nnet3::Nnet const&, int, kaldi::nnet3::NnetComputation*)
kaldi::nnet3::CachingOptimizingCompiler::CompileNoShortcut(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::CompileAndCache(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::CompileInternal(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::CompileViaShortcut(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::CompileAndCache(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::CompileInternal(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::Compile(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::NnetChainTrainer::Train(kaldi::nnet3::NnetChainExample const&)
main
__libc_start_main
nnet3-chain-train() [0xcdad99]
OK, I have just pushed a code change that should resolve that-- please see if it's better.
There is a mistake in my previous comment; please delete it.
What was the mistake? I notice that your 'final' probs were much worse than the other probs, indicating that something went wrong in the combination stage. BTW, re-running the compute-prob (train or valid) script with --test-mode=false will indicate whether the problem is unsuitable batch-norm stats. If the batch-norm stats are unsuitable, it will affect decoding.
Dan
@danpovey OK, after several CNN experiments, I think there may be problems in batch-norm and the combination stage.
I ran these experiments:
batchcnn+batchtdnn vs. batchcnn(small max-change)+batchtdnn vs. cnn+batchtdnn vs. cnn(small max-change)+batchtdnn
and also
batchcnn+batchtdnn+lstm vs. batchcnn(small max-change)+batchtdnn+lstm vs. cnn+batchtdnn+lstm vs. cnn(small max-change)+batchtdnn+lstm
Take cnn+tdnn+lstm for example (AMI IHM): https://cloud.githubusercontent.com/assets/7532101/25464491/40de8342-2b2f-11e7-8771-a67185d61fe9.png
Anyway, I will follow your idea and try test-mode=false.
It's more complicated than just doing test-mode=false, there needs to be a command that refreshes the stats on the final.mdl before decoding.
Like one final iteration of training, but with zero learning rate, done after combination. That would do it. I'll try to make a PR.
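The idea of refreshing the stats with a zero-learning-rate pass can be sketched like this (a toy stand-in for a batch-norm component's stored stats, not the actual Kaldi C++ code): after combination, reset the accumulated mean/variance and make one forward pass through the final model over some data, updating only the stats.

```python
import numpy as np

class ToyBatchNorm:
    """Minimal stand-in for a batch-norm component's stored stats."""
    def __init__(self, dim):
        self.count = 0.0
        self.sum = np.zeros(dim)
        self.sumsq = np.zeros(dim)

    def reset(self):
        self.count = 0.0
        self.sum[:] = 0.0
        self.sumsq[:] = 0.0

    def accumulate(self, x):
        # A forward pass with zero learning rate: parameters are untouched,
        # only the stats are updated.
        self.count += x.shape[0]
        self.sum += x.sum(axis=0)
        self.sumsq += (x ** 2).sum(axis=0)

    def stats(self):
        mean = self.sum / self.count
        var = self.sumsq / self.count - mean ** 2
        return mean, var

rng = np.random.default_rng(0)
bn = ToyBatchNorm(dim=4)

# Stale stats accumulated mid-training, before the model finished moving...
bn.accumulate(rng.normal(0.0, 1.0, size=(1000, 4)))

# ...refreshed after combination: reset, then one pass over data through the
# final model (here simulated by drawing from the "current" distribution).
bn.reset()
bn.accumulate(rng.normal(3.0, 1.0, size=(1000, 4)))

mean, _ = bn.stats()
print(np.round(mean, 1))  # close to the current activations' mean, ~3.0
```

After such a refresh, test-mode normalization uses stats that actually describe the final model's activations, which is what decoding needs.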
After your PR, will test-mode=false still be needed?
test-mode=false will mostly just affect the diagnostics; I'm not sure whether I'll recommend passing in that option. Anyway, we can change the default at the C++ level if needed.
@danpovey I cannot understand why a model without batch-norm in the CNN performs so badly compared with its counterpart with batch-norm in the CNN; the accuracy per training iteration looks good, though it oscillates a lot (batch-norm in the CNN makes training more stable). According to our guess, the batch-norm stats cause a decoding problem, so removing batch-norm from the CNN should relieve that problem.
I don't really understand what you are asking.
Gaofeng, within half an hour I will submit a PR. But I'm only doing the change for the nnet3, non-chain side, I will want you to replicate my changes in the 'chain' binaries before testing it. You can make a PR with your changes.
So what I do in my PR is: in the nnet3-combine code we use test-mode=false to get the combination weights, and once we have them, we recompute the batch-norm stats on the combination data. I believe all the lines of code that I am changing have 'chain' versions that need to be changed too.
@danpovey My question is why the WER changes so much when I just remove batch-norm from the CNN.
So maybe what you're saying is that if you just remove batch-norm from the CNN, the WER gets bad. I think CNNs tend to get very unstable without batch-norm; maybe you'd need much smaller learning rates.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.
@GaofengCheng, I was hoping you could help with this (this is in the kaldi_52 branch). In wsj/s5/local/chain/tuning/run_cnn_tdnn_1a.sh I have an example of using CNNs, but it's not working great. Something that I think is worth trying is as follows:
It will be necessary to try a version of this experiment without the iVectors (you can re-use the egs by including the input node but just not using it), i.e. replacing
relu-renorm-layer input=Append(-1,0,1,ReplaceIndex(ivector-2, t, 0))...
with
relu-renorm-layer input=Append(-1,0,1)
The reason is that with this new way of processing ivectors, we will want to verify that they are still helping. If this works, we may want to have a differently named config for the feature type, e.g. hires -> xhires, and a different version of run_ivector_common.sh (or different options to it). But for now, since we don't know whether it will work, you can just edit mfcc_hires.conf and rerun the feature extraction and ivector-extractor training.