Closed: danpovey closed this issue 4 years ago
I will help with this; the results and tuning scripts will be made into a PR against kaldi_52.
@GaofengCheng, I just realized that I made a really stupid mistake when I implemented the xconfig layers for convolution. I forgot to include any nonlinearity! There is no ReLU. I'll be modifying the xconfig code and the associated example. It should be done before you get to the CNN stage of these experiments.
@danpovey I have not yet begun the CNN exps. After your modification, I'll take this and help check
OK, I have fixed the CNN layer, adding ReLU's. I'm running the (now-modified) WSJ example and it does lead to better objective functions.
@danpovey Got it, I'll begin the CNN experiments today.
BTW, something to keep an eye on is whether 'test-mode' batch-norm behaves the same as training mode in terms of log-prob. (We use test mode in the compute_prob logs, in the combination stage, and in actual testing.) In the current egs/wsj/s5/local/chain/tuning/run_cnn_tdnn_1a.sh, after training for a while, the training and validation log-probs become not that good, and when I tried nnet3-chain-compute-prob with --test-mode=false they got better (by almost a factor of 2). So test-mode of the BatchNorm component is not working as we expect it to.
In that configuration, though, I had converted all the layers to batch-norm, even the non-convolutional layers, and in those cases the batch-norm has to estimate many more individual scaling factors. So it's plausible that it's somehow leaking information about the current minibatch. I also suspect some kind of bug in the batch-norm code, but I've looked and can't see one.
I figured out the problem with the batch-norm. It's that during a single iteration the model was moving too far, so the batch-norm stats accumulated over the entire iteration became non-representative. Part of the issue was a bug in the reading and writing code, whereby the learning-rate and max-change were not being read or written (and in the xconfig I had also forgotten to set the max-change to 0.75, which is why I had not detected that issue). If the problem persists once the learning-rate and max-change code is fixed, I'll change how the stats for batch-norm are collected.
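The discrepancy between training-mode and test-mode batch-norm can be illustrated with a toy sketch (plain NumPy, a single dimension; this is not Kaldi's actual BatchNormComponent): if the model drifts during an iteration, the accumulated stats no longer match the activations the final model produces, so test-mode normalization is off even though training-mode normalization (using the current minibatch's own stats) looks fine.

```python
import numpy as np

def batchnorm(x, mean, var, eps=1e-3):
    """Normalize x with the given mean/variance (one dimension for simplicity)."""
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Stats accumulated early in the iteration, before the model drifted.
early = rng.normal(loc=0.0, scale=1.0, size=10000)
stored_mean, stored_var = early.mean(), early.var()

# Activations at the end of the iteration: the model has moved, so the
# distribution feeding this layer has shifted.
late = rng.normal(loc=2.0, scale=1.5, size=10000)

train_mode = batchnorm(late, late.mean(), late.var())  # this minibatch's stats
test_mode = batchnorm(late, stored_mean, stored_var)   # stale accumulated stats

# Train mode is properly normalized (mean ~0, std ~1); test mode is not.
print(round(train_mode.mean(), 2), round(train_mode.std(), 2))
print(round(test_mode.mean(), 2), round(test_mode.std(), 2))
```

The worse log-probs with --test-mode=true are exactly this effect: the stored stats describe an earlier model, not the one being evaluated.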
@danpovey
How should I change cepstral-lifter if I use 64-dim features?
idct-layer name=idct input=input dim=40 cepstral-lifter=22 affine-transform-file=$dir/configs/idct.mat
leave it the same
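To see why the same value works: the liftering window scales cepstral coefficient n by 1 + (Q/2)·sin(πn/Q) with Q = cepstral-lifter. Below is a sketch of that standard HTK-style formula (I'm assuming here that this matches what the idct.mat generation undoes); the coefficient for index n depends only on n and Q, not on the total number of coefficients, so going from 40 to 64 dims just extends the same window.

```python
import math

def lifter_coeffs(num_ceps, q=22.0):
    """HTK-style cepstral liftering window: w[n] = 1 + (Q/2)*sin(pi*n/Q)."""
    return [1.0 + 0.5 * q * math.sin(math.pi * n / q) for n in range(num_ceps)]

w40 = lifter_coeffs(40)
w64 = lifter_coeffs(64)

# The first 40 coefficients are identical whether you ask for 40 or 64 dims,
# so cepstral-lifter=22 does not need to change with the feature dimension.
assert w40 == w64[:40]
print(w40[0])  # 1.0 -- c0 is not scaled
```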
@danpovey I did not find the learning-rate-factor related to "And remove the learning-rate-factor=2.0 on the ivector branch, I realized it's inappropriate, and that branch is training too fast" in final.config when running cnn_tdnn_lstm or cnn_tdnn; there is just learning-rate-factor=5.0 on the xent_output.
It's OK, I may already have done that.
You may notice that in egs/wsj/s5/local/chain/tuning/run_cnn_tdnn_1b.sh, in the first convolutional layer I added:
learning-rate-factor=0.333 max-change=0.25
to make it train more slowly, because I noticed the first convolutional layer was training too fast. You might want to do that, or at least keep an eye on how fast it's training (grep for 'Relative' in progress.*.log to see).
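For readers unfamiliar with those logs: the 'Relative' figures report, per component, roughly the norm of the parameter change on an iteration divided by the norm of the parameters. A toy version of that diagnostic (a hypothetical helper, not Kaldi code):

```python
import numpy as np

def relative_change(old_params, new_params):
    """Roughly what the 'Relative parameter change' diagnostic reports:
    ||new - old|| / ||old||, i.e. how far the component moved this iteration."""
    old = np.asarray(old_params, dtype=float)
    delta = np.asarray(new_params, dtype=float) - old
    return np.linalg.norm(delta) / np.linalg.norm(old)

old = np.ones(100)
fast = old + 0.5   # a layer that moved a lot this iteration
slow = old + 0.01  # a layer training at a gentler pace

print(relative_change(old, fast))  # 0.5
print(relative_change(old, slow))  # ~0.01
```

A persistently large value for one layer (like the first convolutional layer here) is the signal that a smaller learning-rate-factor or max-change is worth trying.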
Yes, I'm running a comparison experiment on AMI IHM to figure out whether this will help.
@danpovey Setting --trainer.deriv-truncate-margin 8 in training causes the cnn_tdnn_lstm to fail (I adapted it directly from egs/ami/s5b/tuning/run_tdnn_lstm_1i.sh). If I remove this option, will it affect the results much?
That would be a bug in the code. I need to look into that. Can you please show me the error message, and if you think it would help, get me a stack trace?
@danpovey
nnet3-am-copy --raw=true --learning-rate=0.002 exp/ihm/chain_cleaned/tdnn_lstm1i_cnn_64_fbank_2cnn_sp_bi_ld5/0.mdl -
LOG (nnet3-am-copy[5.1]:main():nnet3-am-copy.cc:140) Copied neural net from exp/ihm/chain_cleaned/tdnn_lstm1i_cnn_64_fbank_2cnn_sp_bi_ld5/0.mdl to raw format as -
nnet3-chain-shuffle-egs --buffer-size=5000 --srand=0 ark:- ark:-
nnet3-chain-merge-egs --minibatch-size=32 ark:- ark:-
nnet3-chain-copy-egs --left-context=57 --right-context=27 --frame-shift=2 ark:/nobackup/datapool/project/proa/chenggaofeng/ami/s5b_ASRU2017/exp/ihm/chain_cleaned/tdnn_lstm1i_cnn_64_fbank_2cnn_sp_bi_ld5/egs/cegs.2.ark ark:-
ASSERTION_FAILED (nnet3-chain-train[5.1]:FindNumLeadingAndTrailingIdenticals():nnet-optimize-utils.cc:2209) : 'ptr != end && "Vector consists entirely of -1's."'
[ Stack-Trace: ]
nnet3-chain-train() [0x11829a6]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
nnet3-chain-train() [0xe069c8]
nnet3-chain-train() [0xe06b0b]
kaldi::nnet3::SnipRowOps(kaldi::nnet3::NnetComputation*)
kaldi::nnet3::Optimize(kaldi::nnet3::NnetOptimizeOptions const&, kaldi::nnet3::Nnet const&, int, kaldi::nnet3::NnetComputation*)
kaldi::nnet3::CachingOptimizingCompiler::CompileNoShortcut(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::CompileAndCache(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::CompileInternal(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::CompileViaShortcut(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::CompileAndCache(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::CompileInternal(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::CachingOptimizingCompiler::Compile(kaldi::nnet3::ComputationRequest const&)
kaldi::nnet3::NnetChainTrainer::Train(kaldi::nnet3::NnetChainExample const&)
main
__libc_start_main
nnet3-chain-train() [0xcdad99]
OK, I have just pushed a code change that should resolve that-- please see if it's better.
There is a mistake in my previous comment; please delete it.
What was the mistake? I notice that your 'final' probs were much worse than the other probs, indicating that something went wrong in the combination stage. BTW, re-running the compute-prob (train or valid) script with --test-mode=false will indicate whether the problem is unsuitable batch-norm stats. If the batch-norm stats are unsuitable, it will affect decoding.
Dan
@danpovey OK, after several CNN experiments, I think there may be problems in batch-norm and the combination stage.
I ran these experiments:
batchcnn+batchtdnn vs. batchcnn(small max-change)+batchtdnn vs. cnn+batchtdnn vs. cnn(small max-change)+batchtdnn
and also
batchcnn+batchtdnn+lstm vs. batchcnn(small max-change)+batchtdnn+lstm vs. cnn+batchtdnn+lstm vs. cnn(small max-change)+batchtdnn+lstm
Take cnn+tdnn+lstm for example (AMI IHM): https://cloud.githubusercontent.com/assets/7532101/25464491/40de8342-2b2f-11e7-8771-a67185d61fe9.png
Anyway, I will follow your idea and try test-mode=false.
It's more complicated than just doing test-mode=false, there needs to be a command that refreshes the stats on the final.mdl before decoding.
Like one final iteration of training, but with zero learning rate, done after combination. That would do it. I'll try to make a PR.
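The idea of refreshing the stats with a zero-learning-rate pass can be sketched like this (a toy stand-in for a batch-norm component's stored stats, not the actual Kaldi C++ code): after combination, reset the accumulated mean/variance and make one forward pass through the final model over some data, updating only the stats.

```python
import numpy as np

class ToyBatchNorm:
    """Minimal stand-in for a batch-norm component's stored stats."""
    def __init__(self, dim):
        self.count = 0.0
        self.sum = np.zeros(dim)
        self.sumsq = np.zeros(dim)

    def reset(self):
        self.count = 0.0
        self.sum[:] = 0.0
        self.sumsq[:] = 0.0

    def accumulate(self, x):
        # A forward pass with zero learning rate: parameters are untouched,
        # only the stats are updated.
        self.count += x.shape[0]
        self.sum += x.sum(axis=0)
        self.sumsq += (x ** 2).sum(axis=0)

    def stats(self):
        mean = self.sum / self.count
        var = self.sumsq / self.count - mean ** 2
        return mean, var

rng = np.random.default_rng(0)
bn = ToyBatchNorm(dim=4)

# Stale stats accumulated mid-training, before the model finished moving...
bn.accumulate(rng.normal(0.0, 1.0, size=(1000, 4)))

# ...refreshed after combination: reset, then one pass over data through the
# final model (here simulated by drawing from the "current" distribution).
bn.reset()
bn.accumulate(rng.normal(3.0, 1.0, size=(1000, 4)))

mean, _ = bn.stats()
print(np.round(mean, 1))  # close to the current activations' mean, ~3.0
```

After such a refresh, test-mode normalization uses stats that actually describe the final model's activations, which is what decoding needs.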
After your PR, will test-mode=false still be needed?
test-mode=false will mostly just affect the diagnostics; I'm not sure whether I'll recommend passing in that option. Anyway, we can change the default at the C++ level if needed.
@danpovey I cannot understand why a model without batch-norm in the CNN performs so badly compared with its counterpart with batch-norm in the CNN; the accuracy per training iteration looks good, though it oscillates a lot (batch-norm in the CNN makes training more stable). According to our guess, the batch-norm stats cause a decoding problem, so removing batch-norm from the CNN should relieve that problem.
I don't really understand what you are asking.
Gaofeng, within half an hour I will submit a PR. But I'm only doing the change for the nnet3, non-chain side, I will want you to replicate my changes in the 'chain' binaries before testing it. You can make a PR with your changes.
So what I do in my PR is: in the nnet3-combine code we use test-mode=false to get the combination weights, and once we have them, we recompute the batch-norm stats on the combination data. I believe all the lines of code that I am changing have 'chain' versions that need to be changed too.
@danpovey My question is why the WER changes so much when I just remove batch-norm from the CNN.
So maybe what you're saying is that if you just remove batch-norm from the CNN, the WER gets bad. I think CNNs tend to get very unstable without batch-norm; maybe you'd need much smaller learning rates.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.
@GaofengCheng, I was hoping you could help with this (this is in the kaldi_52 branch). In wsj/s5/local/chain/tuning/run_cnn_tdnn_1a.sh I have an example of using CNNs, but it's not working great. Something that I think is worth trying is as follows:
It will be necessary to try a version of this experiment without the iVectors (you can re-use the egs by including the input node but just not using it), i.e. replacing
relu-renorm-layer input=Append(-1,0,1,ReplaceIndex(ivector-2, t, 0))...
with
relu-renorm-layer input=Append(-1,0,1)
The reason is that with this new way of processing ivectors, we will want to verify that they are still helping. If this works, we may want to have a differently named config for the feature type, e.g. hires -> xhires, and a different version of run_ivector_common.sh (or different options to it). But for now, since we don't know whether it will work, you can just edit mfcc_hires.conf and rerun the feature extraction and ivector-extractor training.