kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

Dropout schedule in nnet3 training scripts #1247

Closed danpovey closed 7 years ago

danpovey commented 7 years ago

Recently, @GaofengCheng has been doing some interesting experiments with dropout and BLSTMs, and getting nice improvements. He was using a dropout schedule in which you start with zero dropout, ramp up to 0.2, and then go back to zero at the very end.

I have been thinking about the best and most flexible way to support general dropout schedules in the training scripts. @vimalmanohar, since you are now the 'owner' of the python training scripts, it would be best if you take this on.

Here is my proposal.

Firstly, the --set-dropout-proportion (or whatever it is) option to nnet3*-copy is (or should be) deprecated. The way I want to do this is by adding an option to the '--edits-config' file. See ReadEditConfig() in nnet-utils.h. The option should have the following documentation in the comment there:

     set-dropout-proportion [name=<name-pattern>] proportion=<dropout-proportion>
        Sets the dropout rates for any components of type DropoutComponent whose
        names match the given <name-pattern> (e.g. lstm*).  <name-pattern> defaults to "*".

The documentation for the python-training-script option would read something like the following:

   parser.add_argument("--trainer.dropout-schedule", type=str,
           dest='dropout_schedule', default='',
           help="""Use this to specify the dropout schedule.  You specify
        a piecewise linear function on the domain [0,1], where 0 is the start
        and 1 is the end of training; the function-argument (x) rises linearly with
        the amount of data you have seen, not the iteration number (this improves
        invariance to num-jobs-{initial,final}).  E.g. '0,0.2,0' means 0 at the
        start; 0.2 after seeing half the data; and 0 at the end.  You may
        specify the x-value of selected points, e.g. '0,0.2@0.25,0' means
        that the 0.2 dropout-proportion is reached a quarter of the way through the
        data.  The start/end x-values are at x=0/x=1, and other unspecified x-values
        are interpolated between known x-values.  You may specify different rules
        for different component-name patterns using 'pattern1=func1 pattern2=func2',
        e.g. 'relu*=0,0.1,0 lstm*=0,0.2,0'.  More-general patterns should precede
        less-general ones, as they are applied sequentially.""")
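As an illustrative sketch of the help text above (hypothetical code, not the actual Kaldi scripts), a minimal parser and evaluator for one schedule function might look like:

```python
# Hypothetical sketch of parsing/evaluating one dropout-schedule function
# such as '0,0.2,0' or '0,0.2@0.25,0'; illustrative only, not Kaldi code.

def parse_schedule(schedule):
    """Parse 'v0,v1@x1,...,vN' into a list of [x, value] points.
    The first point is pinned to x=0 and the last to x=1; points without
    an explicit '@x' get x-values spaced evenly between known neighbors."""
    points = []
    for piece in schedule.split(','):
        if '@' in piece:
            value, x = piece.split('@')
            points.append([float(x), float(value)])
        else:
            points.append([None, float(piece)])
    points[0][0], points[-1][0] = 0.0, 1.0
    known = [i for i, (x, _) in enumerate(points) if x is not None]
    for a, b in zip(known, known[1:]):
        for j in range(a + 1, b):
            # Linear spacing of unspecified x-values between known neighbors.
            points[j][0] = points[a][0] + \
                (j - a) / (b - a) * (points[b][0] - points[a][0])
    return points

def dropout_at(points, x):
    """Piecewise-linear interpolation of the schedule at data-fraction x."""
    for (x0, v0), (x1, v1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return v0 if x1 == x0 else v0 + (v1 - v0) * (x - x0) / (x1 - x0)
    return points[-1][1]
```

For example, under this sketch '0,0.2,0' places its 0.2 peak at x=0.5 by default, so a quarter of the way through the data the proportion is 0.1.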

I suggest turning this into a command-line option to nnet3-copy or nnet3-am-copy that looks like the following, to avoid having to create lots of little config files:

--edits-config='echo "set-dropout-proportion name=lstm* proportion=0.113"; echo "set-dropout-proportion name=tdnn* proportion=0.575"|'

The double-quotes are just a bit of paranoia, to avoid bash globbing in case a file like 'name=lstmX' exists; and of course this approach avoids some directory I/O. I'd be OK with placing the parsing of the option in the inner part of the python code even if this means it's done multiple times, if this helps keep the code structure clean; I don't think the time taken is significant in the overall scheme of things.
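To illustrate, a hypothetical helper (the name and interface are illustrative, not existing Kaldi code) could assemble that piped --edits-config value from a pattern-to-proportion mapping:

```python
# Hypothetical helper: build the piped --edits-config value shown above
# from a mapping of component-name patterns to dropout proportions.

def make_edits_config(proportions):
    """proportions: e.g. {'lstm*': 0.113, 'tdnn*': 0.575}"""
    cmds = ['echo "set-dropout-proportion name=%s proportion=%s"' % (pat, prop)
            for pat, prop in proportions.items()]
    # Trailing '|' marks the value as a command pipe in Kaldi's I/O convention.
    return '; '.join(cmds) + '|'
```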

GaofengCheng commented 7 years ago

@danpovey @vimalmanohar I think adding this function to the existing dropout is interesting: supporting schedules like [0, 0.2, 0] and [0, 1.0, 0] within one training run. That way we could control the schedule of specific dropout components individually.

GaofengCheng commented 7 years ago

@danpovey could you give me some guidance on how to set a random matrix by row in Kaldi? I saw the CuMatrix function ApplyHeaviside; if you tell me the names of the functions you would use to realize this, I can do it myself.

danpovey commented 7 years ago

I assume what you want is a matrix where each row is randomly all zeros or all ones.

I would first set a random vector with dimension == the NumCols() of the matrix to random zeroes and ones, using a combination of SetRandUniform(), Add() and ApplyHeaviside() (you can create a matrix with 1 row if the relevant functions are not available in class CuVector)... and then use CopyColsFromVec() to copy it to the matrix.

Dan
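The recipe above can be sketched in NumPy terms (an illustrative analogue, not Kaldi code; here I assume the mask vector has one entry per row, so each row of the result comes out constant):

```python
# NumPy analogue (illustrative only) of the recipe above: build a mask
# matrix whose rows are randomly all-zeros or all-ones.
import numpy as np

def per_row_binary_mask(num_rows, num_cols, keep_prob, seed=None):
    rng = np.random.default_rng(seed)
    # Uniform [0,1) vector, one entry per row (the SetRandUniform() step).
    u = rng.random(num_rows)
    # Shift by (keep_prob - 1) and take the Heaviside step, so each entry
    # becomes 1 with probability keep_prob (the Add() + ApplyHeaviside() steps).
    v = np.heaviside(u + (keep_prob - 1.0), 0.0)
    # Broadcast the vector across all columns (the CopyColsFromVec() step).
    return np.tile(v[:, None], (1, num_cols))
```

Each row is then a constant 0 or 1, so a whole frame is either kept or dropped together.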


danpovey commented 7 years ago

[note: to some extent this is a response to discussions that have been happening by email or on @vimalmanohar's repo.]

The basic situation is that @GaofengCheng has been doing a lot of experiments investigating how to do dropout in BLSTMs and different dropout schedules, and is getting some really nice improvements (around 1% absolute); and I believe his best current setup is based on just putting conventional dropout after the rp_t component (the component that combines the 'r' and 'p' matrices in projected LSTMs).... [BTW, @GaofengCheng, you might want to try putting it just on the 'r' or 'p' parts if you haven't tried that already... that may require a bit of messing around with dim-range components. It's possible to split things apart using dim-range nodes, and then append them back together using Append].

I have been thinking about the best next-steps to take with regards to this dropout-schedule stuff, and getting it merged to master in the nicest way. I think @vimalmanohar should be in charge of this since he is kind of taking the lead on the nnet3 python-script maintenance and development. What I'm thinking is we could just use what we've learned from @GaofengCheng's experiments but (if Vimal feels it is best) modify the python code from a clean start if that is more conducive to getting things done fast. [Also, I think @GaofengCheng was using the pre-xconfig scripts, which we shouldn't be messing with at this point.] What I'm thinking, @vimalmanohar, is that we can give the various LSTM xconfig classes a string-valued component called 'dropout', defaulting to None, which you would set to 'rp' to do dropout as Gaofeng is currently recommending (i.e. on the output of the 'rp' component). We need to make sure this works in the new, 'fast' LSTM component as well as the old one. The use of a string-valued config will mean this is extensible to any new setup that Gaofeng comes up with.
Since we were not seeing great results for the 'whole-frame' dropout, let's not consider merging any of that just yet; we'll merge it to master if it turns out to give a benefit in some setup.

@vijayaditya, you may want to chime in if you disagree with this plan.

GaofengCheng commented 7 years ago

@danpovey I added the dropout on the input of 'rp', i.e. before the LSTM projection, but I can try it on the output of rp right now and see the effect (that may be better than on the input of rp, because the dropout will act directly on the LSTM gates)... @vimalmanohar as for the dropout placement, you can refer to lstm.py in https://github.com/vimalmanohar/kaldi/pull/8

danpovey commented 7 years ago

Oh OK, so I guess the dropout is on 'm_t', because that's where 'rp' gets its input projection from (and I think m_t is not used anywhere else). In the proposed scheme, this could be accomplished by setting 'dropout=m', and of course writing the appropriate code.


GaofengCheng commented 7 years ago

@danpovey yes... the input of the LSTM dropout is m_t

vimalmanohar commented 7 years ago

Seems reasonable. @GaofengCheng can add this to his PR after he tests out the fast LSTM component. I can help with the xconfig modifications if needed.


13265170340 commented 5 years ago


How do I add a dropout module in a TDNN script?

GaofengCheng commented 5 years ago

https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_7q.sh
https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_7p.sh
https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_7o.sh

13265170340 commented 5 years ago


Thank you

13265170340 commented 5 years ago


    local/nnet3/run_tdnn3.sh: creating neural net configs
    tree-info exp/tri5a_sp_ali/tree
    steps/nnet3/xconfig_to_configs.py --xconfig-file exp/nnet3/tdnn_sp_2/configs/network.xconfig --config-dir exp/nnet3/tdnn_sp_2/configs/
    ERROR:root:Exception caught while parsing the following xconfig line:
    relu-batchnorm-dropout-layer name=tdnn1 l2-regularize=0.004 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim- continuous=true dim=850

    Traceback (most recent call last):
      File "steps/nnet3/xconfig_to_configs.py", line 333, in <module>
        main()
      File "steps/nnet3/xconfig_to_configs.py", line 323, in main
        all_layers = xparser.read_xconfig_file(args.xconfig_file, existing_layers)
      File "steps/libs/nnet3/xconfig/parser.py", line 189, in read_xconfig_file
        this_layer = xconfig_line_to_object(line, existing_layers)
      File "steps/libs/nnet3/xconfig/parser.py", line 96, in xconfig_line_to_object
        return config_to_layer[first_token](first_token, key_to_value, prev_layers)
      File "steps/libs/nnet3/xconfig/basic_layers.py", line 706, in __init__
        XconfigLayerBase.__init__(self, first_token, key_to_value, prev_names)
      File "steps/libs/nnet3/xconfig/basic_layers.py", line 68, in __init__
        self.set_configs(key_to_value, all_layers)
      File "steps/libs/nnet3/xconfig/basic_layers.py", line 97, in set_configs
        "".format(key, value, self.layer_type, configs))
    RuntimeError: Configuration value continuous=true was not expected in layer of type relu-batchnorm-dropout-layer; allowed configs with their defaults: self-repair-scale->1e-05 l2-regularize->"" add-log-stddev->False ng-linear-options->"" bias-stddev->"" bottleneck-dim->-1 dropout-per-dim->False dim->-1 max-change->0.75 ng-affine-options->"" learning-rate-factor->"" dropout-per-dim-continuous->False input->"[-1]" dropout-proportion->0.5 target-rms->1.0

Xconfig error adding new layer on TDNN model

danpovey commented 5 years ago

Looks like you added a space between dropout-per-dim- and continuous.
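For reference, with the stray space removed (everything else unchanged), the failing xconfig line would read:

    relu-batchnorm-dropout-layer name=tdnn1 l2-regularize=0.004 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim-continuous=true dim=850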


13265170340 commented 5 years ago
steps/nnet3/decode.sh --nj 40 --cmd run.pl --online-ivector-dir exp/nnet3/ivectors_dev exp/tri5a/graph data/dev_hires exp/nnet3/tdnn_sp_2/decode_dev
steps/nnet3/decode.sh: feature type is raw
bash: line 1: 46146 Segmentation fault      (core dumped) ( nnet3-latgen-faster --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/34/utt2spk scp:data/dev_hires/split40/34/cmvn.scp scp:data/dev_hires/split40/34/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.34.gz" ) 2>> exp/nnet3/tdnn_sp_2/decode_dev/log/decode.34.log >> exp/nnet3/tdnn_sp_2/decode_dev/log/decode.34.log

LOG (nnet3-latgen-faster[5.5.88~3-8e30f]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 13 orphan nodes.
LOG (nnet3-latgen-faster[5.5.88~3-8e30f]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 20 orphan components.
LOG (nnet3-latgen-faster[5.5.88~3-8e30f]:Collapse():nnet-utils.cc:1378) Added 7 components, removed 20
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/1/utt2spk scp:data/dev_hires/split40/1/cmvn.scp scp:data/dev_hires/split40/1/feats.scp ark:- 

Thank you, the previous problem has been solved. However, there is a problem with decoding.

danpovey commented 5 years ago

I suggest cd'ing to src/ and doing "make depend -j 10" and "make -j 10" to minimize the chance of compilation errors, then trying again. If that doesn't work, get it in gdb and show me a stack trace: gdb --args (program) (args), then "r", then "bt" when it crashes. E.g.

gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/n.....
(gdb) r
...
(gdb) bt
13265170340 commented 5 years ago


    yuyin@yuyin-Super-Server:~/kaldi-trunk1/egs/aishell/s5$ gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/17/utt2spk scp:data/dev_hires/split40/17/cmvn.scp scp:data/dev_hires/split40/17/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.17.gz"
    GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
    [standard GDB copyright banner omitted]
    nnet3-latgen-faster: No such file or directory.
    (gdb)

That did not solve the problem, and I don't know how to use gdb.

jtrmal commented 5 years ago

When in gdb, type 'run' and, when/if it crashes, type 'bt' and paste the output of that command -- that is what Dan is looking for. y.


13265170340 commented 5 years ago

    yuyin@yuyin-Super-Server:~/kaldi-trunk1$ g++ -g -o nnet3-latgen-faster nnet3-latgen-faster.cc
    g++: error: nnet3-latgen-faster.cc: No such file or directory
    g++: fatal error: no input files

Does GDB support shell scripts?

when in gdb, type 'run' and when/if it crashes, type 'bt' and paste the output of that command -- that is what dan is looking for. y. On Fri, Nov 2, 2018 at 9:00 AM xiaowang @.> wrote: I suggest to cd to src/, do "make depend -j 10" and "make -j 10" to minimize the chance of compilation errors, and try again. If that doesn't work, get it in gdb and show me a stack trace: gdb --args (program) (args), then "r", then "bt" when it crashes. E.g. gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/n..... (gdb) r ... (gdb) bt @.Super-Server:/kaldi-trunk1/egs/aishell/s5$ gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/17/utt2spk scp:data/dev_hires/split40/17/cmvn.scp scp:data/dev_hires/split40/17/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.17.gz" GNU gdb (Ubuntu 7.11.1-0ubuntu116.5) 7.11.1 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/. For help, type "help". 
Type "apropos word" to search for commands related to "word"... nnet3-latgen-faster: 没有那个文件或目录. (gdb) Did not solve the problem, gdb will not use — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#1247 (comment)>, or mute the thread https://github.com/notifications/unsubscribe-auth/AKisX7N_Z9DlTPdBPKaDWFUOidT2kjtBks5urEHSgaJpZM4LDr60 .

when in gdb, type 'run' and when/if it crashes, type 'bt' and paste the output of that command -- that is what dan is looking for. y. On Fri, Nov 2, 2018 at 9:00 AM xiaowang @.> wrote: I suggest to cd to src/, do "make depend -j 10" and "make -j 10" to minimize the chance of compilation errors, and try again. If that doesn't work, get it in gdb and show me a stack trace: gdb --args (program) (args), then "r", then "bt" when it crashes. E.g. gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/n..... (gdb) r ... (gdb) bt @.Super-Server:/kaldi-trunk1/egs/aishell/s5$ gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/17/utt2spk scp:data/dev_hires/split40/17/cmvn.scp scp:data/dev_hires/split40/17/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.17.gz" GNU gdb (Ubuntu 7.11.1-0ubuntu116.5) 7.11.1 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/. For help, type "help". 

yuyin@yuyin-Super-Server:~/kaldi-trunk1$ g++ -g -o nnet3-latgen-faster nnet3-latgen-faster.cc
g++: error: nnet3-latgen-faster.cc: No such file or directory
g++: fatal error: no input files

Does GDB support shell scripts?

jtrmal commented 5 years ago

I think you are confusing g++ and gdb.

13265170340 commented 5 years ago

I think you are confusing g++ and gdb.

I know, Dan, but gdb will not work for me.

13265170340 commented 5 years ago

I suggest to cd to src/, do "make depend -j 10" and "make -j 10" to minimize the chance of compilation errors, and try again. If that doesn't work, get it in gdb and show me a stack trace: gdb --args (program) (args), then "r", then "bt" when it crashes. E.g.

gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/n.....
(gdb) r
...
(gdb) bt

yuyin@yuyin-Super-Server:~/kaldi-trunk1/src/nnet3bin$ gdb nnet3-latgen-faster
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from nnet3-latgen-faster...done.

(gdb) r --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/17/utt2spk scp:data/dev_hires/split40/17/cmvn.scp scp:data/dev_hires/split40/17/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.17.gz"
Starting program: /home/yuyin/kaldi-trunk1/src/nnet3bin/nnet3-latgen-faster --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/17/utt2spk scp:data/dev_hires/split40/17/cmvn.scp scp:data/dev_hires/split40/17/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.17.gz"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

/home/yuyin/kaldi-trunk1/src/nnet3bin/nnet3-latgen-faster --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/17/utt2spk scp:data/dev_hires/split40/17/cmvn.scp scp:data/dev_hires/split40/17/feats.scp ark:- |' 'ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.17.gz'
ERROR (nnet3-latgen-faster[5.5.88~3-8e30f]:Input():kaldi-io.cc:756) Error opening input stream exp/nnet3/tdnn_sp_2/final.mdl

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::FatalMessageLogger::~FatalMessageLogger()
kaldi::Input::Input(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
main
__libc_start_main
_start

ERROR (nnet3-latgen-faster[5.5.88~3-8e30f]:Input():kaldi-io.cc:756) Error opening input stream exp/nnet3/tdnn_sp_2/final.mdl

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::FatalMessageLogger::~FatalMessageLogger()
kaldi::Input::Input(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
main
__libc_start_main
_start

13265170340 commented 5 years ago

Is the training file final.mdl wrong?

jtrmal commented 5 years ago

you are running it from a different directory, probably
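The "Error opening input stream" above is consistent with this: paths like exp/nnet3/tdnn_sp_2/final.mdl are relative to the experiment directory (here egs/aishell/s5), so the decode must be launched from there, not from src/nnet3bin. A minimal sketch of a pre-flight check (the helper name `check_inputs` is hypothetical, not part of Kaldi):

```python
import os
import sys

def check_inputs(*paths):
    """Hypothetical pre-flight check: verify that the (relative) input
    files a decode command needs are visible from the current directory.
    Returns True if all paths exist, False otherwise."""
    missing = [p for p in paths if not os.path.isfile(p)]
    for p in missing:
        print("Not found from %s: %s" % (os.getcwd(), p), file=sys.stderr)
    return not missing

# Example (run from egs/aishell/s5, not from src/nnet3bin):
# check_inputs("exp/nnet3/tdnn_sp_2/final.mdl", "exp/tri5a/graph/HCLG.fst")
```

If this returns False for files you know exist, you are almost certainly in the wrong working directory.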

danpovey commented 5 years ago

Please get someone local to help you. We are busy and we don't have time to deal with people who don't know basic things like how to use a debugger, and there must be people in your lab who know this stuff.

13265170340 commented 5 years ago

Thank you. The problem has been solved: the previous model had not been updated.

13265170340 commented 5 years ago

I want to ask which papers describe the dropout algorithm used in Kaldi.

danpovey commented 5 years ago

There are different forms available. If you are asking about the one used in the TDNN-F scripts, which is continuous and shared across time, look at my publications page, it may possibly be described in the paper on factorized TDNNs with Gaofeng Cheng as a co-author. There is also more conventional dropout.

Dan

On Thu, Nov 8, 2018 at 8:50 PM xiaowang notifications@github.com wrote:

I want to ask which papers are used in the dropout algorithm on kaldi.


13265170340 commented 5 years ago

There are different forms available. If you are asking about the one used in the TDNN-F scripts, which is continuous and shared across time, look at my publications page, it may possibly be described in the paper on factorized TDNNs with Gaofeng Cheng as a co-author. There is also more conventional dropout.

Yes, about TDNN.
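For reference, the --trainer.dropout-schedule option proposed at the top of this issue specifies a piecewise-linear function of training progress. A minimal sketch of how a schedule string like '0,0.2,0' could map progress in [0,1] to a dropout proportion; this is an illustration under the assumption that the listed values are spaced evenly over [0,1], and it omits the explicit 'value@point' pairs that the actual training-script code also supports:

```python
def dropout_proportion(schedule, progress):
    """Interpret a dropout schedule like '0,0.2,0' as a piecewise-linear
    function on [0, 1], where `progress` is the fraction of training data
    seen so far (not the iteration number).  The comma-separated values
    are assumed to be equally spaced over [0, 1]."""
    values = [float(v) for v in schedule.split(",")]
    if len(values) == 1:
        return values[0]
    progress = min(max(progress, 0.0), 1.0)
    # Scale progress to a position among the segment endpoints,
    # then interpolate linearly within the segment it falls in.
    pos = progress * (len(values) - 1)
    i = min(int(pos), len(values) - 2)
    frac = pos - i
    return values[i] + frac * (values[i + 1] - values[i])

# '0,0.2,0': zero dropout at the start, 0.2 after seeing half
# the data, and zero again at the end of training.
```

The returned proportion would then be applied per iteration via the set-dropout-proportion edits-config directive described earlier in the thread.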