martiansideofthemoon opened this issue 6 years ago
Kalpesh,
Ramon would know best about the “v1-tf” recipe, but I can see an error message that says "Can't open data/local/dict_phn/lexicon1.txt: No such file or directory at local/swbd1_map_words.pl line 26.", which shows that you did not run the “phn” recipe before running the “char” recipe. You need to do this so that both of them use the same vocabulary. Next, you can configure the location of the temp folder in path.sh; you will want to change it to “/tmp” or similar if you don’t have “/scratch”, which is the default in our cluster. There is also "exp/train_char_l4_c320_mdeepbilstm_w3_nfalse/model/config.pkl: line 135: syntax error: unexpected end of file" - which suggests the training didn’t start correctly, or maybe not at all?
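For concreteness, the temp-folder change could look roughly like the fragment below. This is a hypothetical sketch - the actual variable name that the recipe's path.sh exports may differ in your checkout, so adapt it to what you find there:

```shell
# Hypothetical sketch of a temp-folder setting for path.sh;
# the variable name the recipe actually reads may differ in your checkout.
if [ -d /scratch ]; then
  TMPDIR=/scratch   # default on our cluster
else
  TMPDIR=/tmp       # fallback for machines without /scratch
fi
export TMPDIR
```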
Let me know if you have any other questions!
Florian
On Jan 28, 2018, at 7:07 AM, Kalpesh Krishna notifications@github.com wrote:
Hello, I am trying to run the TensorFlow based EESEN setup for Switchboard. More specifically, I am using the tf_clean branch and trying to run the asr_egs/swbd/v1-tf/run_ctc_char.sh script. I am having some trouble with the training and decoding steps and would appreciate your help! @ramonsanabria, @fmetze During stage 3 (training), I get a number of error messages of the form -
Warning: sw02018-B_012508-012721 has not been found in labels file: /scratch/tmp.1hi5uR4EIR/labels.cv
Here are the training logs that follow. I suspect creating tr_y from scratch is a problem?
cleaning done: /scratch/tmp.1hi5uR4EIR/cv_local.scp
original scp length: 4000
scp deleted: 270
final scp length: 3730
number of labels not found: 270
TRAINING STARTS [2018-Jan-28 06:02:05]
2.7.13 |Anaconda 4.3.1 (64-bit)| (default, Dec 20 2016, 23:09:15) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] ('now:', 'Sun 2018-01-28 06:02:08') ('tf:', '1.1.0') ('cwd:', '/share/data/lang/users/kalpesh/eesen/asr_egs/swbd/v1-tf') ('library:', '/share/data/lang/users/kalpesh/eesen') ('git:', 'heads/master-dirty')
reading training set
tr_x:
non augmented (mix) training set found for language: no_name_language ...
preparing dictionary for no_name_language...
ordering all languages (from scratch) train batches...
Augmenting data x3 and win 3...
tr_y:
creating tr_y from scratch... unilanguage setup detected (in labels)...
cv_x:
unilingual set up detected on test or set language...
cv (feats) found for language: no_name_language ...
preparing dictionary for no_name_language...
ordering all languages (from scratch) cv batches...
Augmenting data x3 and win 3...
cv_y:
creating cv_y from scratch... unilanguage setup detected (in labels)...
languages checked ... (cv_x vs cv_y vs tr_x vs tr_y)
Finally, here are my decoding logs -
(python2.7_tf1.4) kalpesh@kalpesh:v1-tf$ ./run_ctc_char.sh
Decoding eval200 using AM
=====================================================================
./steps/decode_ctc_am_tf.sh --config exp/train_char_l4_c320_mdeepbilstm_w3_nfalse/model/config.pkl --data ./data/eval2000/ --weights exp/train_char_l4_c320_mdeepbilstm_w3_nfalse/model/epoch25.ckpt --results exp/train_char_l4_c320_mdeepbilstm_w3_nfalse/results/epoch25
exp/train_char_l4_c320_mdeepbilstm_w3_nfalse/model/config.pkl: line 135: syntax error: unexpected end of file
copy-feats 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:./data/eval2000//utt2spk scp:./data/eval2000//cmvn.scp scp:./data/eval2000//feats.scp ark:- |' ark,scp:/scratch/tmp.GgS1if0Wex/f.ark,/scratch/tmp.GgS1if0Wex/test_local.scp
apply-cmvn --norm-vars=true --utt2spk=ark:./data/eval2000//utt2spk scp:./data/eval2000//cmvn.scp scp:./data/eval2000//feats.scp ark:-
LOG (apply-cmvn[5.3.85~1-35950]:main():apply-cmvn.cc:159) Applied cepstral mean and variance normalization to 4458 utterances, errors on 0
LOG (copy-feats[5.3.85~1-35950]:main():copy-feats.cc:143) Copied 4458 feature matrices.
2.7.13 |Anaconda 4.3.1 (64-bit)| (default, Dec 20 2016, 23:09:15) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] ('now:', 'Sun 2018-01-28 06:05:28') ('tf:', '1.1.0') ('cwd:', '/share/data/lang/users/kalpesh/eesen/asr_egs/swbd/v1-tf') ('library:', '/share/data/lang/users/kalpesh/eesen') ('git:', 'heads/master-dirty')
reading testing set
test_x:
unilingual set up detected on test or set language...
test (feats) found for language: no_name_language ...
preparing dictionary for no_name_language...
ordering all languages (from scratch) test batches...
Augmenting data x3 and win 3...
test_y (for ter computation):
unilanguage setup detected (in labels)...
no label files fins in /scratch/tmp.GgS1if0Wex with info_set: test
file: /share/data/lang/users/kalpesh/eesen/tf/ctc-am/reader/labels_reader/labels_reader.py function: __read_one_language line: 171
exiting...
Here are my logs from the first two stages (data preparation, fbank generation)
(python2.7_tf1.4) kalpesh@kalpesh:v1-tf$ ./run_ctc_char.sh
Data Preparation
=====================================================================
Switchboard-1 data preparation succeeded.
utils/fix_data_dir.sh: filtered data/train/segments from 264333 to 264072 lines based on filter /scratch/tmp.V26jBobg4D/recordings.
utils/fix_data_dir.sh: filtered /scratch/tmp.V26jBobg4D/speakers from 4876 to 4870 lines based on filter data/train/cmvn.scp.
utils/fix_data_dir.sh: filtered data/train/spk2utt from 4876 to 4870 lines based on filter /scratch/tmp.V26jBobg4D/speakers.
fix_data_dir.sh: kept 263890 utterances out of 264072
fix_data_dir.sh: old files are kept in data/train/.backup
Can't open data/local/dict_phn/lexicon1.txt: No such file or directory at local/swbd1_map_words.pl line 26.
Character-based dictionary (word spelling) preparation succeeded
Warning: for utterances en_4910-B_013563-013763 and en_4910-B_013594-013790, segments already overlap; leaving these times unchanged.
Warning: for utterances en_4910-B_025539-025791 and en_4910-B_025541-025674, segments already overlap; leaving these times unchanged.
Warning: for utterances en_4910-B_032263-032658 and en_4910-B_032299-032406, segments already overlap; leaving these times unchanged.
Warning: for utterances en_4910-B_035678-035757 and en_4910-B_035715-035865, segments already overlap; leaving these times unchanged.
Data preparation and formatting completed for Eval 2000 (but not MFCC extraction)
fix_data_dir.sh: kept 4458 utterances out of 4466
fix_data_dir.sh: old files are kept in data/eval2000/.backup
FBank Feature Generation
=====================================================================
steps/make_fbank.sh --cmd run.pl --nj 32 data/train exp/make_fbank_pitch/train fbank_pitch
steps/make_fbank.sh: moving data/train/feats.scp to data/train/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/train
steps/make_fbank.sh [info]: segments file exists: using that.
Succeeded creating filterbank features for train
steps/compute_cmvn_stats.sh data/train exp/make_fbank_pitch/train fbank_pitch
Succeeded creating CMVN stats for train
fix_data_dir.sh: kept all 263890 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
steps/make_fbank.sh --cmd run.pl --nj 10 data/eval2000 exp/make_fbank_pitch/eval2000 fbank_pitch
steps/make_fbank.sh: moving data/eval2000/feats.scp to data/eval2000/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/eval2000
steps/make_fbank.sh [info]: segments file exists: using that.
Succeeded creating filterbank features for eval2000
steps/compute_cmvn_stats.sh data/eval2000 exp/make_fbank_pitch/eval2000 fbank_pitch
Succeeded creating CMVN stats for eval2000
fix_data_dir.sh: kept all 4458 utterances.
fix_data_dir.sh: old files are kept in data/eval2000/.backup
utils/subset_data_dir.sh: reducing #utt from 263890 to 4000
utils/subset_data_dir.sh: reducing #utt from 263890 to 259890
utils/subset_data_dir.sh: reducing #utt from 259890 to 100000
Reduced number of utterances from 100000 to 76615
Using fix_data_dir.sh to reconcile the other files.
fix_data_dir.sh: kept 76615 utterances out of 100000
fix_data_dir.sh: old files are kept in data/train_100k_nodup/.backup
Reduced number of utterances from 259890 to 192701
Using fix_data_dir.sh to reconcile the other files.
fix_data_dir.sh: kept 192701 utterances out of 259890
fix_data_dir.sh: old files are kept in data/train_nodup/.backup
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/srvk/eesen/issues/169, or mute the thread https://github.com/notifications/unsubscribe-auth/AEnA8cGE0tSaYt7E-UWyF9etI7jwKl0Mks5tPGLqgaJpZM4Rvq8j.
Hi @fmetze ,
"Can't open data/local/dict_phn/lexicon1.txt: No such file or directory at local/swbd1_map_words.pl line 26.”
Yes, I hadn't run the phn recipe. This error disappears on doing so. Do I need to run a decoding with the phn recipe too?
There is also "exp/train_char_l4_c320_mdeepbilstm_w3_nfalse/model/config.pkl: line 135: syntax error: unexpected end of file”
This is an irrelevant error; it happens because a pickle configuration file is sourced by the utils/parse_options.sh script. It does not affect further execution.
which means maybe the training didn’t start correctly, or not at all?
The training did happen successfully. Here are the training logs. As a confirmation, is it usual for the Kaldi setup to discard 270 dev utterances, 11 eval2000 utterances and 973 train utterances due to transcripts like [vocalized-noise]?
for language: no_name_language
following variables will be optimized:
--------------------------------------------------------------------------------
<tf.Variable 'cudnn_lstm/params:0' shape=<unknown> dtype=float32_ref>
<tf.Variable 'output_layers/output_fc_no_name_language_no_target_name/weights:0' shape=(640, 42) dtype=float32_ref>
<tf.Variable 'output_layers/output_fc_no_name_language_no_target_name/biases:0' shape=(42,) dtype=float32_ref>
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
[2018-02-01 11:31:59] Epoch 1 starting, learning rate: 0.03
[2018-02-01 12:23:40] Epoch 1 finished in 52 minutes
Train cost: 86.2, ter: 35.6%, #example: 491721
Validate cost: 45.4, ter: 24.9%, #example: 11190
('not updating learning rate, parameters', 8, 0.0005)
--------------------------------------------------------------------------------
....
....
[2018-02-02 07:10:09] Epoch 23 starting, learning rate: 0.0005
[2018-02-02 08:05:53] Epoch 23 finished in 56 minutes
Train cost: 8.1, ter: 3.4%, #example: 491721
Validate cost: 37.9, ter: 15.3%, #example: 11190
('not updating learning rate, parameters', 8, 0.0005)
--------------------------------------------------------------------------------
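As a side note on the [vocalized-noise] question above, the discarding behaviour can be illustrated with a toy filter. This is purely illustrative - it is not the actual Kaldi/EESEN cleanup code, and the exact set of noise markers it strips is an assumption:

```python
# Toy illustration (not the actual Kaldi/EESEN code) of why utterances
# whose transcripts consist only of noise markers get discarded.
NOISE_TOKENS = {"[vocalized-noise]", "[noise]", "[laughter]"}

def is_droppable(transcript):
    """An utterance is dropped when no real word survives noise filtering."""
    words = [w for w in transcript.split() if w not in NOISE_TOKENS]
    return len(words) == 0

utts = {
    "sw02001-A_000098-001156": "hi um yeah",
    "sw02001-A_001980-002131": "[vocalized-noise]",
    "sw02001-B_002345-002546": "[noise] [laughter]",
}
kept = {k: v for k, v in utts.items() if not is_droppable(v)}
print(len(kept))  # 1
```

Under this reading, losing a few hundred noise-only utterances out of hundreds of thousands would be expected rather than alarming.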
However, the decoding does not seem to budge. Here are the logs. The suspicious lines seem to be "no label files fins in /scratch/tmp.jihiXHPJkp with info_set: test" and "no_name_language". One important point here is that I am starting the bash script directly from the decoding stage (stage 4). Is it necessary to re-run stage 1 or 2 after I have a trained model?
=====================================================================
Decoding eval200 using AM
=====================================================================
./steps/decode_ctc_am_tf.sh --config exp/train_char_l4_c320_mdeepbilstm_w3_nfalse/model/config.pkl --data ./data/eval2000/ --weights exp/train_char_l4_c320_mdeepbilstm_w3_nfalse/model/epoch14.ckpt --results exp/train_char_l4_c320_mdeepbilstm_w3_nfalse/results/epoch14
exp/train_char_l4_c320_mdeepbilstm_w3_nfalse/model/config.pkl: line 135: syntax error: unexpected end of file
copy-feats 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:./data/eval2000//utt2spk scp:./data/eval2000//cmvn.scp scp:./data/eval2000//feats.scp ark:- |' ark,scp:/scratch/tmp.jihiXHPJkp/f.ark,/scratch/tmp.jihiXHPJkp/test_local.scp
apply-cmvn --norm-vars=true --utt2spk=ark:./data/eval2000//utt2spk scp:./data/eval2000//cmvn.scp scp:./data/eval2000//feats.scp ark:-
LOG (apply-cmvn[5.3.85~1-35950]:main():apply-cmvn.cc:159) Applied cepstral mean and variance normalization to 4458 utterances, errors on 0
LOG (copy-feats[5.3.85~1-35950]:main():copy-feats.cc:143) Copied 4458 feature matrices.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
2.7.14 |Anaconda, Inc.| (default, Dec 7 2017, 17:05:42)
[GCC 7.2.0]
('now:', 'Fri 2018-02-02 09:13:09')
('tf:', '1.4.0-rc1')
('cwd:', '/share/data/lang/users/kalpesh/eesen/asr_egs/swbd/v1-tf')
('library:', '/share/data/lang/users/kalpesh/eesen')
('git:', 'heads/master-dirty')
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
reading testing set
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
test_x:
--------------------------------------------------------------------------------
unilingual set up detected on test or set language...
test (feats) found for language: no_name_language ...
preparing dictionary for no_name_language...
ordering all languages (from scratch) test batches...
Augmenting data x3 and win 3...
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
test_y (for ter computation):
--------------------------------------------------------------------------------
unilanguage setup detected (in labels)...
no label files fins in /scratch/tmp.jihiXHPJkp with info_set: test
file: /share/data/lang/users/kalpesh/eesen/tf/ctc-am/reader/labels_reader/labels_reader.py function: __read_one_language line: 171
exiting...
Good - I'm not sure about the pickle error, but if you say it does not affect the training, then things should be fine. You should also be fine running the test script from stage 4 only for decoding; the data should already be prepared. @ramonsanabria - any ideas about v1-tf here?
Hi,
The pickle error is irrelevant; the configuration is loaded properly. I will try to remove it as soon as I have time.
@xinjli is cleaning up the swbd recipe. I have some experiments with different char-based units (removing numbers and noises) that seem to be improving results a bit for now.
I also found today that the char recipe cannot run without the phn recipe. The same issue also happens in the swbd v1 recipe under the master branch. I will prepare a fix for this issue.
Hi @ramonsanabria , @xinjli
Any idea about the "no label files fins in /scratch/tmp.jihiXHPJkp with info_set: test" error I am receiving?
Can you do: find /scratch/tmp.jihiXHPJkp ?
@ramonsanabria yes, I can find it.
kalpesh@kalpesh:kalpesh$ ls /scratch/tmp.jihiXHPJkp
f.ark test_local.scp
kalpesh@kalpesh:kalpesh$
I checked the code; the system searches for a file named labels.test, but fails to find it. I tried to use the ./local/swbd1_prepare_phn_dict_tf.py script to generate the test labels (as in the case of the training data), but I obtain an empty labels file. In my previous setup, I used a special hubscr.pl script to generate a detailed output from the raw decoded transcripts.
What is the correct way to integrate this script into EESEN?
I think we need a stage to generate labels.test for testing; it seems that we do not have any script for this now. After preparing labels.tr and labels.cv with something like
python ./local/swbd1_prepare_char_dict_tf.py --text_file ./data/train_nodup/text --input_units ./data/local/dict_char/units.txt --output_labels $dir_am/labels.tr --lower_case --ignore_noises || exit 1
we can probably use the following command to generate labels.test:
python ./local/swbd1_prepare_char_dict_tf.py --text_file ./data/eval2000/text --input_units ./data/local/dict_char/units.txt --output_labels $dir_am/labels.test
./data/eval2000/text contains the text we need for evaluation; just replace $dir_am with the appropriate variable in your environment.
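For intuition, the label-generation step those commands perform can be sketched as follows. This is a simplified illustration, not the real swbd1_prepare_char_dict_tf.py; in particular, the <space> handling and the silent skipping of unknown characters are assumptions:

```python
# Simplified sketch (not the real swbd1_prepare_char_dict_tf.py) of what
# character label generation does: map each transcript character to its
# integer id taken from a units.txt-style table ("unit id" per line).
def load_units(lines):
    table = {}
    for line in lines:
        unit, idx = line.split()
        table[unit] = int(idx)
    return table

def text_to_labels(utt_id, text, units):
    # Unknown characters are silently skipped here; the real script may
    # instead lowercase differently, strip noises, or abort.
    tokens = ["<space>" if c == " " else c for c in text.lower()]
    ids = [str(units[t]) for t in tokens if t in units]
    return "%s %s" % (utt_id, " ".join(ids))

units = load_units(["<space> 1", "a 2", "b 3", "c 4"])
print(text_to_labels("sw02001-A_000098-001156", "ab c", units))
# sw02001-A_000098-001156 2 3 1 4
```

Running the same mapping over ./data/eval2000/text with the training units.txt is exactly what produces a labels.test that is consistent with labels.tr and labels.cv.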
You have:
https://github.com/srvk/eesen/blob/tf_clean/asr_egs/swbd/v1-tf/local/swbd1_prepare_char_dict_tf.py
This script can generate the units.txt. If you pass --output_units it will produce the units that you will further use (presumably with your train text). Then, the units produced by this script can be used as --input_units to generate the labels.cv or labels.test.
I am not sure which version is there, but I performed some cleaning of swbd that we should discuss.
@ramonsanabria could you describe the process you are using to compute the final WER of a trained model?
I guess this is often called "scoring" in the Kaldi setup. Generally, raw transcripts are fed into hubscr.pl to generate a number of detailed output files, with the final SWBD, CH, and Combined WERs reported in a *.lur file.
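Until the hubscr.pl integration is sorted out, a plain word-level edit distance gives a rough WER sanity check. This is a generic sketch, not the NIST scoring pipeline, so it will not reproduce the *.lur SWBD/CH breakdown or its text normalization:

```python
# Generic word error rate via Levenshtein distance -- a sanity check only,
# not a replacement for the NIST hubscr.pl scoring pipeline.
def wer(ref, hyp):
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / float(len(r))

print("%.2f" % wer("hi um yeah right", "hi yeah right"))  # 0.25
```

Averaging this over the decoded eval2000 hypotheses against data/eval2000/text gives a ballpark figure to compare against the TER the training logs report.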
Hi @ramonsanabria, any update on ^? Also, how have you treated the space character? I cannot find an entry for the space in data/local/dict_char/units.txt. (Note I'm referring to the <space> character, not the CTC blank symbol.)
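For context on why the missing <space> unit matters: if units.txt did contain a <space> symbol, a character-level hypothesis could be rebuilt into words as below. This is a hypothetical sketch; if the recipe instead drops spaces from the unit inventory, word boundaries cannot be recovered this way:

```python
# Hypothetical post-processing of a character-level CTC hypothesis,
# assuming units.txt contains an explicit <space> unit acting as a
# word delimiter.
def chars_to_words(units_seq):
    words, cur = [], []
    for u in units_seq:
        if u == "<space>":
            if cur:
                words.append("".join(cur))
            cur = []
        else:
            cur.append(u)
    if cur:
        words.append("".join(cur))
    return " ".join(words)

print(chars_to_words(["h", "i", "<space>", "y", "e", "a", "h"]))  # hi yeah
```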