kaituoxu / Listen-Attend-Spell

A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.

decoding error after successful aishell train #10

Open MNCTTY opened 5 years ago

MNCTTY commented 5 years ago

Hi! I managed to train LAS on aishell data without errors. This is the end of the log:

Epoch 20 | Iter 441 | Average Loss 0.406 | Current Loss 0.505424 | 64.8 ms/batch
Epoch 20 | Iter 451 | Average Loss 0.409 | Current Loss 0.383116 | 64.1 ms/batch
-------------------------------------------------------------------------------------
Valid Summary | End of Epoch 20 | Time 956.81s | Valid Loss 0.410
-------------------------------------------------------------------------------------
Learning rate adjusted to: 0.000000
Find better validated model, saving to exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar
# Accounting: time=21312 threads=1
# Ended (code 0) at Fri Aug 30 17:15:39 MSK 2019, elapsed time 21312 seconds

but decoding stage gave an error:

Stage 4: Decoding
run.pl: job failed, log is in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/decode.log
2019-08-30 17:15:39,608 (json2trn:24) INFO: reading exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json
Traceback (most recent call last):
 File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/utils/json2trn.py", line 25, in <module>
   with open(args.json, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json'
write a CER (or TER) result in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/result.txt
|      SPKR        |         # Snt                   # Wrd         |      Corr              Sub              Del              Ins              Err            S.Err      |
|      Sum/Avg     |             0                       0         |       0.0              0.0              0.0              0.0              0.0              0.0      |

I don't understand why that file is missing from the directory. I thought everything run.pl needs is generated there automatically.

KnowBetterHelps commented 5 years ago

Maybe you should check the file "exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json"?

MNCTTY commented 5 years ago

Yes, as I said, there is no such file. As I understand it, it should be generated at some stage, like every other file in that directory. It isn't, and I want to find out why: the log shows all previous stages completed without errors.

KnowBetterHelps commented 5 years ago

Yeah, it should be generated when decoding starts. I am running the training process now; it will finish tomorrow, and I'll see whether I get the same problem.

MNCTTY commented 5 years ago

yep thanks

KnowBetterHelps commented 5 years ago

I hit the same problem, but the root cause is not the missing file itself.

In my case, I found an encoding error in "exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs128_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/decode.log".

I use "export PYTHONIOENCODING=UTF-8" to fixed it

MNCTTY commented 5 years ago

Yep, we found an encoding error earlier too and solved it in a similar way. So, after fixing the encoding error, did run.sh run all the way through without any errors for you?

KnowBetterHelps commented 5 years ago

Yes. I am waiting for decoding to finish now; the recognition results seem okay.

MNCTTY commented 5 years ago

How did you get recognition results without the decoding stage of run.sh having finished?

MNCTTY commented 5 years ago

OK, our news:

We SOMEHOW managed to run the decoding stage. To do this, we copied data.json from dump/test into the folder where run.sh could not find it, added utf-8 encoding in several more places, and changed rec_token_id to token_id, because we thought it was a typo.

And stage 4 finally ran successfully. Here is what it printed:


karina@karina:~/Listen-Attend-Spell/egs/aishell$ ./run.sh 
dictionary: data/lang_1char/train_chars.txt
Stage 4: Decoding
run.pl: job failed, log is in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch1_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/decode.log
2019-09-03 19:05:02,215 (json2trn:24) INFO: reading exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch1_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json
2019-09-03 19:05:02,218 (json2trn:28) INFO: reading data/lang_1char/train_chars.txt
2019-09-03 19:05:02,218 (json2trn:37) INFO: writing hyp trn to exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch1_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/hyp.trn
2019-09-03 19:05:02,218 (json2trn:38) INFO: writing ref trn to exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch1_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/ref.trn
write a CER (or TER) result in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch1_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/result.txt
|      SPKR                   |      # Snt            # Wrd       |      Corr              Sub              Del              Ins              Err            S.Err      |
|      Sum/Avg                |       419             26135       |     100.0              0.0              0.0              0.0              0.0              0.0      |

How should this be interpreted?

How do we run recognition on a new, arbitrary wav, and where does it write the recognised text?

Do you have ANY idea why data.json isn't being generated in the decoding folder by itself?

I would be very grateful if you could answer any of these questions.

PS: do you have any spaces in your language? :D

KnowBetterHelps commented 5 years ago

I guess:

1. It is not correct to copy test/data.json to exp/{...}/data.json. If you do so, the scoring script compares test/data.json with exp/{...}/data.json, which are identical in your case, so the result comes out 100% correct.
2. "How to run recognition for a random new wav, and where does it write the recognised text?" You should prepare dump/test/deltatrue/data.json, which can be generated from your data dir; look into the data preparation script (see the sketch below).
3. "Do you have ANY idea why data.json isn't being generated in the decoding folder by itself?" Maybe it is still the encoding problem.
4. And BTW, what do you mean by "do you have any spaces in your language?" :)
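Regarding point 2: the data.json files in this recipe follow an ESPnet-style layout. A minimal hand-built sketch for a single new utterance might look like the code below; all ids, dims, paths and field values are illustrative assumptions, so compare against a data.json the data preparation script actually generated:

```python
import json

# Illustrative sketch of the ESPnet-style data.json layout; every value
# below is a made-up placeholder, not taken from a real run.
data = {
    "utts": {
        "my_utt_0001": {
            "input": [{
                "feat": "dump/test/deltatrue/feats.1.ark:123",  # Kaldi ark:offset
                "name": "input1",
                "shape": [420, 240],   # frames x feature dim
            }],
            "output": [{
                "name": "target1",
                "shape": [8, 4233],    # token count x vocab size
                "text": "过 去 的 就 不 要 想 了",
                "token": "过 去 的 就 不 要 想 了",
                "tokenid": "102 57 8 311 5 96 73 20",
            }],
        }
    }
}

with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
```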

MNCTTY commented 5 years ago
4. Hmm, I noticed that the aishell annotations look like text with hardly any spaces. I opened the Chinese Wikipedia and saw that there are some spaces, but only after full stops or commas. I ask because in Russian we have lots of spaces; they separate words from each other, and when the data prep script deletes all spaces, the Russian annotations look strange.
MNCTTY commented 5 years ago

3. By the way, what do you mean by "encoding"? Which stage in run.sh does encoding? I thought there is only decoding, from wav to text. No?

KnowBetterHelps commented 5 years ago

An utterance for nnet training usually contains only one sentence, so there won't be any full stops or commas in it; if there are, they should be deleted before training.

"encoding" means the language encoding type, like utf-8. when you use chinese, like opening some file which contains chinese, you should always be careful with the encoding.

MNCTTY commented 5 years ago

I managed to run decoding from start to end; the problem really was the encoding (it needed to be added to some other files). But! For some reason the results are still 100% correct, and I don't understand why.

By the way, do you know how to load a pretrained model for further training?

KnowBetterHelps commented 5 years ago

> for some reason results are still 100% correct

Can you show me an example of your train/...../data.json?

> how to load a pretrained model for further training

I didn't find any train_stage option like in Kaldi recipes, so it might not support pre-training.
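That said, a minimal manual sketch with plain PyTorch is possible. The checkpoint layout below is an assumption; inspect the saved final.pth.tar to see what the training script actually stores:

```python
import torch

# Load the checkpoint that training saved (see the path in the logs above).
ckpt = torch.load("exp/.../final.pth.tar", map_location="cpu")

# Assumption: the file is either a raw state_dict or a dict holding one
# under a "state_dict" key; inspect ckpt.keys() to see what is really there.
state_dict = ckpt["state_dict"] if "state_dict" in ckpt else ckpt

model = build_model()              # hypothetical: construct the model the same way train.py does
model.load_state_dict(state_dict)  # restore the trained weights
model.train()                      # ready to continue training from here
```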

MNCTTY commented 5 years ago

> Can you show me an example of your train/...../data.json?

You mean dump/train/deltatrue/data.json? Here it is.

I'm also attaching dump/test/deltatrue/data.json and the data.json from the decoding folder that was generated during decoding. I renamed them train_data.json, test_data.json and decode_data.json to distinguish them easily in the attachment.

Archive.zip

KnowBetterHelps commented 5 years ago

Looks like you were using the script directly on your own data. One of your utterances reads: "рон не отрываясь смотрел на письмо которое уже начало с углов дымиться".

In Chinese, one syllable can be one word, like "我" (one token), which means "me"; but in your language, "рон" would be split into "р о н" (three tokens). Maybe you should modify the script to better suit your data, e.g. one word per token ("рон" as one token).
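As an illustration, a minimal sketch of the two tokenizations (the `<space>` marker for word boundaries is an assumption borrowed from common char-level ASR recipes, not necessarily what this repo emits):

```python
# Character-level vs word-level tokenization of a Russian transcript.
sentence = "рон не отрываясь смотрел на письмо"

char_tokens = [c if c != " " else "<space>" for c in sentence]
word_tokens = sentence.split()

print(char_tokens)  # ['р', 'о', 'н', '<space>', 'н', 'е', ...]
print(word_tokens)  # ['рон', 'не', 'отрываясь', 'смотрел', 'на', 'письмо']
```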

MNCTTY commented 5 years ago

So, did I understand you correctly? You're saying it's better to construct a vocab whose tokens are meaningful pieces of the language, i.e. predict those pieces rather than individual letters?

Maybe that makes sense, since I can take such a vocab from BERT for Russian.

Can you tell me, please, which files I should look at for changes? I mean, if I just put the new vocab in place of the old one, nothing will change, right?

KnowBetterHelps commented 5 years ago

I am doing similar work on code-switch recognition; for English I'm going to use subword 'BPE' units, not letters. For example, catch --> 'ca tch', not 'c a t c h'.
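A minimal sketch of building such units with the sentencepiece library (the corpus path and vocab size are illustrative assumptions):

```python
import sentencepiece as spm

# Train a small BPE model on a plain-text corpus, one sentence per line.
# "corpus.txt" and vocab_size=500 are placeholders for your own setup.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="bpe", vocab_size=500, model_type="bpe"
)

sp = spm.SentencePieceProcessor(model_file="bpe.model")
print(sp.encode("catch", out_type=str))  # e.g. ['▁ca', 'tch']
```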

KnowBetterHelps commented 5 years ago

> Can you tell me, please, which files I should look at for changes?

The data preparation script, specifically the part that generates data.json.

MNCTTY commented 5 years ago

> I am doing similar work on code-switch recognition; for English I'm going to use subword 'BPE' units, not letters. For example, catch --> 'ca tch', not 'c a t c h'.

Yeah, the BERT vocab uses BPE for exactly that, to construct the vocabulary. Plus, there are huge, complete vocabs for English; maybe you can use those, since Google had much more data to build them. For Russian they are much smaller, but still complete enough.

MNCTTY commented 5 years ago

OK, I found the real cause of the 100% correct results in result.txt.

The problem was in json2trn.py: the source code created two absolutely identical files, ref and hyp, from the decode data.json. But we know they must be different: hyp contains the model's predictions, ref contains the references from the test data.json. I fixed it in my local copy, and result.txt is now correct (no longer 100% correct).

Maybe it should be fixed in the source code too.
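For reference, the intended behaviour would look something like the sketch below. The field names ("rec_token" vs "token") are assumptions based on ESPnet-style decode JSONs; earlier in this thread they appear as rec_token_id/token_id:

```python
import json

# Sketch of what json2trn.py should do: hyp comes from the model's
# recognized tokens, ref from the ground-truth tokens.
with open("decode_dir/data.json", encoding="utf-8") as f:  # hypothetical path
    utts = json.load(f)["utts"]

with open("hyp.trn", "w", encoding="utf-8") as hyp, \
     open("ref.trn", "w", encoding="utf-8") as ref:
    for name, info in utts.items():
        out = info["output"][0]
        hyp.write(f"{out['rec_token']} ({name})\n")  # prediction
        ref.write(f"{out['token']} ({name})\n")      # reference
```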

MNCTTY commented 5 years ago

Okay, I've done something wrong: now hyp.trn is being created empty. Can somebody tell me which files besides json2trn.py are responsible for its creation, please? Maybe I will figure this out tomorrow, but if someone already knows and answers before then, that would be cool.

KnowBetterHelps commented 5 years ago

It will create something like this from exp/***/decode/data.json:

hyp.trn

过 去 的 就 不 要 想 了 (T0055G2375-T0055G2375S0447)
天 气 下 降 注 意 身 体 (T0055G2286-T0055G2286S0457)
浦 中 市 剧 中 人 街 最 儿 我 独 醒 事 已 见 放 (T0055G0915-T0055G0915S0468)

ref.trn

过 去 的 就 不 要 想 了 (T0055G2375-T0055G2375S0447)
天 气 下 降 注 意 身 体 (T0055G2286-T0055G2286S0457)
补 充 诗 句 众 人 皆 醉 而 我 独 醒 是 以 见 放 (T0055G0915-T0055G0915S0468)

MNCTTY commented 5 years ago

It's strange: though I have exp/***/decode/data.json, which is not empty and looks pretty correct, I still get an empty hyp.trn, while ref.trn is not empty at all and looks correct too.

ben-8878 commented 4 years ago

@MNCTTY did you solve your problem? I'm hesitant about whether to use this tool.