wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

WFST Decoding problem #1673

Closed NathanJHLee closed 8 months ago

NathanJHLee commented 1 year ago

Hi, my name is Nathan, and I would like to use a WFST LM with CTC decoding, so I referred to the run.sh located in 'librispeech/s0'.

I set nbpe=8000 and bpemode=unigram. I also ignored the log message "Failed to import k2 and icefall. ...." because I thought it was optional.

I wanted to use WFST decoding with a TLG model, so I followed stage 7 of 'run.sh'.


The other logs looked fine, and the TLG model was generated, as shown in the log below.

Stage 7 log:

```
dictionary: data/lang_char/train_960_unigram8000_units.txt
unzip lm(3-gram.pruned.1e-7.arpa.gz)...
Lm saved as data/local/lm/lm.arpa
build lexicon...
lexicon saved as 'data/local/dict/lexicon.txt'
fstaddselfloops 'echo 8000 |' 'echo 200001 |'
Lexicon and token FSTs compiling succeeded
arpa2fst --read-symbol-table=data/lang_test/words.txt --keep-symbols=true -
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0131 16:56:16.222822 418320 arpa-file-parser.cc:93] Reading \data\ section.
I0131 16:56:16.222983 418320 arpa-file-parser.cc:148] Reading \1-grams: section.
I0131 16:56:17.275573 418320 arpa-file-parser.cc:148] Reading \2-grams: section.
I0131 16:56:31.037730 418320 arpa-file-parser.cc:148] Reading \3-grams: section.
Checking how stochastic G is (the first of these numbers should be small):
fstisstochastic data/lang_test/G.fst
2.66112 -0.298359
fsttablecompose data/lang_test/L.fst data/lang_test/G.fst
fstdeterminizestar --use-log=true
fstminimizeencoded
fsttablecompose data/lang_test/T.fst data/lang_test/LG.fst
Composing decoding graph TLG.fst succeeded
```
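For reference, here is the graph build the log corresponds to, condensed into its component commands (a sketch reconstructed from the log itself, not the exact run.sh code; it assumes the Kaldi/OpenFst binaries are on PATH):

```bash
# G: word-level grammar FST from the arpa LM.
gunzip -c 3-gram.pruned.1e-7.arpa.gz | \
  arpa2fst --read-symbol-table=data/lang_test/words.txt --keep-symbols=true - \
  > data/lang_test/G.fst

# Sanity check: how stochastic G is (the first number should be small).
fstisstochastic data/lang_test/G.fst

# LG: compose the lexicon FST with G, then determinize and minimize.
fsttablecompose data/lang_test/L.fst data/lang_test/G.fst | \
  fstdeterminizestar --use-log=true | \
  fstminimizeencoded > data/lang_test/LG.fst

# TLG: compose the token (CTC topology) FST with LG.
fsttablecompose data/lang_test/T.fst data/lang_test/LG.fst \
  > data/lang_test/TLG.fst
```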

After checking that TLG.fst was generated, I ran the binary as:

```
./decoder_main --rescoring_weight 1.0 --ctc_weight 0.5 --reverse_weight 0.0 \
  --chunk_size -1 --wav_scp test-libri_10files.scp \
  --model_path /stt/Models/wenet-main/examples/librispeech/s0/exp/sp_spec_aug_batch16/final.zip \
  --unit_path /stt/Models/wenet-main/examples/librispeech/s0/data/lang_test/units.txt \
  --fst_path /stt/Models/wenet-main/examples/librispeech/s0/data/lang_test/TLG.fst \
  --beam 10.0 \
  --dict_path /stt/Models/wenet-main/examples/librispeech/s0/data/lang_test/words.txt \
  --lattice_beam 5 --max_active 7000 --min_active 200 --acoustic_scale 1.2 \
  --blank_skip_thresh 0.98 --length_penalty 0.0 --result text_result
```

But I get this result:

```
1089-134686-0000 y'd discriminate y'u yt yr yiks yt ys yt ys yt yt yiks yd yt yiks yr yr ysaye
1089-134686-0001 y'u y yn't yi
1089-134686-0002 y yih il y ys qu yih quee quee qu qu sorr yds
1089-134686-0003 yr y'd yr yn
1089-134686-0004 yn ys yi yi yn y'u yih ysaye
1089-134686-0005 ysaye yr ysaye yds yds y's yr demn yn yn yn yi's ysaye yi's
1089-134686-0006 qu yih quel yr y yn ysaye yr yn't qu yn yn yn bel ys quel yd yr yr ys yn yi
1089-134686-0007 yd yi yi thee yn't ys
1089-134686-0008 ys yn yih yr yn oui yt ys quel yn't yn ilg yiks
1089-134686-0009 ys y quels yn't yr y's quels y'd yr yih y y yn yr ys ysaye ysaye yt ysaye
```

Does TLG.fst have a problem? What should I check first? Please give me any clue. Thank you.

xingchensong commented 1 year ago

Hi, you can first try decoding without the TLG to check which part is wrong: the TLG or the model itself.

NathanJHLee commented 1 year ago

Hi, I tried decoding without the TLG.

```
[asr1@k-atc12 bin]$ ./decoder_main --chunk_size -1 \
  --wav_path /stt/DB/16kHz/english/LibriSpeech/test-clean/1089/134686/1089-134686-0000.wav \
  --model_path /stt/Models/wenet-main/examples/librispeech/s0/exp/sp_spec_aug_batch16/final.zip \
  --unit_path /stt/Models/wenet-main/examples/librispeech/s0/data/lang_test/units.txt
test he hoped there would be stew for dinner turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick peppered flour fat and sauce
```

Without the TLG the result looks fine, but with the TLG there is still a problem.

```
[asr1@k-atc12 bin]$ ./decoder_main --rescoring_weight 1.0 --ctc_weight 0.5 --reverse_weight 0.0 \
  --chunk_size -1 \
  --wav_path /stt/DB/16kHz/english/LibriSpeech/test-clean/1089/134686/1089-134686-0000.wav \
  --model_path /stt/Models/wenet-main/examples/librispeech/s0/exp/sp_spec_aug_batch16/final.zip \
  --unit_path /stt/Models/wenet-main/examples/librispeech/s0/data/lang_test/units.txt \
  --fst_path /stt/Models/wenet-main/examples/librispeech/s0/data/lang_test/TLG.fst \
  --beam 10.0 \
  --dict_path /stt/Models/wenet-main/examples/librispeech/s0/data/lang_test/words.txt \
  --lattice_beam 5 --max_active 7000 --min_active 200 --acoustic_scale 1.2 \
  --blank_skip_thresh 0.98 --length_penalty 0.0
test y'd discriminate y'u yt yr yiks yt ys yt ys yt yt yiks yd yt yiks yr yr ysaye
```

I think the TLG has a problem. What should I check first? Thank you.
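For reference, here are the consistency checks I plan to run first (a sketch with example paths; the invariant is that the unit inventory used to build T.fst/TLG.fst must be identical to the one the acoustic model was trained with):

```bash
# The units used for the graph must match the units from training;
# any diff output here indicates a token mismatch.
diff data/lang_char/train_960_unigram8000_units.txt data/lang_test/units.txt

# The unit count should be nbpe plus the special symbols (<blank>, <unk>, ...).
wc -l data/lang_test/units.txt

# Every unit appearing in the lexicon entries should exist in units.txt.
awk 'NR==FNR {units[$1]=1; next}
     {for (i = 2; i <= NF; i++) if (!($i in units)) {print; break}}' \
    data/lang_test/units.txt data/local/dict/lexicon.txt | head
```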

xingchensong commented 1 year ago

Sorry, I have no idea. @pengzhendong, any suggestions?

pengzhendong commented 1 year ago

You can refer to https://github.com/pengzhendong/welm to build the TLG.

NathanJHLee commented 1 year ago

I changed nbpe from 8000 to 5000 in run.sh. I assumed '3-gram.pruned.1e-7.arpa.gz' was also built from SentencePiece output based on nbpe=5000. Is that right? E.g., text_result = spm(text_corpus), then 3-gram_arpa = ngram-count(text_result, order=3).
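Concretely, a hypothetical expansion of that pseudocode would be something like the following (file names are examples; `spm_encode` is from SentencePiece and `ngram-count` is from SRILM; whether the released arpa was actually built this way is exactly my question):

```bash
# Hypothetical expansion of the pseudocode above (file names are examples).
# Segment the LM corpus with the same SentencePiece model used for training,
# then count 3-grams with SRILM.
spm_encode --model=train_960_unigram5000.model --output_format=piece \
  < text_corpus.txt > text_result.txt
ngram-count -order 3 -text text_result.txt -lm 3gram.arpa
```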
So I just changed the parameter and trained a new model again (stages 0 through 7) up to the 20th epoch, but the result still has the same problem:

```
test(normal) he hoped there would be stew for dinner turnips and carrots and bruised potatoes and fat mutton pieces to be laidled out in thick peppered flour fat and sauce
test(TLG)    zoe hr u u v n u n n u n ue yt n n u n ze ze l u n hc zee l u n ue
```

So I am wondering what the problem is in my case. I have a few questions:

  1. Does your run.sh produce correct inference if a user follows "librispeech/s0/run.sh" as written? I just want to know whether the fault is mine.
  2. When I use 'https://github.com/pengzhendong/welm/blob/master/run.sh', is jieba necessary for English? That run.sh needs the jieba library in stage 2. What if I don't use Chinese words?

Thank you.

NathanJHLee commented 1 year ago

I checked WFST decoding with the pretrained models {20210610_u2pp_conformer_libtorch, 20210610_u2pp_conformer_exp}, and they decode correctly.

I think there is a mismatch problem between the old and new training methods.

```
./decoder_main --chunk_size -1 \
  --wav_path /stt/DB/16kHz/english/LibriSpeech/test-clean/1089/134686/1089-134686-0000.wav \
  --model_path /stt/Models/JHLEE/wenet-main/examples/librispeech/s0/exp/20210610_u2pp_conformer_exp/final.zip \
  --unit_path /stt/Models/JHLEE/wenet-main/examples/librispeech/s0/data_pretrain/lang_test/units.txt
test he hoped there would be stew for dinner turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick peppered flour fattened sauce
```

```
./decoder_main --chunk_size -1 \
  --wav_path /stt/DB/16kHz/english/LibriSpeech/test-clean/1089/134686/1089-134686-0000.wav \
  --model_path /stt/Models/JHLEE/wenet-main/examples/librispeech/s0/exp/20210610_u2pp_conformer_exp/final.zip \
  --unit_path /stt/Models/JHLEE/wenet-main/examples/librispeech/s0/data_pretrain/lang_test/units.txt \
  --fst_path /stt/Models/JHLEE/wenet-main/examples/librispeech/s0/data_pretrain/lang_test/TLG.fst \
  --beam 10.0 \
  --dict_path /stt/Models/JHLEE/wenet-main/examples/librispeech/s0/data_pretrain/lang_test/words.txt
test he hoped there would be stew for dinner turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick peppered flour fattened sauce
```

First, I will try to build the arpa file myself. Or is there any way to avoid this issue?

Thank you

xingchensong commented 1 year ago

Does

/stt/Models/wenet-main/examples/librispeech/s0/data/lang_test/TLG.fst

equal

/stt/Models/JHLEE/wenet-main/examples/librispeech/s0/data_pretrain/lang_test/TLG.fst 

?

What if you use the pretrained model with the newly built TLG?

NathanJHLee commented 1 year ago

A bad result occurred when decoding with the combination you suggested, as shown below.

```
./decoder_main --chunk_size -1 \
  --wav_path /stt/DB/16kHz/english/LibriSpeech/test-clean/1089/134686/1089-134686-0000.wav \
  --model_path /stt/Models/wenet-main/examples/librispeech/s0/exp/20210610_u2pp_conformer_exp/final.zip \
  --unit_path /stt/Models/wenet-main/examples/librispeech/s0/data_tlg/lang_test/units.txt \
  --fst_path /stt/Models/wenet-main/examples/librispeech/s0/data_tlg/lang_test/TLG.fst \
  --beam 10.0 \
  --dict_path /stt/Models/wenet-main/examples/librispeech/s0/data_tlg/lang_test/words.txt
test hero humble thither worthy bath stock walking fourteen discern twice january safe among carefully rob safe among bruce elbow prayer safe among feminine named placed lieutenant inhabitants thorough perpetual forehead feminine says
```

'......../data_pretrain/lang_test/TLG.fst' is built according to 'librispeech/s0/run.sh' (based on your spm model).
'......./data_tlg/lang_test/TLG.fst' is built according to 'librispeech/s0/run.sh' (based on my own spm model).

So they are not equal. I think the decoding problem comes from this difference, even though I followed the training scripts.

"20210610_u2pp_conformer_exp" which is shared as Pretrained-Model includes train_960_unigram5000.model this problem is occurred because This model and newly trained my own spm-model are different. So I will try to train a new pytorch-model by using spm-model you released and let you know decoding result later.

Thank you.

NathanJHLee commented 1 year ago

In my case, it doesn't work in the librispeech setup, so I referred to stage 7 of 'aishell/s0/run.sh'. Finally, it works fine for me. The only different thing is that I build lm.arpa myself, roughly as sketched below. Thank you for your help.
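Roughly, the aishell-style stage 7 I followed looks like this (a sketch, not the literal script; script names and arguments may differ across wenet versions, and paths/files here are examples):

```bash
# Rough sketch of an aishell-style graph build (paths are examples;
# exact scripts/arguments may differ between wenet versions).

# 1. Prepare the dict: training units plus a word lexicon expanded to units.
mkdir -p data/local/dict
cp data/dict/units.txt data/local/dict/units.txt
tools/fst/prepare_dict.py data/local/dict/units.txt \
    my_lexicon.txt data/local/dict/lexicon.txt

# 2. Provide the LM yourself, e.g. built with SRILM.
mkdir -p data/local/lm
ngram-count -order 3 -text my_corpus.txt -lm data/local/lm/lm.arpa

# 3. Compile the lexicon/token FSTs, then compose TLG.
tools/fst/compile_lexicon_token_fst.sh \
    data/local/dict data/local/tmp data/local/lang
tools/fst/make_tlg.sh data/local/lm data/local/lang data/lang_test
```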

xingchensong commented 1 year ago

Maybe there are some bugs in stage 7 (librispeech).

github-actions[bot] commented 9 months ago

This issue has been automatically closed due to inactivity.