k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

LODR RNNLM rescoring requirements #749

Open ngoel17 opened 1 year ago

ngoel17 commented 1 year ago

I am trying to understand the requirements on the RNNLM for LODR rescoring. I am using something along the lines of the Librispeech pruned_transducer_stateless3 recipe, with https://github.com/k2-fsa/icefall/tree/master/egs/ptb/LM as the prototype for LM training (except 3 layers, 600 dim, and tie-weights true). I get the following error messages:

2022-12-09 03:13:25,663 INFO [decode1.py:1185] lm filename: 2gram.fst.txt
2022-12-09 03:13:25,796 INFO [decode1.py:1191] num states: 453
2022-12-09 03:13:26,397 INFO [model.py:69] Tying weights
2022-12-09 03:13:26,397 INFO [checkpoint.py:112] Loading checkpoint from ../ngLM/rnnlm-exp/epoch-0.pt
Traceback (most recent call last):
  File "/mnt/dsk1/icefall/egs/ng/./pruned_transducer_stateless3/decode1.py", line 1259, in <module>
    main()
  File "/home/ngoel/anaconda3/envs/k2/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/dsk1/icefall/egs/ng/./pruned_transducer_stateless3/decode1.py", line 1210, in main
    load_checkpoint(
  File "/home/ngoel/icefall/icefall/checkpoint.py", line 126, in load_checkpoint
    model.load_state_dict(checkpoint["model"], strict=strict)
  File "/home/ngoel/anaconda3/envs/k2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1490, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RnnLmModel:
    size mismatch for input_embedding.weight: copying a param with shape torch.Size([500, 600]) from checkpoint, the shape in current model is torch.Size([500, 2048]).
    size mismatch for rnn.weight_ih_l0: copying a param with shape torch.Size([2400, 600]) from checkpoint, the shape in current model is torch.Size([8192, 2048]).

It's not clear to me what this message means and how to fix this. Some guidance is appreciated.

marcoyang1998 commented 1 year ago

It seems to me that the model architecture does not match the checkpoint. You might need to change the hidden dim to 600.

csukuangfj commented 1 year ago

Can you check that you use the same RNN LM model parameters for both train.py and decode.py?
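
For example, you can inspect the checkpoint directly to see which dimensions the LM was actually trained with. A minimal sketch (the checkpoint path is the one from your log; the mapping of shapes to options is the usual one for an LSTM-based RnnLmModel):

import torch

# Print the shape of every parameter stored in the RNN LM checkpoint.
ckpt = torch.load("../ngLM/rnnlm-exp/epoch-0.pt", map_location="cpu")
for name, tensor in ckpt["model"].items():
    print(name, tuple(tensor.shape))

# input_embedding.weight has shape (vocab_size, embedding_dim)
# rnn.weight_ih_l0       has shape (4 * hidden_dim, embedding_dim)
# So 2400 x 600 in the checkpoint vs. 8192 x 2048 in the current model means
# the LM was trained with dim 600 while decode.py built it with the default 2048.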

ngoel17 commented 1 year ago

Thanks. That resolved it.

ngoel17 commented 1 year ago

I ran into a different issue this time. May I get some guidance on how to resolve it? It doesn't happen on the very first pass through the code, but it does happen for the first file. The bigram LM seems to load fine, though.

File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/decode1.py", line 1259, in main() File "/home/ngoel/anaconda3/envs/k2/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(*args, **kwargs) File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/decode1.py", line 1235, in main results_dict = decode_dataset( File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/decode1.py", line 862, in decode_dataset hyps_dict = decode_one_batch( File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/decode1.py", line 712, in decode_one_batch hyp_tokens = modified_beam_search_rnnlm_LODR( File "/mnt/dsk1/icefall/egs/my/pruned_transducer_stateless3/beam_search.py", line 2304, in modified_beam_search_rnnlm_LODR assert current_ngram_score <= 0.0, ( AssertionError: (-inf, -inf)

csukuangfj commented 1 year ago

Could you please post the full command you are using to invoke decode.py?

ngoel17 commented 1 year ago

python3 -m pdb ./pruned_transducer_stateless3/decode1.py \
  --iter 480000 \
  --avg 20 \
  --simulate-streaming 1 \
  --causal-convolution 1 \
  --decode-chunk-size 16 \
  --left-context 64 \
  --exp-dir ./pruned_transducer_stateless3/exp \
  --max-duration 600 \
  --beam 4.0 \
  --max-contexts 8 \
  --max-states 32 \
  --decoding-method modified_beam_search_rnnlm_LODR \
  --rnn-lm-scale 0.4 \
  --rnn-lm-exp-dir ../LM/rnnlm-exp \
  --rnn-lm-epoch 0 \
  --rnn-lm-avg 1 \
  --rnn-lm-num-layers 3 \
  --rnn-lm-tie-weights 1 \
  --tokens-ngram 2 \
  --ngram-lm-scale -0.16

ngoel17 commented 1 year ago

I also noticed something unusual about the 2gram.fst.txt. It ends with:

452 282 461 461 11.9809
452 0 500 0 5.33528
452 0.328656

but there is no state "0": there is a transition to state "0", yet state "0" is never defined, not even as a final state the way 452 is.

csukuangfj commented 1 year ago

but there is no state "0": there is a transition to state "0", yet state "0" is never defined, not even as a final state the way 452 is.

452 0 500 0 5.33528

If 500 is the ID of #0, then I think state 0 corresponds to the backoff state.
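
For reference, kaldilm writes the FST in OpenFst text format: arc lines are "src dst ilabel olabel cost" and final-state lines are "state [cost]". A quick sanity check (a sketch, assuming the same 2gram.fst.txt) that every destination state is defined somewhere in the file:

# Verify that every destination state also appears as the source of some arc
# or as a final state. State 0 should appear as a source earlier in the file,
# even though the last few lines quoted above only show it as a destination.
srcs, dsts, finals = set(), set(), set()
with open("2gram.fst.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 4:        # arc: src dst ilabel olabel [cost]
            srcs.add(int(parts[0]))
            dsts.add(int(parts[1]))
        elif parts:                # final state: state [cost]
            finals.add(int(parts[0]))
print("undefined states:", dsts - (srcs | finals))   # expected: empty set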

csukuangfj commented 1 year ago

assert current_ngram_score <= 0.0, ( AssertionError: (-inf, -inf)

Could you check that the default value 500 is also the ID of #0 in your tokens.txt? https://github.com/k2-fsa/icefall/blob/b25c234c51426d61552cdca819ab57fe712214c9/egs/librispeech/ASR/pruned_transducer_stateless3/decode.py#L460-L462
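
For example (a tiny sketch, assuming the usual "symbol id" layout of tokens.txt and the lang dir used in this recipe):

# The ID of the disambig symbol #0 should match the backoff ID that decode.py
# uses for the LODR n-gram (500 by default for a 500-token BPE model).
with open("data/lang_bpe_500/tokens.txt", encoding="utf-8") as f:
    sym2id = dict(line.split() for line in f if line.strip())
print(sym2id["#0"])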

ngoel17 commented 1 year ago

Yes, the backoff ID is 500. This is the end of tokens.txt (the quotes were inserted by me so that GitHub displays the text properly):

"I 496"
"P 497"
"* 498"
"[ 499"
"#0 500"
"#1 501"

marcoyang1998 commented 1 year ago

I think there must be something wrong with your bigram. How was it generated?

Also, are you doing a cross-domain or intra-domain evaluation?

ngoel17 commented 1 year ago

Below is the LM generation script I used. In pruned_transducer_stateless3, a bunch of the data for the second output is cross-domain, but the primary output's data is intra-domain. Evaluation is intra-domain, but on previously unseen data.

#!/usr/bin/env bash

lang_dir=data/lang_bpe_500

for ngram in 2; do
  if [ ! -f $lang_dir/${ngram}gram.arpa ]; then
    ./shared/make_kn_lm.py \
      -ngram-order ${ngram} \
      -text $lang_dir/transcript_tokens.txt \
      -lm $lang_dir/${ngram}gram.arpa
  fi

  if [ ! -f $lang_dir/${ngram}gram.fst.txt ]; then
    python3 -m kaldilm \
      --read-symbol-table="$lang_dir/tokens.txt" \
      --disambig-symbol='#0' \
      --max-order=${ngram} \
      $lang_dir/${ngram}gram.arpa > $lang_dir/${ngram}gram.fst.txt
  fi
done

nshmyrev commented 1 year ago

Never mind, I figured it out: in order to avoid the -inf, the LODR n-gram should contain <unk> and must have a vocabulary of exactly 500 tokens.
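
For anyone hitting the same -inf, here is a rough way to check that condition. This is only a sketch: it assumes the lang dir and the 2gram.arpa produced by the script above, and the usual ARPA layout written by make_kn_lm.py.

# Compare the symbols in tokens.txt against the unigrams of the LODR ARPA.
# An ordinary token missing from the \1-grams: section can only be scored via
# <unk>/backoff, and if <unk> is missing too, the score becomes -inf.
lang_dir = "data/lang_bpe_500"

tokens = set()
with open(f"{lang_dir}/tokens.txt", encoding="utf-8") as f:
    for line in f:
        tokens.add(line.split()[0])

unigrams = set()
with open(f"{lang_dir}/2gram.arpa", encoding="utf-8") as f:
    in_unigrams = False
    for line in f:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            if not line or line.startswith("\\"):   # blank line or next section
                break
            unigrams.add(line.split()[1])           # fields: logprob word [backoff]

special = {"<blk>", "<sos/eos>", "#0", "#1"}        # not expected to be in the LM
print("tokens missing from the bigram:", sorted(tokens - unigrams - special))
print("<unk> present:", "<unk>" in unigrams)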

sangeet2020 commented 5 months ago

Hi @csukuangfj , I am also facing this issue, but I haven't been able to work out a solution. Could you please let me know how I should solve it?

Error logs

2024-02-22 13:26:51,691 INFO [decode.py:834] Decoding started
2024-02-22 13:26:51,692 INFO [decode.py:840] Device: cuda:0
2024-02-22 13:26:51,696 INFO [decode.py:850] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.17.0.dev+git.230c8fcb.clean', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'c78407a-dirty', 'icefall-git-date': 'Fri Feb 16 16:38:45 2024', 'icefall-path': '/mnt/local/sangeet/workncode/k2-fsa/icefall', 'k2-path': '/mnt/users/sagarst/envs/k2-gpu/lib/python3.11/site-packages/k2/__init__.py', 'lhotse-path': '/mnt/local/sangeet/workncode/lhotse/lhotse/__init__.py', 'hostname': 'emlgpu04', 'IP address': '127.0.1.1'}, 'epoch': 30, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal/1200'), 'bpe_model': 'Deu16_icefall/sample_data/lang_bpe_500/bpe.model', 'lang_dir': PosixPath('Deu16_icefall/sample_data/lm'), 'decoding_method': 'modified_beam_search_LODR', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': -0.24, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': True, 'lm_type': 'rnn', 'lm_scale': 0.42, 'tokens_ngram': 2, 'backoff_id': 500, 'context_score': 2.0, 'context_file': '', 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': False, 'manifest_dir': PosixPath('Deu16_icefall/sample_data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 19, 'lm_avg': 2, 'lm_exp_dir': '/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800/', 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/1200/modified_beam_search_LODR'), 'has_contexts': False, 'suffix': 'epoch-30-avg-7-chunk-16-left-context-128-modified_beam_search_LODR-beam-size-4-rnn-lm-scale-0.42-LODR-2gram-scale--0.24-use-averaged-model', 'blank_id': 0, 'unk_id': 3, 'vocab_size': 500}
2024-02-22 13:26:51,696 INFO [decode.py:852] About to create model
2024-02-22 13:26:52,409 INFO [decode.py:919] Calculating the averaged model over epoch range from 23 (excluded) to 30
2024-02-22 13:26:56,392 INFO [model.py:75] Tying weights
2024-02-22 13:26:56,392 INFO [lm_wrapper.py:180] averaging ['/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800//epoch-18.pt', '/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/LM/my-rnnlm-exp/1800//epoch-19.pt']
2024-02-22 13:26:58,886 INFO [decode.py:976] Loading token level lm: G_2_gram.fst.txt
2024-02-22 13:26:59,022 INFO [decode.py:982] num states: 12143
2024-02-22 13:26:59,027 INFO [decode.py:1018] Number of model parameters: 66110931
2024-02-22 13:26:59,027 INFO [asr_datamodule.py:409] About to get test cuts
Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /usr/local/cuda-11.2/lib64/libcublas.so.11: undefined symbol: cublasGetSmCountTarget
Traceback (most recent call last):
  File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 1051, in <module>
    main()
  File "/mnt/users/sagarst/envs/k2-gpu/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 1028, in main
    results_dict = decode_dataset(
                   ^^^^^^^^^^^^^^^
  File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 680, in decode_dataset
    hyps_dict = decode_one_batch(
                ^^^^^^^^^^^^^^^^^
  File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/./zipformer/decode.py", line 535, in decode_one_batch
    hyp_tokens = modified_beam_search_LODR(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/local/sangeet/workncode/k2-fsa/icefall/egs/Deu16/ASR/zipformer/beam_search.py", line 2623, in modified_beam_search_LODR
    assert current_ngram_score <= 0.0, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: (-inf, -inf)

I have made sure that my RNN-LM training arguments are the same as the arguments passed to decode.py with modified_beam_search_LODR. Also:

wc -l Deu16_icefall/sample_data/lang_bpe_500/tokens.txt   # prints 502

I am not sure how to make it 500.

The head and tail of tokens.txt look like this:

$ head tokens.txt
<blk> 0
<sos/eos> 1
<UNK> 2
<unk> 3
▁ 4
S 5
T 6
EN 7
E 8
N 9

$ tail tokens.txt
GRÖßTE 492
▁PAKISTAN 493
▁SEPTEMBER 494
▁STREIFEN 495
▁SCHWARZ 496
▁KÜNFTIG 497
▁STUTTGART 498
Q 499
#0 500
#1 501

Any clue that helps me solve this would be appreciated.

Thank You

csukuangfj commented 5 months ago

Xiaoyu, could you have a look?

sangeet2020 commented 5 months ago

I re-trained the RNN-LM and the WER looks better. However, the results do not seem quite consistent:

WER with greedy search: ~9
WER with beam search: ~8.5
WER with modified beam search with shallow fusion and an external LM: ~8.6
WER with modified beam search with LODR (to counter the ILM) and an external LM: ERROR
WER with modified beam search with LM rescoring to re-rank the n-best hypotheses after beam search: ~8.8

@marcoyang1998 any clues as to why this could be happening? Also, any help on how I could fix the above error would be appreciated.

Thank You

sangeet2020 commented 3 months ago

Hello @marcoyang1998 , I was wondering if you had a chance to look at the above error and point me in some direction, so that I can find out the reason and fix it.

marcoyang1998 commented 3 months ago

Sorry for getting back so late, I will have a look at the error this week.

sangeet2020 commented 3 months ago

Hi @marcoyang1998 ,

Waiting for an update. Any clue would be fine for me to figure out what could be wrong.

duhtapioca commented 2 weeks ago

Facing the same error

File /temp_ssd/icefall/egs/librispeech/ASR/zipformer/beam_search.py:2623, in modified_beam_search_LODR(model, encoder_out, encoder_out_lens, LODR_lm, LODR_lm_scale, LM, beam, context_graph)
   2620 # calculate the score of the latest token
   2621 current_ngram_score = state_cost.lm_score - hyp.state_cost.lm_score
-> 2623 assert current_ngram_score <= 0.0, (
   2624     state_cost.lm_score,
   2625     hyp.state_cost.lm_score,
   2626 )
   2627 # score = score + TDLM_score - LODR_score
   2628 # LODR_LM_scale should be a negative number here
   2629 hyp_log_prob += (
   2630     lm_score[new_token] * lm_scale
   2631     + LODR_lm_scale * current_ngram_score
   2632     + context_score
   2633 )  # add the lm score

AssertionError: (-inf, -inf)

Head of the tokens.txt

<blk> 0
<sos/eos> 1
<unk> 2

Tail of the tokens.txt

tail -4 tokens.txt 
#0 500
#1 501
#2 502
#3 503

Any help regarding this?

danpovey commented 2 weeks ago

I'll try to find someone to look into this. Basically we need to trace back where the infinity came from and why. That may require adding assert statements to catch the infinity earlier. We should also find or decide where an infinity is "allowed" according to the intended interfaces used here.
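
For reference, one way to hunt for the source of the -inf offline, before instrumenting beam_search.py. This is only a sketch, not project tooling: the file name, the vocabulary size of 500, and the "state 0 is the backoff state" convention are assumptions taken from earlier in this thread.

# List token IDs that have no unigram arc out of the backoff state of the
# LODR FST. Scoring such a token can fall off the end of the backoff chain
# and produce the -inf that the assertion reports.
arcs_from = {}
with open("2gram.fst.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 4:                 # arc: src dst ilabel olabel [cost]
            src, ilabel = int(parts[0]), int(parts[2])
            arcs_from.setdefault(src, set()).add(ilabel)

backoff_state = 0                           # see the discussion above
vocab_size = 500                            # tokens 0..499; 500/501 are #0/#1
unigram_labels = arcs_from.get(backoff_state, set())
missing = [t for t in range(vocab_size) if t not in unigram_labels]
# <blk> and <sos/eos> may legitimately show up here; any ordinary BPE token
# in this list is a candidate for producing the -inf.
print("token IDs with no unigram arc from the backoff state:", missing)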

csukuangfj commented 2 weeks ago

The latest code in the master is

https://github.com/k2-fsa/icefall/blob/2d64228efae6ebb85b68a956942374a4801548ae/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L2618-L2625

From your log

File /temp_ssd/icefall/egs/librispeech/ASR/zipformer/beam_search.py:2623, in modified_beam_search_LODR(model, encoder_out, encoder_out_lens, LODR_lm, LODR_lm_scale, LM, beam, context_graph)
   2620 # calculate the score of the latest token
   2621 current_ngram_score = state_cost.lm_score - hyp.state_cost.lm_score
-> 2623 assert current_ngram_score <= 0.0, (
   2624     state_cost.lm_score,
   2625     hyp.state_cost.lm_score,

Could you first try the latest master and see if the issue persists? @duhtapioca

sangeet2020 commented 2 weeks ago

I tried but ended up with the same error.

marcoyang1998 commented 2 weeks ago

@duhtapioca Could you please run your code again and print out the value of new_token when triggering this assertion?

duhtapioca commented 1 week ago

Could you please run your code again and print out the value of new_token when triggering this assertion?

Yes, will try that and share the output soon.

duhtapioca commented 1 week ago

@marcoyang1998

The output is now

************** The new token is - 98
************** The new token is - 94
************** The new token is - 308
************** The new token is - 280
************** The new token is - 95
************** The new token is - 60
************** The new token is - 233
************** The new token is - 19
************** The new token is - 23
************** The new token is - 103
************** The new token is - 207
************** The new token is - 8

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 predict_modified_beamsearch_LODR(['/home/azureuser/users/shreya/hindi_test_Set/hindi_test_main/test_main_wavs/3aa183cc-dd52-4822-9f2d-9ccaebafbac2.wav'])

File /anaconda/envs/k2_icefall/lib/python3.11/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

Cell In[5], line 227, in predict_modified_beamsearch_LODR(batch)
    217 context_graph.build(contexts)
    219 hyp_tokens = modified_beam_search(
    220         model=model,
    221         encoder_out=encoder_out,
   (...)
    224         context_graph=context_graph,
    225     )
--> 227 hyp_tokens = modified_beam_search_LODR(
    228         model=model,
    229         encoder_out=encoder_out,
    230         encoder_out_lens=encoder_out_lens,
    231         beam=params.beam_size,
    232         LODR_lm=ngram_lm,
    233         LODR_lm_scale=-0.24,
    234         LM=LM,
    235         context_graph=None,
    236     )
    237 print("Hyp tokens created")
    239 for hyp in sp.decode(hyp_tokens):

File /temp_ssd/icefall/egs/librispeech/ASR/zipformer/beam_search.py:2623, in modified_beam_search_LODR(model, encoder_out, encoder_out_lens, LODR_lm, LODR_lm_scale, LM, beam, context_graph)
   2621 current_ngram_score = state_cost.lm_score - hyp.state_cost.lm_score
   2622 print("************** The new token is - "+ str(new_token))
-> 2623 assert current_ngram_score <= 0.0, (
   2624     state_cost.lm_score,
   2625     hyp.state_cost.lm_score,
   2626 )
   2627 # score = score + TDLM_score - LODR_score
   2628 # LODR_LM_scale should be a negative number here
   2629 hyp_log_prob += (
   2630     lm_score[new_token] * lm_scale
   2631     + LODR_lm_scale * current_ngram_score
   2632     + context_score
   2633 )  # add the lm score

AssertionError: (-inf, -inf)

For reference, tokens.txt was generated by egs/librispeech/ASR/local/prepare_lang_bpe.py.