k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

CTC HLG decoding high deletion, tune word insertion penalty #889

Open ronggong opened 1 year ago

ronggong commented 1 year ago

Hi,

My CTC HLG decoding, without n-best or whole-lattice rescoring, gives quite a high deletion rate. I would like to tune it to balance deletions against insertions. Do we have a parameter like a word insertion penalty for the lattice?

Thanks.
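
As far as this thread shows, there is no word insertion penalty flag for this decoding method. Below is a minimal sketch of how such a penalty could be applied to the HLG graph before decoding; the helper name `add_word_insertion_penalty` is hypothetical, and it assumes `HLG.aux_labels` is a plain tensor of word IDs.

```python
# Hypothetical helper, not an existing icefall option: add a per-word bonus
# or penalty to every HLG arc that emits a real word, which acts like a
# word insertion penalty and trades deletions against insertions.
import torch
import k2


def add_word_insertion_penalty(HLG: k2.Fsa, penalty: float) -> k2.Fsa:
    # Assumes aux_labels is a plain torch.Tensor of word IDs
    # (0 = epsilon, -1 = final arc); after determinization it may be
    # ragged, in which case count the words emitted on each arc instead.
    assert isinstance(HLG.aux_labels, torch.Tensor)
    emits_word = (HLG.aux_labels > 0).to(HLG.scores.dtype)
    # Scores are in the log domain (higher is better): a positive penalty
    # rewards emitting words and should lower DEL; a negative one
    # suppresses insertions and should lower INS.
    HLG.scores = HLG.scores + penalty * emits_word
    return HLG
```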

csukuangfj commented 1 year ago

Does decoding with H have the same issue?

ronggong commented 1 year ago

H decoding has a worse WER than HLG.

       INS    DEL
H      2.30   5.06
HLG    1.94   5.51

The HLG result is with lm score = 0.1. A larger lm score gives a larger DEL. Basically, HLG increases DEL.

csukuangfj commented 1 year ago

Could you tune https://github.com/k2-fsa/icefall/blob/af735eb75bf55e6b1e41602105a6f939aedbaf5c/egs/librispeech/ASR/conformer_ctc3/decode.py#L245 ?
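
For context, a minimal sketch of what tuning that parameter amounts to, assuming it is a scale applied to the HLG graph scores before decoding (the helper name is illustrative, not the actual decode.py code):

```python
# Illustrative sketch: scale the graph (lexicon + LM) scores relative to
# the acoustic scores before decoding.
import k2


def scale_hlg(HLG: k2.Fsa, hlg_scale: float) -> k2.Fsa:
    # A smaller scale weakens the graph's bias toward paths with fewer
    # words, which tends to lower DEL, consistent with the results below.
    HLG.scores = HLG.scores * hlg_scale
    return HLG
```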

ronggong commented 1 year ago

With lm score = 0.1:

HLG score    INS    DEL
0.2          1.84   4.69
0.4          1.81   4.76
0.6          1.77   4.86
0.8          1.73   5.07
1.0          1.94   5.51

Using a smaller HLG score gives a better DEL. Why does this happen? Can we make INS and DEL more balanced, maybe by tuning the penalty?
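
One way to pick an operating point from the numbers above is to trade total error (INS + DEL) against balance (|INS - DEL|); a small worked example using the reported figures:

```python
# Worked example over the INS/DEL figures reported above (lm score = 0.1).
results = {  # HLG score -> (INS, DEL)
    0.2: (1.84, 4.69),
    0.4: (1.81, 4.76),
    0.6: (1.77, 4.86),
    0.8: (1.73, 5.07),
    1.0: (1.94, 5.51),
}

lowest_total = min(results, key=lambda s: sum(results[s]))
most_balanced = min(results, key=lambda s: abs(results[s][0] - results[s][1]))
print("lowest INS+DEL :", lowest_total, results[lowest_total])
print("most balanced  :", most_balanced, results[most_balanced])
```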