k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall
Apache License 2.0

Use tropical semiring for lm_paths.get_tot_scores #214

Closed csukuangfj closed 3 years ago

csukuangfj commented 3 years ago

See https://github.com/k2-fsa/snowfall/pull/201#discussion_r647975506

The 2nd arg to get_tot_scores() here, representing log_semiring, should be false, because ARPA-type language models are constructed in such a way that the backoff prob is included in the direct arc. I.e. we would be double-counting if we were to sum the probabilities of the non-backoff and backoff arcs.
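The double-counting argument can be illustrated with a small sketch in plain Python. It does not use k2; the helper names (`log_add`, `tot_score`) and the probabilities are hypothetical, chosen only to show how the two semirings combine a direct arc and a backoff path that emit the same word.

```python
import math


def log_add(a: float, b: float) -> float:
    """Log-semiring sum: log(exp(a) + exp(b)), computed stably."""
    if a < b:
        a, b = b, a
    if b == -math.inf:
        return a
    return a + math.log1p(math.exp(b - a))


def tot_score(path_scores, log_semiring: bool) -> float:
    """Combine the scores of parallel paths (hypothetical helper, not the k2 API).

    log_semiring=True  -> log-sum of the path probabilities
    log_semiring=False -> tropical semiring, i.e. the best single path
    """
    total = -math.inf
    for score in path_scores:
        total = log_add(total, score) if log_semiring else max(total, score)
    return total


# Assumed toy numbers: the direct bigram arc already carries the full
# ARPA probability P(w | h) = 0.4.
direct = math.log(0.4)
# Backoff path for the same word: backoff weight 0.5 times unigram 0.2.
backoff = math.log(0.5) + math.log(0.2)

paths = [direct, backoff]
tropical = tot_score(paths, log_semiring=False)
log_sum = tot_score(paths, log_semiring=True)

print("tropical:", math.exp(tropical))  # probability of the direct arc only
print("log:     ", math.exp(log_sum))   # direct + backoff mass, an overcount
```

Under these assumed numbers, the tropical total keeps only the direct arc's probability, while the log-semiring total adds the backoff path's mass on top of it, which is the overcounting described above.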

Switching from log_semiring to tropical_semiring does indeed improve the WER. On the test-clean dataset, with num_paths=100 and lm_scale=1.2, the WER decreases from 6.06 to 5.98.