kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0
421 stars 89 forks

Minor mistake in the way LM scores are computed #94

Closed yashjogi closed 1 year ago

yashjogi commented 1 year ago

The way the LM score for a beam is computed inside the _get_lm_beams function may be wrong when the is_eos argument is set to True.

Suppose _get_lm_beams is called with is_eos set to False, and inside the function a beam with new_text equal to X is generated. Suppose also that the LM score of X is not yet in the cached_lm_scores dictionary. The LM score for X is then computed and stored in cached_lm_scores.

Now suppose _get_lm_beams is called again at the last time step, this time with is_eos set to True, and a new beam is generated with new_text equal to X. Since the LM score of X is already cached, it is retrieved from cached_lm_scores instead of being recomputed. However, that cached score was computed with is_eos set to False, as described in the previous paragraph, whereas here we want the score computed with is_eos set to True. Hence the LM score retrieved for X is incorrect.
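A minimal sketch of the scenario, to make the sequence of calls concrete. This is not pyctcdecode's actual code: toy_lm_score and get_score_text_only_key are hypothetical stand-ins, and the only assumption carried over from the library is that the cache is keyed on the text alone.

```python
from typing import Dict


def toy_lm_score(text: str, is_eos: bool) -> float:
    """Hypothetical stand-in for a language model score."""
    score = -1.0 * len(text.split())  # one unit of penalty per word
    if is_eos:
        score -= 0.5  # made-up end-of-sentence term
    return score


def get_score_text_only_key(cache: Dict[str, float], text: str, is_eos: bool) -> float:
    """Look up a score in a cache keyed on text only (the suspected problem)."""
    if text not in cache:
        cache[text] = toy_lm_score(text, is_eos)
    return cache[text]


cache: Dict[str, float] = {}

# Earlier call: is_eos=False, the score for X is computed and cached.
mid_score = get_score_text_only_key(cache, "hello world", is_eos=False)

# Last time step: is_eos=True, but the cached is_eos=False score comes back.
final_score = get_score_text_only_key(cache, "hello world", is_eos=True)

# What the score should have been at the last time step.
expected_final = toy_lm_score("hello world", is_eos=True)
is_stale = final_score != expected_final
```

Here is_stale ends up True: the second lookup returns the value cached by the first call rather than the end-of-sentence score.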

Is this correct?

lopez86 commented 1 year ago

I took a quick look through the code, and I think you're right. The cache key should probably also include is_eos so that things stay consistent. I should be able to make a PR for this in the next couple of days.
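A rough sketch of what including is_eos in the cache key could look like. The names here (toy_lm_score, get_score) are hypothetical and the scoring function is a toy, so this only illustrates the idea and is not the change that was merged.

```python
from typing import Dict, Tuple


def toy_lm_score(text: str, is_eos: bool) -> float:
    """Hypothetical stand-in for a language model score."""
    score = -1.0 * len(text.split())  # one unit of penalty per word
    if is_eos:
        score -= 0.5  # made-up end-of-sentence term
    return score


def get_score(cache: Dict[Tuple[str, bool], float], text: str, is_eos: bool) -> float:
    """Look up a score in a cache keyed on (text, is_eos)."""
    key = (text, is_eos)
    if key not in cache:
        cache[key] = toy_lm_score(text, is_eos)
    return cache[key]


cache: Dict[Tuple[str, bool], float] = {}

# The same text is scored once per is_eos value, so neither entry shadows the other.
mid_score = get_score(cache, "hello world", is_eos=False)
final_score = get_score(cache, "hello world", is_eos=True)
```

With the composite key, the is_eos=False and is_eos=True scores for the same text are stored as separate entries, at the cost of one extra LM call per text that is seen under both settings.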

yashjogi commented 1 year ago

Thanks!

lopez86 commented 1 year ago

This should be fixed in the latest version on GitHub. There are a couple more issues to fix, and then I should make a new release on PyPI so it can be installed with pip.

yashjogi commented 1 year ago

Thank you @lopez86 !