Open ankitmundada opened 6 years ago
You can get rid of the `if` statement here: https://github.com/parlance/ctcdecode/blob/cef6739f7370762229cf7e115e4afcc319a4f805/ctcdecode/src/scorer.cpp#L83. That would assign the `<UNK>` probability to OOV words.
This line gives a score of `-1000` (which is declared here) to any n-gram which contains an OOV. Is this the right way to approach it? Isn't it possible to get the score for `<unk>` tokens from the LM and use that instead of using a hardcoded score?
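
To make the two options concrete, here is a minimal, self-contained sketch of the difference between a hardcoded OOV score and falling back to the LM's own `<unk>` entry. It is not the ctcdecode or KenLM code: `ToyLM`, `kOovScore`, `score_with_floor`, and `score_with_unk` are hypothetical names, and the "LM" is just a unigram lookup table of log10 probabilities that is assumed to contain a `<unk>` entry.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy unigram LM: word -> log10 probability.
// A real n-gram LM (e.g. one loaded through KenLM) is more involved.
using ToyLM = std::unordered_map<std::string, double>;

// Stand-in for the hardcoded OOV score discussed in this issue.
constexpr double kOovScore = -1000.0;

// Variant A: if any word is out of vocabulary, the whole sequence
// gets the hardcoded score, regardless of the other words.
double score_with_floor(const ToyLM& lm, const std::vector<std::string>& words) {
  double total = 0.0;
  for (const auto& w : words) {
    auto it = lm.find(w);
    if (it == lm.end()) return kOovScore;
    total += it->second;
  }
  return total;
}

// Variant B: an out-of-vocabulary word is scored with the LM's own
// <unk> log probability, so the other words still contribute.
double score_with_unk(const ToyLM& lm, const std::vector<std::string>& words) {
  auto unk = lm.find("<unk>");
  assert(unk != lm.end() && "toy LM is assumed to define <unk>");
  double total = 0.0;
  for (const auto& w : words) {
    auto it = lm.find(w);
    total += (it != lm.end()) ? it->second : unk->second;
  }
  return total;
}
```

With a toy table such as `{"the": -1.0, "cat": -2.0, "<unk>": -4.0}`, the sequence `the zebra` scores `-1000` under variant A and `-5` under variant B. Whether variant B is preferable presumably depends on how the LM was trained, in particular on whether its `<unk>` probability is meaningful.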