Closed DRosemei closed 4 years ago
I don't think that's possible.
Beam search can't always predict punctuation. To the decoder, a punctuation mark is just another character.
Say I have "Oh !". Beam search predicts it as "Oh" and "!" and feeds both to your language model, which computes the probability of (Oh, !) and uses it to score the decoding.
If you don't have "!" in your language model, it is impossible for you to predict it.
Beam search may or may not predict punctuation. Take the word "late" as an example: it can be decoded as "iate". If it can mistake "l" for "i", why can't it mistake "!" for "l"?
Both are the same to the CTC decoder.
You are right. There is a dictionary in the code, so it cannot predict punctuation when the language model itself contains no punctuation.
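To make the dictionary point concrete, here is a minimal toy sketch in plain Python. It is not ctcdecode's actual code: the `rescore` function, the unigram `lm_vocab` dictionary, and the `alpha` weight are all made up for illustration, and I am assuming that a token missing from the language model gets a log-probability of minus infinity.

```python
import math

def rescore(candidates, lm_vocab, alpha=0.5, oov_logp=-math.inf):
    """Return the candidate with the best combined score.

    candidates: list of (tokens, acoustic_logp) pairs from a beam search.
    lm_vocab:   dict mapping token -> log-prob (toy stand-in for an n-gram LM).
    alpha:      positive weight on the language model score (hypothetical).
    """
    best, best_score = None, -math.inf
    for tokens, acoustic_logp in candidates:
        # A token the LM has never seen contributes oov_logp to the sum.
        lm_logp = sum(lm_vocab.get(t, oov_logp) for t in tokens)
        score = acoustic_logp + alpha * lm_logp
        if score > best_score:
            best, best_score = tokens, score
    return best

# The acoustic model prefers "oh !" over "oh".
candidates = [(["oh", "!"], -1.0), (["oh"], -2.5)]

lm_without_punct = {"oh": -1.0}
lm_with_punct = {"oh": -1.0, "!": -1.5}

print(rescore(candidates, lm_without_punct))  # "!" is out of vocabulary, so "oh" wins
print(rescore(candidates, lm_with_punct))     # "!" is in vocabulary, so "oh !" wins
```

In this toy setup the beam does propose "!", but the candidate containing it can never win once the language model has no entry for it.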
Thanks for your great work! I trained an English language model without punctuation using KenLM, and ctcdecode always outputs strings without punctuation. I also trained an English language model with punctuation, and with it ctcdecode can output punctuation. In my opinion, beam search should always predict punctuation, and the language model should only correct the result. I also found that the n-grams are the same before language model scoring in ctcdecode/ctcdecode/src/ctc_beam_search_decoder.cpp, whether I use the English language model with or without punctuation. So how should I modify the code to get punctuation with a language model that has no punctuation?
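One direction that might be worth trying, purely as an assumption and not something ctcdecode is confirmed to support, is to skip punctuation tokens when querying the language model, so that punctuation is scored by the acoustic output alone. The toy Python sketch below shows the idea; `rescore_keep_punct` and the unigram `lm_vocab` are hypothetical names, and a real change would have to be made in the C++ scorer.

```python
import math
import string

# Tokens treated as punctuation (assumption: single ASCII punctuation marks).
PUNCT = set(string.punctuation)

def rescore_keep_punct(candidates, lm_vocab, alpha=0.5, oov_logp=-math.inf):
    """Pick the best candidate, hiding punctuation tokens from the LM.

    candidates: list of (tokens, acoustic_logp) pairs from a beam search.
    lm_vocab:   dict mapping token -> log-prob (toy stand-in for an n-gram LM).
    """
    best, best_score = None, -math.inf
    for tokens, acoustic_logp in candidates:
        # Only non-punctuation tokens are scored by the language model.
        words = [t for t in tokens if t not in PUNCT]
        lm_logp = sum(lm_vocab.get(t, oov_logp) for t in words)
        score = acoustic_logp + alpha * lm_logp
        if score > best_score:
            best, best_score = tokens, score
    return best

candidates = [(["oh", "!"], -1.0), (["oh"], -2.5)]
lm_without_punct = {"oh": -1.0}

# The LM never sees "!", so the acoustically preferred "oh !" survives.
print(rescore_keep_punct(candidates, lm_without_punct))
```

The trade-off is that the language model then gives no correction at all for punctuation, so any punctuation errors from the acoustic model pass through unchanged.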