kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0

How Many HotWords #61

Open finardi opened 2 years ago

finardi commented 2 years ago

Hello! Is there a recommended range for how many hotwords I can pass to the decoder? Is 1000 too many? I've fine-tuned a model for Portuguese, and I have a specific vocabulary from the banking/finance domain.

lopez86 commented 2 years ago

Have you tested your performance on a test set? I'm not sure exactly at what point the number of hotwords starts degrading performance, but 1000 sounds like a lot. At some point it would probably be better to retrain the language model with an upgraded vocabulary and keep hotwords for a small number of more targeted words.
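
One way to run the test suggested above is to decode the same held-out set with increasingly large hotword lists and compare word error rates. The following is only a minimal sketch: `word_error_rate`, `sweep_hotword_counts`, and the `decode_fn` callback are hypothetical helpers written for illustration, not part of pyctcdecode, and the score reported is the mean of per-utterance WERs rather than a corpus-level WER.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def sweep_hotword_counts(
    decode_fn: Callable[[object, List[str]], str],
    samples: Sequence[Tuple[object, str]],
    hotwords: Sequence[str],
    counts: Sequence[int],
) -> Dict[int, float]:
    """Return the mean per-utterance WER for each hotword-list size in `counts`.

    `decode_fn(inputs, hotword_subset)` is a caller-supplied (hypothetical)
    wrapper around whatever decoder is being evaluated.
    `samples` is a sequence of (inputs, reference_transcript) pairs.
    """
    results = {}
    for n in counts:
        subset = list(hotwords[:n])  # first n hotwords, e.g. ordered by priority
        scores = [word_error_rate(ref, decode_fn(inputs, subset))
                  for inputs, ref in samples]
        results[n] = sum(scores) / len(scores)
    return results
```

With pyctcdecode, `decode_fn` could be something like `lambda logits, hw: decoder.decode(logits, hotwords=hw, hotword_weight=10.0)`, where `decoder` comes from `build_ctcdecoder` and the weight value is just an example to tune. If WER stops improving or gets worse as the list grows, that is probably the point where moving the vocabulary into the language model is the better option.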