wit-ai / pywit

Python library for Wit.ai

Poor performance - Russian obscene STT #123

Open snakers4 opened 5 years ago

snakers4 commented 5 years ago

Hi!

Many thanks for your amazing, easy-to-use STT product! I have yet to learn how to use your text models, but STT seems to work really well out of the box.

My language is Russian, and as you may know, it features a great many obscene words that people commonly use in some contexts.

In our use case we have to recognize these words as well as ordinary ones. It looks like the language model you run on top of the acoustic model does not know them. We could add our own language model, but for that we would need the raw acoustic model outputs.
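To make the idea concrete, here is a minimal sketch of what rescoring with a custom language model could look like. It assumes the recognizer returned an n-best list of hypotheses with acoustic log-scores, which nothing in this thread says the Wit.ai API provides. The names `train_unigram_lm` and `rescore`, the add-one-smoothed unigram model, and the `lm_weight` parameter are all hypothetical choices for illustration only.

```python
import math
from collections import Counter


def train_unigram_lm(corpus_sentences, alpha=1.0):
    """Build a smoothed unigram LM from in-domain text (hypothetical helper)."""
    counts = Counter(w for s in corpus_sentences for w in s.lower().split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    def logprob(sentence):
        # Sum of smoothed per-word log-probabilities.
        return sum(
            math.log((counts[w] + alpha) / (total + alpha * vocab_size))
            for w in sentence.lower().split()
        )

    return logprob


def rescore(nbest, lm_logprob, lm_weight=0.5):
    """Pick the hypothesis with the best combined acoustic + LM score.

    nbest is a list of (text, acoustic_logprob) pairs; this input format
    is an assumption, not something the Wit.ai API is known to return.
    """
    best = max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))
    return best[0]
```

With `lm_weight=0.0` the choice falls back to the acoustic score alone, so the weight controls how strongly the in-domain vocabulary pulls the result.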

Is this somehow possible with the current API? It looks like pywit is just a requests wrapper and 99% of the work is done server-side.

patapizza commented 5 years ago

Hi @snakers4,

Thank you for the kind words.

Indeed, pywit is just a thin wrapper around our HTTP API. For tracking service-related questions and issues, we use https://github.com/wit-ai/wit/issues.

Personalized language models are something we want to support down the road. I'll share your input with the team. In the meantime, you can use the voice inbox to correct the transcripts.
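For readers wondering what "thin wrapper" means in practice, the sketch below builds the kind of HTTP request a speech call boils down to, using only the standard library. The endpoint URL, the bearer-token header, and the `audio/wav` content type are assumptions based on the public Wit.ai HTTP API and should be checked against the current documentation. `build_speech_request` is a hypothetical helper, not part of pywit.

```python
import urllib.request

# Assumed speech endpoint; verify against the Wit.ai HTTP API docs.
WIT_SPEECH_URL = "https://api.wit.ai/speech"


def build_speech_request(audio_bytes, access_token, content_type="audio/wav"):
    """Prepare (but do not send) a POST request carrying raw audio."""
    return urllib.request.Request(
        WIT_SPEECH_URL,
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": content_type,
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Everything beyond that, including the acoustic and language models, runs on the server, which is why the client cannot reach the raw acoustic outputs.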

snakers4 commented 5 years ago

> I'll share your input with the team

Many thanks!

Turns out there are much simpler ways to check data at scale:

Combining these makes it possible to build fast heuristics that keep only the most relevant texts.