Poor performance - Russian obscene STT

snakers4 commented 5 years ago

Hi!

Many thanks for your amazing, easy-to-use STT product! I have yet to learn how to use your text models, but STT seems to work really well out of the box.

My language is Russian, and as you may know, it has a great many obscene words that people commonly use in some contexts.

In our use case we have to recognize these words as well as ordinary ones. It looks like the language model you run on top of the acoustic model does not know them. We could add our own language model, but for that we would need the raw acoustic model outputs.

Is this somehow possible with the current API? It looks like pywit is just a requests wrapper and 99% of the work is done server-side.

patapizza commented 5 years ago

Hi @snakers4,

Thank you for the kind words.

Indeed, pywit is just a thin wrapper around our HTTP API. For tracking service-related questions and issues, we use https://github.com/wit-ai/wit/issues.

Personalized language models are something we want to support down the road. I'll share your input with the team. In the meantime, you can use the voice inbox to correct the transcripts.

snakers4 commented 5 years ago

> I'll share your input with the team

Many thanks!

Turns out there are much simpler ways to check data at scale:

Calculate WER against another source of annotation;
Compare the number of words / number of letters with the duration of each clip: the two should correlate directly, and if they do not, the STT quality is low;
Drop clips that have fewer than 2 words or 10 symbols;
Drop clips that contain special symbols, Latin symbols, etc.

A combination of these basically makes it possible to build fast heuristics that keep only the most relevant texts.
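
For anyone who wants to try this, here is a rough sketch of how the checks above could be combined. It is not what we run in production. The function names (`wer`, `keep_clip`) and all thresholds (characters per second range, maximum WER) are placeholder assumptions that you would need to tune on your own data, and the characters-per-second range is just one simple way to approximate the "text length vs. duration" check.

```python
import re

# Assumption: transcripts are lower-cased Russian, so only Cyrillic letters
# and whitespace are allowed; anything else counts as a "special symbol".
ALLOWED = re.compile(r"[а-яё\s]+")


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def keep_clip(text, duration_s, other_text=None,
              min_words=2, min_chars=10,
              cps_range=(5.0, 25.0), max_wer=0.3):
    """Return True if the transcript passes all heuristics.

    cps_range (characters per second) and max_wer are illustrative
    thresholds, not measured values.
    """
    text = text.strip().lower()
    # Too short: fewer than min_words words or min_chars symbols.
    if len(text.split()) < min_words or len(text) < min_chars:
        return False
    # Special symbols, Latin letters, digits, etc.
    if not ALLOWED.fullmatch(text):
        return False
    # Text length should be roughly proportional to clip duration.
    if duration_s <= 0:
        return False
    cps = len(text) / duration_s
    if not cps_range[0] <= cps <= cps_range[1]:
        return False
    # Optional agreement check against a second annotation source.
    if other_text is not None and wer(other_text.strip().lower(), text) > max_wer:
        return False
    return True
```

Usage is just `keep_clip(transcript, duration_in_seconds, other_text=second_transcript)` per clip. Pass `other_text=None` when there is no second annotation source to compare against.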

wit-ai / pywit

Poor performance - Russian obscene STT #123