syl22-00 / pocketsphinx.js

Speech recognition in JavaScript and WebAssembly

Sentence matching #52

Open resle opened 8 years ago

resle commented 8 years ago

Sorry to go slightly off topic; this is more a request for best practices than an issue with the code.

My usage of pocketsphinx is a bit unorthodox: I need to use the recognizer to match a known sentence against the user's pronunciation of that sentence, e.g. "Hello how are you?".

It is essentially a command-and-control speech recognition case with only one command, that command being a single well-defined sentence. So, in a sense, neither using a grammar nor real-time dictation with continuous recognition is exactly convenient.

I attempted three workarounds:

or

or

Is there any better way that I am entirely missing, and perhaps a way of returning a confidence value for each word or hypothesis?

Thanks a.

P.S. The main problem, though, seems to be that pretty much everything, including noise and random mumbling, is recognized as a word from the grammar or even as the whole sentence I described in it. I expected the recognition problems to be at the other end of the spectrum (failing to recognize something that is clearly and loudly pronounced). Is there anything I should tune to make recognition much stricter?

syl22-00 commented 8 years ago

Not sure exactly what you want. If you want to recognize that sentence with possible missing words, you can build a grammar as you did in your example and add epsilon-transitions (transitions with no word emitted). If you want to detect whether words are mispronounced or replaced by other words, you'd need to train some fillers. You should find more on these on the CMU Sphinx website; nothing here is specific to pocketsphinx.js. http://cmusphinx.sourceforge.net/

Alternatively, you can look at ispikit, which does pronunciation assessment and flags missing and mispronounced words.
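The epsilon-transition idea above can be sketched as a small helper that builds a finite-state grammar for one fixed sentence, where each word can optionally be skipped. This is only a sketch: `buildSentenceGrammar` is a hypothetical helper, the `{numStates, start, end, transitions}` shape is assumed to match the grammar objects pocketsphinx.js accepts, and using an empty `word` to mark an epsilon-transition is an assumption that should be checked against the pocketsphinx.js README.

```javascript
// Hypothetical helper: builds a linear grammar for one known sentence.
// ASSUMPTION: the {numStates, start, end, transitions} shape and the use of an
// empty `word` for an epsilon-transition must be verified against the
// pocketsphinx.js documentation before use.
function buildSentenceGrammar(sentence, allowSkips) {
  // Normalize to upper-case dictionary-style tokens, dropping punctuation.
  const words = sentence
    .toUpperCase()
    .replace(/[^A-Z' ]/g, " ")
    .split(/\s+/)
    .filter(Boolean);

  const transitions = [];
  words.forEach((word, i) => {
    // Regular transition: state i -> i + 1 emits the expected word.
    transitions.push({ from: i, to: i + 1, word: word });
    if (allowSkips) {
      // Epsilon-transition: state i -> i + 1 with no word emitted,
      // so a missing word does not block the rest of the sentence.
      transitions.push({ from: i, to: i + 1, word: "" });
    }
  });

  return {
    numStates: words.length + 1,
    start: 0,
    end: words.length,
    transitions: transitions,
  };
}

const grammar = buildSentenceGrammar("Hello how are you?", true);
```

Comparing the words in the returned hypothesis against the expected word list would then show which words were skipped. Note that every word still has to exist in the recognizer's dictionary.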

resle commented 8 years ago

Thanks.

What I am trying to do is check whether a user has spoken all the words in a given sentence, with reasonable "understandability".

Once I understand how to build grammars with epsilons, the remaining issue here is the recognition's "confidence threshold", so to speak. I have tested CMU Sphinx and it seems to be much stricter, which is exactly what I need. Is this, perhaps, a parameter that needs to be set somewhere for PocketSphinxJS?

I must be doing something wrong because, as much as I know that fillers should be trained etc., I find that even letting the microphone pick up random noise always results in all the words being correctly recognized one after another.

nshmyrev commented 8 years ago

You need to use keyword spotting mode, that's all.
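For illustration, a keyword spotting setup might be prepared roughly as follows. This is a sketch under several assumptions: `keywordSpottingMessages` is a hypothetical helper, the `{command, data}` message shape and the `addKeyword` command are assumed to match what the pocketsphinx.js recognizer worker expects, and `-kws_threshold` is the PocketSphinx detection-threshold option, which the pocketsphinx.js build in use may or may not accept at initialization.

```javascript
// Hypothetical helper: prepares the messages one might post to the
// pocketsphinx.js recognizer worker to search for a single key phrase.
// ASSUMPTION: the message shapes and the `addKeyword` command must be
// verified against the pocketsphinx.js README; callback ids are omitted here.
function keywordSpottingMessages(phrase, threshold) {
  // Key phrases are written as upper-case dictionary words without punctuation.
  const keyphrase = phrase
    .toUpperCase()
    .replace(/[^A-Z' ]/g, " ")
    .split(/\s+/)
    .filter(Boolean)
    .join(" ");

  return {
    // Recognizer options are passed as [key, value] string pairs.
    initialize: {
      command: "initialize",
      data: [["-kws_threshold", String(threshold)]],
    },
    // Registers the phrase as a keyword search.
    addKeyword: { command: "addKeyword", data: keyphrase },
  };
}

const messages = keywordSpottingMessages("Hello how are you?", "1e-20");
// With a real worker these would be sent via worker.postMessage(...).
```

The threshold value here is a placeholder, not a recommendation. It has to be tuned on real recordings to balance false alarms (noise accepted as the phrase) against missed detections.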