Thread7 opened this issue 7 years ago
Anyone have a comment on this?
The live demo uses grammars. For continuous listening, pocketsphinx requires keyword spotting mode.
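For desktop pocketsphinx, keyword spotting can be driven by a keyphrase list file passed with the `-kws` option: one phrase per line, followed by its detection threshold between slashes. The sketch below assembles such a list in JavaScript. The phrases and thresholds are hypothetical placeholders. Each phrase must use words that exist in the dictionary, and the thresholds usually need tuning per phrase.

```javascript
// Build the contents of a keyphrase list file for pocketsphinx's -kws option.
// Each line: the phrase, then its detection threshold between slashes.
// Phrases and thresholds below are hypothetical placeholders to tune.
function buildKwsList(entries) {
  return entries
    .map(({ phrase, threshold }) => {
      if (!(threshold > 0)) {
        throw new Error(`threshold for "${phrase}" must be positive`);
      }
      // Exponential notation keeps very small thresholds readable, e.g. 1e-20.
      return `${phrase} /${threshold.toExponential()}/`;
    })
    .join("\n") + "\n";
}

const kwsList = buildKwsList([
  { phrase: "open account", threshold: 1e-20 },
  { phrase: "check balance", threshold: 1e-30 },
]);
console.log(kwsList);
```

Smaller thresholds make a phrase harder to trigger, which trades missed detections for fewer false alarms.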
@Thread7 I have a few procedures for coping with the background noise, but these are not issues with pocketsphinx.js, more the way I deal with it. Send me your contact details and I will communicate by email.
@JohnAReid please email me at this address: thread7 AT gmail.com
Thanks a lot!
I am trying to write a command-and-control application using pocketsphinx.js inside a web worker with recognizer.js. I have successfully implemented keyword spotting for the commands, with a switch to a grammar for the remaining processing. The corresponding FSG grammar is written using state transitions, as described in the pocketsphinx.js README file. My problem is how to write the grammar so that it ignores out-of-grammar words and thus achieves accurate recognition. All solutions I found for pocketsphinx (as it still does not support the '
Rejecting words with grammars is indeed a difficult problem. You can try training filler words, if you go all the way to do acoustic model training, or you can add a phoneme loop to your grammar. A loop would just be a transition from one state to the same state.
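The phoneme loop described above can be sketched in the grammar-object format that recognizer.js accepts and that is used later in this thread (`{numStates, start, end, transitions}`, each transition carrying `from`, `to`, `logp` and `word`). This is a hedged sketch: the function name is hypothetical, and the garbage "words" G1, G2, ... are assumed to be dictionary entries pronounced as a single phoneme each.

```javascript
// Sketch: build a pocketsphinx.js-style grammar object in which command words
// compete with a garbage (phoneme) loop. The garbage words are hypothetical
// dictionary entries, each assumed to be pronounced as one phoneme.
function buildGrammarWithGarbageLoop(commandWords, garbageWords, garbageLogp) {
  const transitions = [];
  // Command words: a single hop from the start state (0) to the final state (1).
  for (const word of commandWords) {
    transitions.push({ from: 0, to: 1, logp: 0, word });
  }
  for (const word of garbageWords) {
    // Garbage may end the utterance on its own...
    transitions.push({ from: 0, to: 1, logp: garbageLogp, word });
    // ...or repeat at the start state (the self-loop) before a command.
    transitions.push({ from: 0, to: 0, logp: garbageLogp, word });
  }
  return { numStates: 2, start: 0, end: 1, transitions };
}

const grammarOptions = buildGrammarWithGarbageLoop(
  ["FIRST", "SECOND"],
  ["G1", "G2"],
  -2000
);
console.log(grammarOptions.transitions.length); // 6
```

Generating the transitions keeps the garbage penalty in one place, which makes it easy to rebuild the grammar with different `logp` values while experimenting.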
As for using JSGF grammars, you can use them by loading them from a file, using LazyLoad for instance.
I have tried to implement a parallel garbage loop, but I always get garbage as output (even though I speak the words "FIRST" and "SECOND" in the example below, I still get a combination of G1, G2, ... as grammar output). I used:
grammarOptions = {numStates: 2, start: 0, end: 1, transitions: [
    {from: 0, to: 1, logp: 0, word: "FIRST"},
    {from: 0, to: 1, logp: 0, word: "SECOND"},
    {from: 0, to: 1, logp: -5, word: "G1"},
    {from: 0, to: 1, logp: -5, word: "G2"},
    ... repeat for remaining phonemes
    {from: 0, to: 0, logp: -5, word: "G1"},
    {from: 0, to: 0, logp: -5, word: "G2"},
    ... repeat for remaining phonemes
]};
For the garbage phoneme transitions (both to state 1 and remaining in state 0) I used the same logp and tried values -5, -10, -20, all with the same negative result. What am I missing please?
Try -2000; sometimes it should choose a proper variant. Such experiments are easier to conduct with the pocketsphinx desktop version than with the js one.
@nshmyrev Many thanks for your input. I tried -2000 but unfortunately things did not change.
As pocketsphinx still does not support the unknown word
You can share a pocketsphinx_continuous example (not js) with audio file and grammar and I'll take a look.
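To run the same experiment with the desktop decoder, the grammar can be written out in the Sphinx FSG text format and passed to `pocketsphinx_continuous -fsg <file>`. The sketch below is a hypothetical helper, not part of pocketsphinx.js. The FSG file takes plain transition probabilities, whereas the pocketsphinx.js objects in this thread use `logp`. How one maps to the other depends on the log base pocketsphinx uses internally, so this sketch takes probabilities directly and does not convert. The words and probabilities are placeholders.

```javascript
// Sketch: serialize a small grammar to the Sphinx FSG text format so it can be
// tried with `pocketsphinx_continuous -fsg <file>`. Transition weights are
// plain probabilities (0 < p <= 1); words and values here are hypothetical.
function toFsg(name, { numStates, start, end, transitions }) {
  const lines = [
    `FSG_BEGIN ${name}`,
    `NUM_STATES ${numStates}`,
    `START_STATE ${start}`,
    `FINAL_STATE ${end}`,
  ];
  for (const { from, to, prob, word } of transitions) {
    if (!(prob > 0 && prob <= 1)) {
      throw new Error(`bad probability ${prob} on ${from}->${to}`);
    }
    lines.push(`TRANSITION ${from} ${to} ${prob} ${word}`);
  }
  lines.push("FSG_END");
  return lines.join("\n") + "\n";
}

const fsg = toFsg("balance", {
  numStates: 2,
  start: 0,
  end: 1,
  transitions: [
    { from: 0, to: 1, prob: 0.9, word: "CURRENT" },
    { from: 0, to: 1, prob: 0.05, word: "G1" }, // garbage-only utterance
    { from: 0, to: 0, prob: 0.05, word: "G1" }, // garbage self-loop
  ],
});
console.log(fsg);
```

Writing the file from code makes it simple to attach an exact, reproducible grammar alongside the audio when reporting a problem.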
Hi, I have produced two pocketsphinx_continuous examples using the following commands:
(1) pocketsphinx_continuous -dict /share/keyphrase.dict -fsg /share/balanceGarbageLoop.fsg -inmic yes -infile /share/allNoise.wav (an audio file containing only garbage noise outside the allowed grammar)
(2) pocketsphinx_continuous -dict /share/keyphrase.dict -fsg /share/balanceGarbageLoop.fsg -inmic yes -infile /share/accountNoise.wav (an audio file with garbage noise in the middle of allowed grammar words)
The grammar used contains a garbage loop as outlined in my previous post. In both examples, the garbage noise is recognised as a valid grammar word ("CURRENT"). Also, in example (2), the first valid word "CURRENT" is systematically not recognised (over many runs); instead, pocketsphinx_continuous responds with the following error:
ERROR: "fsg_search.c", line 940: Final result does not match the grammar in frame 115
Any idea regarding the above error would also be very welcome.
The audio files and grammar used for the tests are at: https://www.dropbox.com/s/5mveyre9lnajdnp/ProblemDocumentation.zip?dl=0
Hi, any news regarding the above? Many thanks
Your audio is clipped, you simply need to reduce the recording level.
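One quick way to check a recording for this problem is to count how many 16-bit PCM samples sit at full scale. This is a minimal sketch with a hypothetical function name. Any non-trivial fraction of samples pinned at the rails is a sign of clipping, but the exact level worth worrying about is a judgment call, not a pocketsphinx requirement.

```javascript
// Sketch: estimate how much of a 16-bit PCM recording is clipped, i.e. how
// many samples sit at (or within `margin` of) full scale.
function clippedFraction(samples, margin = 0) {
  const limit = 32767 - margin;
  let clipped = 0;
  for (const s of samples) {
    // -32768 is the negative full-scale value; treat both rails symmetrically.
    if (s >= limit || s <= -limit - 1) clipped++;
  }
  return samples.length === 0 ? 0 : clipped / samples.length;
}

const pcm = Int16Array.from([0, 1000, 32767, -32768, -500, 32767, 12, -12]);
console.log(clippedFraction(pcm)); // 0.375
```

A `margin` above zero also catches recordings whose gain stage saturates slightly below the numeric maximum.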
@nshmyrev Many thanks for pointing out the issue with the recording levels. I did some more tests with the garbage loop. I used a JSGF form of the grammar (included below) and picked the weights between the garbage loop and the valid grammar words so that there is maximum distance between them (this is only one of the many tests I did). Using a sound file of eight valid words and a simple grammar without the garbage loop and weights, pocketsphinx_continuous gave excellent recognition results for all valid words over many runs. However, after adding back the garbage loop, and despite choosing the largest distance between the weights, pocketsphinx_continuous recognized garbage instead of the valid words in the vast majority of cases, as shown below after the grammar. All files (sound, grammars) are at: https://www.dropbox.com/s/2txxo0ep98odmhk/ProblemDocumentation.zip?dl=0
Getting a garbage loop to work is proving to be a very challenging problem. Is there any news regarding proper support for the UNKNOWN word in Pocketsphinx (I could see that the development team was working on it some time back)? Many thanks for all your support on this!
Grammar (with garbage loop) used:
grammar balance;
public
Hi, any news regarding the above? Many thanks
Well, ideally one would rewrite the decoder to include the loop like we have in kws search ;) Give me some more time please.
@nshmyrev Hi Nickolay, any news? Your last post sounded very promising. It would be great to have grammar support in Pocketsphinx with the garbage loop embedded in the decoder. Many thanks
I agree it would be great to have this working.
I was wondering whether there has been any progress on this. Many thanks.
I am experiencing the same issue as everybody else.
A thought - it might be helpful to run some kind of volume change detection and only feed audio into pocketsphinx while the volume is measurably changing. So, if the audio goes flat for a period of time (a second or two?), stop actively analyzing it.
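The idea above can be sketched as a small gate that measures each frame's level in dB, treats a jump of a few dB from the previous frame as activity, and keeps forwarding audio until the level has stayed flat for a hangover period. Everything here is an assumption: the class name, the 3 dB change threshold, and the 150-frame hangover (1.5 s if frames are 10 ms long). It only decides which frames to pass on and does not handle how the recognizer should be stopped or restarted around the gaps.

```javascript
// Sketch: only forward audio to the recognizer while its level is changing,
// and stop after the level stays flat for `hangoverFrames` frames.
// Frames are Float32Array samples in [-1, 1]; all thresholds are assumptions.
class ChangeGate {
  constructor({ changeDb = 3, hangoverFrames = 150 } = {}) {
    this.changeDb = changeDb;             // level change that counts as activity
    this.hangoverFrames = hangoverFrames; // e.g. 150 x 10 ms frames = 1.5 s
    this.prevDb = null;
    this.quietFrames = Infinity;          // start closed until something changes
  }

  static levelDb(frame) {
    let sum = 0;
    for (const x of frame) sum += x * x;
    const rms = Math.sqrt(sum / Math.max(frame.length, 1));
    return 20 * Math.log10(rms + 1e-10); // offset avoids log10(0)
  }

  // Returns true if this frame should be passed on to pocketsphinx.
  shouldFeed(frame) {
    const db = ChangeGate.levelDb(frame);
    const changed =
      this.prevDb !== null && Math.abs(db - this.prevDb) >= this.changeDb;
    this.prevDb = db;
    this.quietFrames = changed ? 0 : this.quietFrames + 1;
    return this.quietFrames < this.hangoverFrames;
  }
}

const gate = new ChangeGate({ changeDb: 3, hangoverFrames: 2 });
const silence = new Float32Array(160);
const tone = new Float32Array(160).fill(0.5); // constant level, no change
const decisions = [silence, tone, tone, tone, tone].map(f => gate.shouldFeed(f));
console.log(decisions); // [ false, true, true, false, false ]
```

Comparing against the previous frame, rather than a fixed loudness threshold, follows the "measurably changing" criterion above, so a steady hum is dropped even if it is loud.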
This seems to be a duplicate of https://github.com/syl22-00/pocketsphinx.js/issues/60
I have tested PocketSphinx.js against a compiled Android app using the same us-ptm models. Performance was equal on both platforms, except for one annoying difference in PocketSphinx.js: it often recognizes background noise as words (I assume it is background noise, anyway). I can leave the webapp/live.html page up for 5 minutes in a fairly quiet room without saying a word, and it will still think I said 5 or 10 of the city words in the demo. I think many of you have experienced this in the demo, as I see it on all 5 devices I have tried. I have tried different acoustic models too. Is there any way to change this sensitivity?