syl22-00 / pocketsphinx.js

Speech recognition in JavaScript and WebAssembly

Recognizing Background Noise #89

Status: Open. Thread7 opened this issue 7 years ago.

Thread7 commented 7 years ago

I have tested PocketSphinx.js against a compiled Android app using the same us-ptm models. Performance was equal on both platforms, except for one annoying difference: PocketSphinx.js often recognizes background noise as words (at least, I assume it is background noise). I can leave the webapp/live.html page open for 5 minutes in a fairly quiet room without saying a word, and it will still think I said 5 to 10 of the city words in the demo. I suspect many of you have seen this in the demo; I see it on all 5 devices I have tried, and with different acoustic models too. Is there any way to change this sensitivity?

Thread7 commented 7 years ago

Anyone have a comment on this?

nshmyrev commented 7 years ago

The live demo uses grammars. For continuous listening, pocketsphinx requires keyword spotting mode.
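For reference, desktop pocketsphinx configures keyword spotting with a keyphrase list file passed via the -kws option: one phrase per line, followed by a detection threshold between slashes. The sketch below only generates such a file. buildKwsList is a hypothetical helper, not part of pocketsphinx.js; the phrases are assumed to be in your dictionary, and the thresholds are placeholders that have to be tuned on your own recordings.

```javascript
// Hypothetical helper (not part of pocketsphinx.js): builds the text of a
// keyphrase list file for desktop pocketsphinx's -kws option, one
// "phrase /threshold/" entry per line.
function buildKwsList(entries) {
  return entries
    .map(({ phrase, threshold }) => {
      if (!(threshold > 0)) {
        throw new RangeError(`threshold for "${phrase}" must be positive`);
      }
      // toExponential() writes e.g. 1e-20, the usual notation for thresholds.
      return `${phrase} /${threshold.toExponential()}/`;
    })
    .join("\n") + "\n";
}

// Placeholder phrases and thresholds: tune them on your own recordings.
const kwsText = buildKwsList([
  { phrase: "ok computer", threshold: 1e-20 },
  { phrase: "stop", threshold: 1e-5 },
]);
console.log(kwsText);
```

Save the text to a file and pass its path with -kws.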

6gsaifulislam commented 7 years ago

@Thread7 I have a few procedures for coping with the background noise, but these are not issues with pocketsphinx.js, more the way I deal with it. Send me your contact details and I will communicate by email.

Thread7 commented 7 years ago

@JohnAReid please email me at this address: thread7 AT gmail.com

Thanks a lot!

karamanos commented 7 years ago

I am trying to write a command-and-control application using pocketsphinx.js inside a web worker with recognizer.js. I have successfully implemented keyword spotting for the commands, with a switch to a grammar for the remaining processing. The FSG grammar is written as state transitions, as described in the pocketsphinx.js README. My problem is how to write the grammar so that it ignores out-of-grammar words and thus achieves accurate recognition. All the solutions I found for pocketsphinx (which still does not support the unknown word) use a parallel garbage loop containing all base phonemes. Furthermore, in all cases the grammar is a string starting with "#JSGF V1.0; ..." rather than a set of state transitions. I tried passing this syntax to the recognizer.js "addGrammar" command but got an error (judging from the recognizer.js code, it expects the grammar in state-transition form). What is the best way to solve this problem in pocketsphinx.js?

syl22-00 commented 7 years ago

Rejecting words with grammars is indeed a difficult problem. If you are prepared to go all the way to acoustic model training, you can try training filler words; otherwise you can add a phoneme loop to your grammar. A loop is simply a transition from a state back to the same state.
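The phoneme loop described above can be expressed in the state-transition form that pocketsphinx.js grammars use (numStates, start, end, and a list of {from, to, logp, word} transitions, the same shape as the grammar posted later in this thread). A minimal sketch, assuming one-phoneme garbage words named G1 to G39 that are defined in your dictionary; buildGarbageLoopGrammar is a hypothetical helper, and the logp of -5 is simply the value tried in the thread, not a recommendation.

```javascript
// Hypothetical helper: builds a two-state grammar object in the
// {numStates, start, end, transitions} form used by pocketsphinx.js,
// with the command words in parallel with a garbage (phoneme) loop.
// Assumes each garbage word (e.g. "G1") is in the dictionary with a
// single phoneme as its pronunciation.
function buildGarbageLoopGrammar(commandWords, garbageWords, garbageLogp) {
  const transitions = [];
  // Command words go straight from the start state (0) to the final state (1).
  for (const word of commandWords) {
    transitions.push({ from: 0, to: 1, logp: 0, word });
  }
  for (const word of garbageWords) {
    // A garbage word may end the utterance...
    transitions.push({ from: 0, to: 1, logp: garbageLogp, word });
    // ...or loop back to state 0, absorbing any number of phonemes.
    transitions.push({ from: 0, to: 0, logp: garbageLogp, word });
  }
  return { numStates: 2, start: 0, end: 1, transitions };
}

const garbage = Array.from({ length: 39 }, (_, i) => `G${i + 1}`);
const grammar = buildGarbageLoopGrammar(["FIRST", "SECOND"], garbage, -5);
console.log(grammar.transitions.length);
```

The resulting object would be handed to recognizer.js through its addGrammar command, exactly like a hand-written grammar; generating it avoids typing the per-phoneme transitions twice.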

As for JSGF grammars, you can use them by loading them from a file, using LazyLoad for instance.

karamanos commented 7 years ago

I have tried to implement a parallel garbage loop, but I always get garbage as output: even when I speak the words "FIRST" and "SECOND" in the example below, I still get a combination of G1, G2, ... as grammar output. I used:

    var grammarOptions = {numStates: 2, start: 0, end: 1, transitions: [
        {from: 0, to: 1, logp: 0, word: "FIRST"},
        {from: 0, to: 1, logp: 0, word: "SECOND"},
        // garbage words leading to the final state (1)
        {from: 0, to: 1, logp: -5, word: "G1"},
        {from: 0, to: 1, logp: -5, word: "G2"},
        // ... repeat for remaining phonemes
        // garbage words looping on the start state (0)
        {from: 0, to: 0, logp: -5, word: "G1"},
        {from: 0, to: 0, logp: -5, word: "G2"},
        // ... repeat for remaining phonemes
    ]};

For the garbage phoneme transitions (both those leading to state 1 and those staying in state 0) I used the same logp, trying values of -5, -10 and -20, all with the same negative result. What am I missing?

nshmyrev commented 7 years ago

Try -2000; with that value it should choose the proper variant at least some of the time. Such experiments are easier to conduct with the desktop version of pocketsphinx than with the JS one.

karamanos commented 7 years ago

@nshmyrev Many thanks for your input. I tried -2000 but unfortunately nothing changed. As pocketsphinx still does not support the unknown word, I would think that any real grammar-based implementation of pocketsphinx / pocketsphinx.js needs a garbage loop, so I assume the coding of such a loop should be fairly standard. Is the logic I used above for the garbage loop's states and transitions valid? Regarding the transition probabilities, the pocketsphinx.js documentation states that they are in log-probability form; I hope that holds.
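One way to sanity-check the scale of logp, offered as an assumption to verify against the pocketsphinx.js source: if the value is passed unchanged to sphinxbase's fsg_model_trans_add, it is read in integer logmath units, whose base defaults to 1.0001 in pocketsphinx (the -logbase setting) rather than e. On that scale, -5, -10 and -20 are all practically zero penalties, which would be consistent with the garbage words never being discouraged. The two hypothetical helpers below convert between linear probabilities and that assumed scale.

```javascript
// ASSUMPTION: logp values are integers in sphinxbase logmath units with the
// default base of 1.0001. Check this against the pocketsphinx.js source
// before relying on it.
const LOG_BASE = 1.0001;

// Linear probability -> logmath integer (rounded, since logmath uses integers).
function probToLogp(p, base = LOG_BASE) {
  if (!(p > 0 && p <= 1)) throw new RangeError("p must be in (0, 1]");
  return Math.round(Math.log(p) / Math.log(base));
}

// Logmath integer -> linear probability.
function logpToProb(logp, base = LOG_BASE) {
  return Math.pow(base, logp);
}

// Under this assumption, small negative logp values barely change anything.
for (const logp of [-5, -20, -2000]) {
  console.log(logp, logpToProb(logp).toFixed(4));
}
console.log(probToLogp(0.01));
```

If the assumption holds, a penalty that actually matters needs values in the thousands or tens of thousands, which is in line with the -2000 suggestion above.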

nshmyrev commented 7 years ago

You can share a pocketsphinx_continuous example (not JS) with an audio file and grammar, and I'll take a look.

karamanos commented 7 years ago

Hi, I have produced two pocketsphinx_continuous examples using these commands:

(1) An audio file containing only garbage noise outside the allowed grammar:

    pocketsphinx_continuous -dict /share/keyphrase.dict -fsg /share/balanceGarbageLoop.fsg -inmic yes -infile /share/allNoise.wav

(2) An audio file with garbage noise in the middle of allowed grammar words:

    pocketsphinx_continuous -dict /share/keyphrase.dict -fsg /share/balanceGarbageLoop.fsg -inmic yes -infile /share/accountNoise.wav

The grammar used contains a garbage loop as outlined in my previous post. In both examples, the garbage noise is recognised as a valid grammar word ("CURRENT"). Also, in example (2) the first valid word, "CURRENT", is systematically not recognised (over many runs), and instead pocketsphinx_continuous reports the following error:

    ERROR: "fsg_search.c", line 940: Final result does not match the grammar in frame 115

Any idea regarding this error would also be very welcome.
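To reproduce a pocketsphinx.js grammar for pocketsphinx_continuous -fsg, as requested above, it has to be written in the Sphinx FSG text format (FSG_BEGIN, NUM_STATES, START_STATE, FINAL_STATE, one TRANSITION line per arc, FSG_END), where each TRANSITION carries a linear probability rather than a log-probability. A minimal sketch; toFsgText is a hypothetical helper that takes probabilities directly, so no assumption about the logp scale is needed, and the probabilities shown are placeholders.

```javascript
// Hypothetical helper: serializes a grammar in the Sphinx FSG text format.
// Each transition is {from, to, prob, word} with a linear probability.
function toFsgText(name, { numStates, start, end, transitions }) {
  const lines = [
    `FSG_BEGIN ${name}`,
    `NUM_STATES ${numStates}`,
    `START_STATE ${start}`,
    `FINAL_STATE ${end}`,
  ];
  for (const { from, to, prob, word } of transitions) {
    if (!(prob > 0 && prob <= 1)) {
      throw new RangeError(`bad probability ${prob} on ${from}->${to}`);
    }
    // Source state, destination state, probability, word.
    lines.push(`TRANSITION ${from} ${to} ${prob} ${word}`);
  }
  lines.push("FSG_END");
  return lines.join("\n") + "\n";
}

// Placeholder probabilities for a two-state grammar with one garbage word.
const fsgText = toFsgText("balance", {
  numStates: 2, start: 0, end: 1,
  transitions: [
    { from: 0, to: 1, prob: 0.9, word: "CURRENT" },
    { from: 0, to: 1, prob: 0.9, word: "SAVINGS" },
    { from: 0, to: 0, prob: 0.001, word: "G1" },
  ],
});
console.log(fsgText);
```

The text can be saved (for example with Node's fs.writeFileSync) and passed to pocketsphinx_continuous with -fsg, so the same grammar can be tested on the desktop and in the browser.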

The audio files and grammar used for the tests are at: https://www.dropbox.com/s/5mveyre9lnajdnp/ProblemDocumentation.zip?dl=0

karamanos commented 7 years ago

Hi, any news regarding the above? Many thanks

nshmyrev commented 7 years ago

Your audio is clipped, you simply need to reduce the recording level.
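Clipping like this can be checked before blaming the grammar: count the samples pinned at or near full scale in the 16-bit PCM input. A minimal sketch, assuming the audio has already been decoded into an Int16Array (WAV header parsing is out of scope); clippingRatio and its margin parameter are illustrative, and what fraction counts as "too much" is a judgment call rather than a pocketsphinx setting.

```javascript
// Hypothetical helper: fraction of 16-bit PCM samples that sit at, or within
// `margin` of, full scale. A noticeable fraction usually means the input
// level is too high and the waveform is being clipped.
function clippingRatio(samples, margin = 1) {
  const hi = 32767 - margin;   // positive full scale is 32767
  const lo = -32768 + margin;  // negative full scale is -32768
  let clipped = 0;
  for (const s of samples) {
    if (s >= hi || s <= lo) clipped += 1;
  }
  return samples.length === 0 ? 0 : clipped / samples.length;
}

// Usage on already-decoded samples.
const pcm = Int16Array.from([0, 1000, 32767, -32768, 32766, -5]);
console.log(clippingRatio(pcm));
```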

karamanos commented 7 years ago

@nshmyrev Many thanks for pointing out the issue with recording levels. I did some more tests with the garbage loop. I used a JSGF form of the grammar (included below) and picked the weights for the garbage loop and the valid grammar words so that there is maximum distance between them (this is only one of the many tests I did). Using a sound file of 8 valid words and a simple grammar without the garbage loop and weights, pocketsphinx_continuous gave excellent recognition results for all valid words over many runs. However, after adding the garbage loop back, and despite the large distance between the weights, pocketsphinx_continuous recognized garbage instead of the valid words in the vast majority of cases, as shown below the grammar. All files (sound, grammars) are at: https://www.dropbox.com/s/2txxo0ep98odmhk/ProblemDocumentation.zip?dl=0

Getting a garbage loop to work is proving to be a very challenging problem. Is there any news regarding proper support for the UNKNOWN word in pocketsphinx (I saw that the development team was working on it some time ago)? Many thanks for all your support on this!

Grammar (with garbage loop) used:

    #JSGF V1.0;

    grammar balance;
    public = /0.0000000000000000000000000000000000000000001/ | /10000000000000000000000000000000000/ ;

    = CURRENT | SAVINGS | VISA | MASTERCARD;
    = (G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 | G11 | G12 | G13 | G14 | G15 | G16 | G17 | G18 | G19 | G20 | G21 | G22 | G23 | G24 | G25 | G26 | G27 | G28 | G29 | G30 | G31 | G32 | G33 | G34 | G35 | G36 | G37 | G38 | G39)+ ;

pocketsphinx_continuous recognition results with the garbage loop added (input sound file has 8 valid words):

    G20 G2 G11 G31 G12 G23 G9 G20 G32 G6 G33 G28 G34 G23 G32 G31 G39 G38 G13 G37 G18 G23 G15 G38 G27 G32 G30 G39 G13 G18 G13 G37 G29 G7 G20 G32 G3 G7 G7 G22 VISA G33 G7 G18 G29 G11 G3 G24 G3 G3 G20 G36 G34 G36 G1 G2 G29 G33 G3 G11 G16 G1 G31 G9 G7 G33 G7 G3 G29 G16 G17 G3 G15 G3 G15 G6 G3 G27 G31 G10

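The rule names in the posted grammar are missing (the text in angle brackets did not survive), so here is a hedged sketch that generates a well-formed JSGF grammar of the same shape: a public rule choosing, with JSGF weights, between the valid words and a one-or-more loop of garbage words. buildWeightedJsgf and the rule names words and garbage are hypothetical, not the poster's originals, and the weights 1 and 0.01 are placeholders. JSGF weights are relative values, not log-probabilities.

```javascript
// Hypothetical generator (names are illustrative, not the poster's originals):
// a public rule that chooses, with JSGF weights, between the valid words and
// a loop of one-phoneme garbage words.
function buildWeightedJsgf(name, words, garbage, wordWeight, garbageWeight) {
  const alternatives = (items) => items.join(" | ");
  return [
    "#JSGF V1.0;",
    `grammar ${name};`,
    `public <${name}> = /${wordWeight}/ <words> | /${garbageWeight}/ <garbage>;`,
    `<words> = ${alternatives(words)};`,
    // "+" means one or more repetitions, i.e. the garbage loop.
    `<garbage> = (${alternatives(garbage)})+;`,
  ].join("\n") + "\n";
}

const garbageWords = Array.from({ length: 39 }, (_, i) => `G${i + 1}`);
const jsgfText = buildWeightedJsgf(
  "balance",
  ["CURRENT", "SAVINGS", "VISA", "MASTERCARD"],
  garbageWords,
  1,
  0.01,
);
console.log(jsgfText);
```

Generating the grammar makes it cheap to sweep the two weights in a script and rerun pocketsphinx_continuous on the same recording for each setting.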
karamanos commented 7 years ago

Hi, any news regarding the above? Many thanks

nshmyrev commented 7 years ago

Well, ideally one would rewrite the decoder to include the loop, like we have in kws search ;) Please give me some more time.

karamanos commented 7 years ago

@nshmyrev Hi Nickolay, any news? Your last post sounded very promising. It would be great to have grammar support in pocketsphinx with the garbage loop embedded in the decoder. Many thanks

Thread7 commented 7 years ago

I agree it would be great to have this working.

karamanos commented 7 years ago

I was wondering whether there has been any progress on this. Many thanks.

skibulk commented 3 years ago

I am experiencing the same issue as everybody else.

A thought: it might help to run some kind of volume-change detection and only feed audio into pocketsphinx while the volume is measurably changing. So if the audio goes flat for a period of time (a second or two?), stop actively analyzing it.
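This suggestion can be sketched as an adaptive energy gate in front of the recognizer: it tracks the background level and only lets frames through while the level rises clearly above it, plus a hangover period so word endings are not cut off. This is a generic voice-activity heuristic, not something pocketsphinx.js provides; the class name, parameters and defaults are illustrative assumptions (with 10 ms frames, a hangover of 100 frames is roughly one second). It would not stop the recognizer from matching noise that is loud and changing, so it complements rather than replaces rejection in the grammar.

```javascript
// Hypothetical adaptive energy gate (a generic voice-activity heuristic, not a
// pocketsphinx.js feature). Feed it fixed-size frames of 16-bit samples and
// only pass a frame on to the recognizer when accept() returns true.
class AdaptiveEnergyGate {
  constructor({ ratio = 2, alpha = 0.05, minFloor = 1, hangoverFrames = 100 } = {}) {
    this.ratio = ratio;                   // level above floor * ratio counts as activity
    this.alpha = alpha;                   // smoothing factor for the background estimate
    this.minFloor = minFloor;             // keeps digital silence from making any sound "active"
    this.hangoverFrames = hangoverFrames; // inactive frames tolerated before closing
    this.noiseFloor = null;               // running estimate of the background RMS
    this.quietRun = Infinity;             // frames since the last active one (start closed)
  }

  static rms(frame) {
    let sum = 0;
    for (const s of frame) sum += s * s;
    return frame.length === 0 ? 0 : Math.sqrt(sum / frame.length);
  }

  accept(frame) {
    const level = AdaptiveEnergyGate.rms(frame);
    if (this.noiseFloor === null) this.noiseFloor = level;
    const floor = Math.max(this.noiseFloor, this.minFloor);
    if (level > floor * this.ratio) {
      this.quietRun = 0;
    } else {
      this.quietRun += 1;
      // Adapt the background estimate only on inactive frames, so speech
      // does not drag it upwards; a steady hum eventually becomes the floor.
      this.noiseFloor += this.alpha * (level - this.noiseFloor);
    }
    return this.quietRun <= this.hangoverFrames;
  }
}

// Usage: a quiet frame, a loud frame, then quiet frames until the gate closes.
const gate = new AdaptiveEnergyGate({ hangoverFrames: 2 });
const quiet = new Int16Array(160).fill(100);
const loud = new Int16Array(160).fill(1000);
console.log([quiet, loud, quiet, quiet, quiet].map((f) => gate.accept(f)));
```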

This seems to be a duplicate of https://github.com/syl22-00/pocketsphinx.js/issues/60