syl22-00 / pocketsphinx.js

Speech recognition in JavaScript and WebAssembly

False detection when there is no audio #60

Open shekit opened 8 years ago

shekit commented 8 years ago

Hi,

I am using pocketsphinx and it keeps detecting words that are not being spoken. This is visible in the live demo on the website as well. Even in a silent room, it detects and prints out the keywords repeatedly. Is there any way to prevent this? I need it to detect a single word, so I built a keyword list consisting of that single keyword. However, once I press start, it prints the keyword almost continuously even though nothing has been said. It treats almost any sound as the keyword.

I have edited this in my live_kws.html file to detect a single word.

    var wordList = [["PICO", "P IY K OW"]];
    var keywords = [{title:"PICO", g:"PICO"}];

syl22-00 commented 8 years ago

You would get more and better answers by reaching out to the pocketsphinx community directly (http://cmusphinx.sourceforge.net/).

For keyword spotting, there are a few parameters you can play with:

    -keyphrase              Keyphrase to spot
    -kws                    A file with keyphrases to spot, one per line
    -kws_delay      10      Delay to wait for best detection score
    -kws_plp        1e-1    Phone loop probability for keyword spotting
    -kws_threshold  1       Threshold for p(hyp)/p(alternatives) ratio

justinoverton commented 8 years ago

I experienced this too. I updated the audioRecord.js file to also output whatever is getting passed to sphinx to the speakers. See below:

    var jolisten = new (window.AudioContext || window.webkitAudioContext)();
    var jobuf = jolisten.createBuffer(1, outputBufferLength, (config.outputSampleRate || 16000));

    worker.onmessage = function(e) {
        if (e.data.error && (e.data.error == "silent")) errorCallback("silent");
        if ((e.data.command == 'newBuffer') && recording) {
            myClosure.consumers.forEach(function(consumer, y, z) {
                consumer.postMessage({ command: 'process', data: e.data.data });
            });

            //S remove this

            var nowbuf = jobuf.getChannelData(0);
            for (var i = 0; i < e.data.data.length; i++) {
                var k = e.data.data[i];
                // Convert the 16-bit sample back to a float in [-1, 1],
                // the nominal range of AudioBuffer channel data
                var f = (k >= 0x8000) ? -(0x10000 - k) / 0x8000 : k / 0x7FFF;
                nowbuf[i] = f;
            }

            // Play the buffer that was just handed to sphinx
            var josrc = jolisten.createBufferSource();
            josrc.buffer = jobuf;
            josrc.connect(jolisten.destination);
            josrc.start();
            //E remove this

        }
    };

After much experimentation I've discovered that the conversion from the microphone's higher sampling rate to 16000 Hz is partly to blame. Specifically, the part that converts the Float32 samples from JavaScript to the Int16 samples that sphinx wants:

It looks like this in audioRecorderWorker.js in method record():

    for (var i = 0; i < inputBuffer[0].length; i++) {
        recBuffers.push((inputBuffer[0][i] + inputBuffer[1][i]) * 16383.0);
    }

Basically there's a lot of loud white noise in the audio that's getting passed to sphinx. I don't know enough about audio yet to know exactly what to do, but I think a highpass and/or lowpass filter might help.
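For reference, here is a minimal sketch of a Float32-to-Int16 conversion that averages the two channels and clamps before scaling. `mixToInt16` is a hypothetical helper name, not part of pocketsphinx.js, and this is only an illustration of the conversion step, not a confirmed fix for the noise.

```javascript
// Hypothetical helper: mix two Float32 channels to mono and convert to 16-bit PCM.
function mixToInt16(left, right) {
    var out = new Int16Array(left.length);
    for (var i = 0; i < left.length; i++) {
        // Average the channels so the mono sample stays within [-1, 1]
        var s = (left[i] + right[i]) / 2;
        // Clamp in case the input exceeds the nominal range
        s = Math.max(-1, Math.min(1, s));
        // Scale to the signed 16-bit range
        out[i] = Math.round(s * 32767);
    }
    return out;
}
```

Clamping before scaling keeps an out-of-range float from wrapping around when it is stored in the Int16Array.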

FYI: If you use the snippet to hear what's coming out of the microphone you need to use headphones. The reverb will be deafening otherwise.

justinoverton commented 8 years ago

I created a pull request that graphs the wave form and enables the ability to listen to what is passed to sphinx.

justinoverton commented 8 years ago

I have determined that a lowpass filter at 800 Hz and a highpass filter at 50 Hz do reduce some of the background noise. However, sphinx is still recognizing random words even when there is no speech. When there is speech, it recognizes whatever it wants to. It doesn't matter whether it's in the normal mode or the keyword spotting mode.
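A filter chain like this could be sketched with the Web Audio API's `BiquadFilterNode`. The function name `insertBandLimit` and its parameters are made up for illustration, and this is not necessarily how the filters were wired in the actual experiment.

```javascript
// Hypothetical helper: chain a highpass and a lowpass filter between two nodes.
// audioContext is an AudioContext; source and destination are AudioNodes.
function insertBandLimit(audioContext, source, destination, highpassHz, lowpassHz) {
    var highpass = audioContext.createBiquadFilter();
    highpass.type = 'highpass';
    highpass.frequency.value = highpassHz; // cutoff in Hz

    var lowpass = audioContext.createBiquadFilter();
    lowpass.type = 'lowpass';
    lowpass.frequency.value = lowpassHz; // cutoff in Hz

    // source -> highpass -> lowpass -> destination
    source.connect(highpass);
    highpass.connect(lowpass);
    lowpass.connect(destination);
    return { highpass: highpass, lowpass: lowpass };
}
```

With the values mentioned here, the call would be `insertBandLimit(ctx, micSource, recorderNode, 50, 800)`, where `ctx`, `micSource` and `recorderNode` stand for whatever context and nodes the page already has.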

I've tried adjusting the operating system's output levels for the mic, but that doesn't help either. I've tried using the cmusphinx acoustic model, lm, and dict but it doesn't help either.

I'm at a loss for what to do next.

nshmyrev commented 8 years ago

Justin, cmusphinx uses a bandwidth between 100 and 6800 Hz. It also tries to recover from filtering, but overall any signal processing is usually harmful for accuracy.

To debug pocketsphinx keyword spotting, the tutorial recommends that you record a file and experiment with pocketsphinx_continuous on the desktop until you get reliable recognition. You need to select a keyphrase of 3-4 syllables for reliable detection, and you need to configure the threshold appropriately. You can share the recorded file if you have trouble.

Once you have a reliable detection in command line, you can proceed with the javascript version.
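As a rough way to check the 3-4 syllable recommendation against a dictionary entry, one can count the vowel phones in its pronunciation. The sketch below assumes the pronunciation uses CMU dictionary (ARPAbet) phones; `countSyllables` is a hypothetical helper, not part of pocketsphinx.

```javascript
// ARPAbet vowel phones used by the CMU pronouncing dictionary
var VOWEL_PHONES = ['AA', 'AE', 'AH', 'AO', 'AW', 'AY', 'EH', 'ER',
                    'EY', 'IH', 'IY', 'OW', 'OY', 'UH', 'UW'];

// Hypothetical helper: approximate the syllable count as the number of vowel phones.
function countSyllables(pronunciation) {
    return pronunciation.trim().split(/\s+/).filter(function (phone) {
        // Strip optional stress digits such as the 1 in "IY1"
        return VOWEL_PHONES.indexOf(phone.replace(/[0-9]/g, '')) !== -1;
    }).length;
}
```

For the keyword at the top of this thread, `countSyllables("P IY K OW")` counts two vowel phones, which is below the recommended length.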

justinoverton commented 8 years ago

Nickolay,

Thanks for the info. I thought about the keyphrase threshold, but I'm experiencing the issue without keywords as well. The issue is occurring on the examples for this project. Is anyone able to confirm that the default example "live.html" works as expected on a specific machine?

Tonight I'll play around with some tuning parameters on the command line.

It would be nice if there was a known working model, lm, etc and the cmu args that would enable a dev to test the feasibility of sphinx for his/her project prior to investing a lot of time into building and tuning a grammar/lm/etc.

justgeek commented 8 years ago

Decreasing the microphone boost from my Windows control panel helped, so I think this is definitely a noise issue. But the question is how noise can be recognized as words. Isn't there a confidence factor?

nshmyrev commented 8 years ago

@justgeek You need to provide more details - configuration, keywords, thresholds, audio data in order to get help with detection. It is better to ask that on cmusphinx forum, not here.

seekM commented 8 years ago

@justinoverton I'd be interested to know if you could make progress and maybe share your insights.

justinoverton commented 8 years ago

@seekM I think it may be an issue where training the model could help. I'm not working on this at the moment though.

jenweber commented 7 years ago

For anyone experiencing many false positives or low detection accuracy with keyword search, try grabbing a fresh copy of the minified pocketsphinx file in .pocketsphinx.js/webapp/js/pocketsphinx.js. I believe that older versions were missing essential components for the keyword search detection threshold variable ( -kws_threshold ) to work. With the file as of commit id 67cf7221fde457a0c99ead49394069d039effe11, and with the threshold written as something like "1e-35" instead of a whole number, hotword detection worked great for me with very few false positives. With an older copy of the file, hotword detection was poor, yet it would randomly "hear" the hotword in just about any sound.

Check your console for these errors to confirm if your issue is the same as mine. Filter logs by "kws". This is the sign that you need to grab a new file:

ERROR: "cmd_ln.c", line 938: Unknown argument: -kws_threshold

And a closer look may show: INFO: kws_search.c(405): KWS(beam: -1080, plp: -23, default threshold -524288, delay 10)
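If you capture the log output as text, a check along these lines can flag both symptoms. This is only a sketch: `inspectKwsLog` is a made-up helper, and it assumes the log lines look like the two examples above.

```javascript
// Hypothetical helper: scan captured log lines for the two symptoms above.
function inspectKwsLog(lines) {
    var result = { unknownThresholdArg: false, defaultThreshold: null };
    lines.forEach(function (line) {
        // The build does not know the -kws_threshold option
        if (line.indexOf('Unknown argument: -kws_threshold') !== -1) {
            result.unknownThresholdArg = true;
        }
        // Extract the threshold reported by kws_search.c, if present
        var m = /default threshold (-?\d+)/.exec(line);
        if (m) {
            result.defaultThreshold = parseInt(m[1], 10);
        }
    });
    return result;
}
```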

A threshold of -524288 would be unbelievably permissive, allowing just about any random noise to be interpreted as the keyword. The useful range of values appears to be somewhere between "1e-50", which is permissive, and "1e-0", which would be very strict. The documentation about this feature on the CMUSphinx site itself is very poor, so I just had to play around with it.
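Here is a sketch of how the threshold might be passed when initializing the recognizer. It assumes the recognizer worker accepts an `initialize` message whose `data` is a list of [flag, value] pairs. `buildKwsInitMessage` is a hypothetical helper, so check the message format against your copy of pocketsphinx.js.

```javascript
// Hypothetical helper: build an initialize message that sets -kws_threshold.
// The message shape (command, callbackId, data) is an assumption to verify.
function buildKwsInitMessage(threshold, callbackId) {
    // Accept only scientific notation such as "1e-35", not whole numbers
    if (!/^\d+(\.\d+)?e[-+]?\d+$/i.test(threshold)) {
        throw new Error('Expected a threshold like "1e-35", got: ' + threshold);
    }
    return {
        command: 'initialize',
        callbackId: callbackId,
        data: [['-kws_threshold', threshold]]
    };
}
```

The result would then be sent with something like `recognizer.postMessage(buildKwsInitMessage("1e-35", id))`, where `recognizer` and `id` are whatever worker and callback id your page already uses.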