Consistent high latency of 21secs on first query when streaming audio to detect intent

shekit commented 5 years ago

Environment details

OS: Raspbian Stretch - Raspberry Pi 3B+
Node.js version: 8.15.0
npm version: 5.6.0
dialogflow version: 0.8.0

Steps to reproduce

I have dialogflow setup on a raspberry pi for a voice assistant and while the entire system is working I am facing an unusual problem.

When I initially boot up the system and send an audio stream from the mic to dialogflow for real time processing it returns the result after ~21 seconds. The subsequent audio requests sent are returning correctly within 1-2 seconds. This behaviour also reappears after a period of inactivity i.e if the system is already running and I don't make a voice request for ~3-5 mins.

Update: Also the delay on the first response is consistently reproduced down to the millisecond almost! I added some time logging to ascertain this. The image shows that the first response(red arrows) consistently takes 21.xx secs to return. Subsequent responses are in the 1-2 sec range.

screen shot 2019-02-26 at 12 07 49 am

I forcibly close the stream after 4000ms. What is interesting is that during my logging, when the stream is forcibly unpiped, the unpipe event is triggered but the finish event is not immediately triggered even though .end() is correctly called on the stream within the unpipe event handler. This only happens on the first request or after a period of inactivity and the finish event only gets triggered once dialogflow sends the result back after ~21 seconds. It's as if the stream is forcibly being kept open and calling .end() on it has no effect whatsoever.

Here is my code:

const record = require('node-record-lpcm16')
const path = require('path')

const os = require('os')
const config = require(path.join(process.cwd(),"app","config","config"))

var FileWriter = require('wav').FileWriter

// SNOWBOY WAKEWORD DETECTOR
const Detector = require('snowboy').Detector;
const Models = require('snowboy').Models;

const through2 = require('through2')

const models = new Models();

var fs = require('fs')

models.add({
    file: path.join(process.cwd(), 'app','config', config.speech.model),
    sensitivity: config.speech.sensitivity,
    hotwords: 'peeqo'
})

const snowboyDetector = new Detector({
    resource: path.join(process.cwd(), 'app','config', 'common.res'),
    models: models,
    audioGain: 2.0
})

const snowboyOn = true
// NODE RECORD

var recorder = (os.arch()=='arm')?'arecord':'rec'

const opts = {
    verbose: false,
    threshold: 0,
    recordProgram: recorder,
    sampleRateHertz: 16000
}

const dialogflow = require('dialogflow')

const speech = new dialogflow.SessionsClient({
    projectId: config.speech.projectId,
    keyFilename: path.join(process.cwd(), 'app','config', config.speech.dialogflowKey)
})

const sessionPath = speech.sessionPath(config.speech.projectId, "test-app")

const request = {
    session: sessionPath,
    queryParams: {
        session: speech.sessionPath(config.speech.projectId, "test-app")
    },
    queryInput:{
        audioConfig:{
            audioEncoding: "AUDIO_ENCODING_LINEAR_16",
            sampleRateHertz: 16000,
            languageCode: 'en-US'
        }
    },
    singleUtterance: true,
    interimResults: true
}

// DIALOGFLOW CLASS
class GoogleSpeech {
    constructor(request){
        this.request = request
        this.stream = speech.streamingDetectIntent()
        this.result = ''
        this.unpipeTimeout = null
        this.listenFor = 4000 //cut off stream after this duration
        this.stream.write(this.request) 
    }

    startStream(){
        var self = this

        this.stream.on('pipe', function(){
            console.log('PIPING > GOOGLE')

            self.unpipeTimeout = setTimeout(()=>{
                // cut off listening since time limit exceeded
                console.log("UNPIPING GOOGLE MYSELF")
                self.unpipeTimeout = null
                tstream.unpipe(self.stream)
                mic.unpipe(tstream)
            }, self.listenFor)
        })

        this.stream.on('error', function(err){
            console.error('ERROR > GOOGLE', err)
        })

        this.stream.on('close', function(){
            console.log('GOOGLE PIPE > CLOSED')
        })

        this.stream.on('data', function(data){

            if(data.recognitionResult){
                console.log(`Interim result: ${data.recognitionResult.transcript}`)
            }

            if(data.queryResult != null){
                self.result = data.queryResult.queryText
                tstream.unpipe(self.stream)
                mic.unpipe(tstream)
            }

        })

        this.stream.on('unpipe', function(src){
            console.log('UNPIPING > GOOGLE')
            tstream.end()
            self.stream.end()
        })

        this.stream.on('finish', function(){
            console.log('FINISHED > GOOGLE')

            if(self.unpipeTimeout != null){
                // clear timer if it is running but result has returned already
                clearTimeout(self.unpipeTimeout)
                self.unpipeTimeout = null
                console.log("CLEARING TIMEOUT > RESULT RETURNED")
            }

            if(self.result){
                console.log(`RESULT: ${self.result}`)
            }

            // pipe back to snowboy for hotword detection
            mic.pipe(snowboyDetector)
        })

        var tstream = through2.obj((obj,_,next) => {
            next(null, {inputAudio: obj})
        })

        // pipe to dialogflow
        mic
        .pipe(tstream)
        .pipe(this.stream)

    }

}

// SNOWBOY EVENTS
snowboyDetector.on('unpipe', function(src){
    console.log('STOPPED PIPING > SNOWBOY')
})

snowboyDetector.on('pipe', function(src){
    console.log('PIPING > SNOWBOY')
})

// EVENT ON WAKEWORD DETECTION
snowboyDetector.on('hotword', function(index, hotword){

    // unpipe stream from hotword detection
    // and pipe to dialogflow

    mic.unpipe(snowboyDetector)

    // pass in request config
    const gNew = new GoogleSpeech(request)
    gNew.startStream()
})

const mic = record.start(opts)

// START LISTENING FOR HOTWORD
mic.pipe(snowboyDetector)

Do note: This behaviour is more pronounced on the pi. The 21 second delay seems to be happening on the Pi. While the same behaviour is exhibited on my mac, the delay in the response after the first query is much less (~7 secs). There are no network connectivity issues, because the result is always returned, just after a huge gap. I've also tried with a different internet connection. This issue also happens when I just use google cloud speech.

To check if this was some kind of issue with streams, I created a dummy stream in which I simply output the mic recording to a .wav file. This worked fine and this time the .end() is called immediately & correctly from within the unpipe event. I went one step further and streamed this .wav file to dialogflow instead of from the mic and then the behaviour returned! 21.xx seconds!!!

I'm also attaching a gif of the logging. If you watch the entire thing (there's a long gap in the middle so keep watching :)) you'll see the issue I'm talking about. As you'll see the first result takes a long time to return whereas the subsequent queries returns instantly.

dialogflow

Any help would be appreciated! This is driving me crazy!

MathiasSchrooten commented 5 years ago

@shekit were you able to figure this out?

bcoe commented 4 years ago

@shekit we've made a few fixes around authentication, I'm also curious if you're continuing to bump into these issues.

callmehiphop commented 4 years ago

Seems this issue has gone silent so I'm going to close it out. Please let us know if you think this is a mistake and we'll be happy to re-open.

googleapis / nodejs-dialogflow

Consistent high latency of 21secs on first query when streaming audio to detect intent #291

Environment details

Steps to reproduce