wit-ai / wit

Natural Language Interface for apps and devices
https://wit.ai/

Chunked binary data treated as independent when POSTing to /speech with 'Transfer-encoding: chunked' header. #2032

Closed CatWithAWand closed 1 year ago

CatWithAWand commented 3 years ago

Do you want to request a feature, report a bug, or ask a question about wit? Probably a bug.

What is the current behavior? When chunked data is POSTed to /speech with the Transfer-encoding: chunked header, each request is resolved as if its chunk were a complete audio clip of its own. As a result, utterances with intents and entities resolve incorrectly or not at all.

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem. When chunked audio data is POSTed to /speech, each request gets its own response, as if its chunk had been treated independently. Example pseudocode in Node.js style:

function postToWit(chunkedData) {
    http.request({
        method: 'POST',
        url: ,
        headers: {
            'Content-Type': 'audio/raw;encoding=signed-integer;bits=16;rate=16000;endian=little',
            'Content-Length': `${chunkedData.length}`,
            'Transfer-encoding': 'chunked',
            'Connection': 'keep-alive',
            'Authorization': `Bearer ${witai_token}`,
        },
        data: chunkedData
    })
    .then(function (response) {
        // Every POST gets its own response; print it as received
        console.log(JSON.stringify(response.data));
    });
}

function processAudio(data) {
    // Each data event carries only 640 frames, which is too short to send alone,
    // so collect a sufficient amount in a buffer and then POST
    buffer = Buffer.concat([buffer, data]); // Append the new data to the buffer
    if (buffer.length >= 64000) {
        // When the buffer reaches 64000 or more bytes, initiate postToWit
        postToWit(buffer);
        buffer = Buffer.alloc(0); // Clear buffer
    }
    if (user_has_stopped_talking) {
        // POST the remaining data in the buffer
        postToWit(buffer);
        // Handle things
    }
}

stream = receiver.createStream(); // Stream in Opus
decoder = opus.Decoder(); // Decode Opus stream to PCM: 16 bits LE, 1 channel (mono), 16000 Hz sample rate, 640 frames
stream.pipe(decoder);

decoder.on('data', (data) => {
    // Audio processing
    if (user.detected) {
        processAudio(data); // Send data to processAudio to post to wit and detect silence.
    }
    // Wake word detection
    if (wake_word_detected) {
        user.detected = true;
    }
});
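The buffering step in the pseudocode above can be run in isolation. The following is a minimal sketch, not the code from my app: `createAccumulator` and its callback are hypothetical names, and it assumes each decoder event delivers a 640-byte frame. It only shows when a flush (that is, a POST) would be triggered.

```javascript
// Hypothetical helper: collects PCM frames and hands them to onFlush
// once at least thresholdBytes have accumulated.
function createAccumulator(thresholdBytes, onFlush) {
  let pending = [];     // frames received since the last flush
  let pendingBytes = 0; // total size of those frames

  function flush() {
    if (pendingBytes === 0) return; // nothing buffered, nothing to send
    onFlush(Buffer.concat(pending, pendingBytes));
    pending = [];
    pendingBytes = 0;
  }

  return {
    push(frame) {
      pending.push(frame);
      pendingBytes += frame.length;
      if (pendingBytes >= thresholdBytes) flush();
    },
    flush, // call when the speaker stops, to send the remainder
  };
}

// 150 frames of 640 bytes (an assumed frame size) against a 64000-byte threshold.
const sent = [];
const acc = createAccumulator(64000, (buf) => sent.push(buf.length));
for (let i = 0; i < 150; i++) acc.push(Buffer.alloc(640));
acc.flush();
console.log(sent); // [ 64000, 32000 ]
```

With the pseudocode's `postToWit` as the callback, this would produce two separate POSTs for one utterance, which is the situation described in the example that follows.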

Example: Assume an app trained to understand the phrase "what is the weather like in texas", which the wit.ai console resolves to "{ entities: { wit/location: texas }, intent: wit/get_weather }". Now try the same phrase with the code above. If the phrase needs 2 requests (2 chunks), the first containing the audio for "what is the weather" and the second the audio for "like in texas", the code prints:

{"entities":{},"intents":[{"confidence":0.9999,"id":"775310103423932","name":"wit$get_weather"}],"text":"what is the weather","traits":{}}

{"entities":{},"intents":[],"text":"like in texas","traits":{}}

Even though the request did not time out and the proper header for chunked data was included, each request is resolved independently.
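For context, in HTTP/1.1 the chunked transfer coding frames the body of a single request; it does not link separate requests together. The sketch below shows that framing by hand, with text standing in for audio bytes. `encodeChunked` is a hypothetical illustration only, since Node's http module normally does this framing itself.

```javascript
// Illustrative only: builds the wire format of ONE chunked message body.
function encodeChunked(parts) {
  const framed = [];
  for (const part of parts) {
    if (part.length === 0) continue; // a zero-length chunk would end the body early
    framed.push(Buffer.from(part.length.toString(16) + '\r\n', 'ascii')); // size in hex
    framed.push(part);                                                    // chunk data
    framed.push(Buffer.from('\r\n', 'ascii'));
  }
  framed.push(Buffer.from('0\r\n\r\n', 'ascii')); // last-chunk marker, no trailers
  return Buffer.concat(framed);
}

const body = encodeChunked([
  Buffer.from('what is the weather'),
  Buffer.from(' like in texas'),
]);
console.log(JSON.stringify(body.toString('ascii')));
// "13\r\nwhat is the weather\r\ne\r\n like in texas\r\n0\r\n\r\n"
```

Both pieces travel inside one body that ends at the zero-length chunk. Two separate POSTs, as in the pseudocode, are two complete messages from the server's point of view, whatever headers they carry.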

However, if we change the condition to if (buffer.length >= 96000) so that the phrase "what is the weather like in texas" fits in a single chunk, the above code prints:

{"entities":{"wit$location:location":[{"body":"texas","confidence":0.9044,"end":33,"entities":[],"id":"780750306185063","name":"wit$location","role":"location","start":28,"suggested":true,"type":"value","value":"texas"}]},"intents":[{"confidence":0.9995,"id":"775310103423932","name":"wit$get_weather"}],"text":"what is the weather like in texas","traits":{}}

As it should be, and as it does on the wit.ai console.
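To put the two thresholds in terms of audio length, here is a small sketch of the arithmetic. It assumes the format from the Content-Type header and the decoder comment (16-bit samples, 16000 Hz, mono); `pcmSeconds` is a hypothetical helper name.

```javascript
// Duration of raw PCM audio, in seconds, for a given number of bytes.
function pcmSeconds(byteCount, sampleRate = 16000, bitsPerSample = 16, channels = 1) {
  const bytesPerSecond = sampleRate * (bitsPerSample / 8) * channels; // 32000 for this format
  return byteCount / bytesPerSecond;
}

console.log(pcmSeconds(64000)); // 2
console.log(pcmSeconds(96000)); // 3
```

So the 64000-byte threshold cuts the audio after about 2 seconds, while 96000 bytes allows about 3 seconds, which is enough for the whole phrase in this example.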

What is the expected behavior? When POSTing to /speech with the Transfer-encoding: chunked header, the response to the last request should be as if all the chunks had been assembled into one whole, containing the complete result.
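For comparison, this is roughly what sending all the pieces inside a single chunked request could look like with Node's built-in https module. It is a sketch under assumptions, not a confirmed fix: `openSpeechRequest` is a hypothetical helper, the `api.wit.ai` host is assumed because the URL is blank in the pseudocode, and the response is passed back as raw text because its exact shape is not verified here.

```javascript
const https = require('https');

// Hypothetical helper: opens ONE POST and lets the caller feed it audio
// piece by piece, instead of issuing one POST per buffer.
function openSpeechRequest(token, onDone, requestFn = https.request) {
  const req = requestFn({
    method: 'POST',
    hostname: 'api.wit.ai', // assumption: the report leaves the URL blank
    path: '/speech',
    headers: {
      'Content-Type': 'audio/raw;encoding=signed-integer;bits=16;rate=16000;endian=little',
      'Transfer-Encoding': 'chunked', // total length unknown, so no Content-Length
      'Authorization': `Bearer ${token}`,
    },
  }, (res) => {
    let text = '';
    res.setEncoding('utf8');
    res.on('data', (part) => { text += part; });
    res.on('end', () => onDone(null, text)); // raw response text, left unparsed
  });
  req.on('error', (err) => onDone(err));

  return {
    write: (pcm) => req.write(pcm), // adds more audio to the same request body
    finish: () => req.end(),        // ends the body; the response arrives afterwards
  };
}
```

In terms of the pseudocode, processAudio would call `write(data)` for each decoded frame and `finish()` once the user has stopped talking, so the service sees one request body for the whole utterance.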

If applicable, what is the App ID where you are experiencing this issue? If you do not provide this, we cannot help. App ID: 965111870692051

Barbog commented 1 year ago

Closing due to no movement on the issue. Please re-open or file a new task if the issue persists.