wit-ai / wit

Natural Language Interface for apps and devices
https://wit.ai/

Bidirectional /speech #2628

Closed JoeCooper closed 1 year ago

JoeCooper commented 1 year ago

I would like to write to /speech in node-js using the https library.

I would prefer not to use node-wit (I have tried it already); I'd really like support with using the endpoint directly, if that's alright.

Basically, I get an open connection, I get a 200 OK, and I send the audio sample.

However, wit.ai/speech never returns a transcription. After twenty seconds or so, it writes the following:

{ "code": "timeout", "error": "Timeout, please try again later" }

I've considered that the https library isn't sending the chunk header and terminator, and I've tried adding them explicitly to the stream, like so:

req.write(`${buffer.length.toString(16)}\r\n`);
req.write(buffer);
req.write('\r\n');

However, this doesn't affect the outcome.
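
For context, this is roughly the shape of the request without the manual framing; the token, API version, content type, and file name below are placeholders rather than what I'm actually sending:

```js
const https = require('https');
const fs = require('fs');

// Roughly what I'm doing; token, version parameter, content type and file name are placeholders.
const req = https.request({
  method: 'POST',
  hostname: 'api.wit.ai',
  path: '/speech?v=20230215',
  headers: {
    Authorization: 'Bearer XXXX',
    'Content-Type': 'audio/wav',
    'Transfer-Encoding': 'chunked',
  },
}, (res) => {
  console.log('status:', res.statusCode);
  // I expected partial transcriptions to show up here while audio is still being written.
  res.on('data', (chunk) => console.log('partial:', chunk.toString()));
  res.on('end', () => console.log('response ended'));
});

// Write the audio as it becomes available; node frames each write as an HTTP chunk.
fs.createReadStream('sample.wav')
  .on('data', (buffer) => req.write(buffer))
  .on('end', () => req.end());
```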

Any hints?

JoeCooper commented 1 year ago

I've finally resorted to writing HTTP over a socket directly and I see that the server just doesn't respond with chunked encoding:

(screenshot of the raw socket exchange, 2023-05-21)

It doesn't pass back a header unless I send the final terminator (0\r\n\r\n) and doesn't encode its response in chunks.
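
Trimmed down, the raw-socket version looks roughly like this (headers abbreviated; token, version parameter, and file name are placeholders):

```js
const tls = require('tls');
const fs = require('fs');

// Hand-rolled HTTP/1.1 over TLS so I can see exactly what the server sends back, and when.
const socket = tls.connect(443, 'api.wit.ai', { servername: 'api.wit.ai' }, () => {
  socket.write(
    'POST /speech?v=20230215 HTTP/1.1\r\n' +
    'Host: api.wit.ai\r\n' +
    'Authorization: Bearer XXXX\r\n' +
    'Content-Type: audio/wav\r\n' +
    'Transfer-Encoding: chunked\r\n' +
    '\r\n'
  );
  fs.createReadStream('sample.wav')
    .on('data', writeChunk)     // one HTTP chunk per buffer
    .on('end', finishRequest);  // only after this do I see any response headers
});

// One HTTP/1.1 chunk: size in hex, CRLF, payload, CRLF.
function writeChunk(buffer) {
  socket.write(buffer.length.toString(16) + '\r\n');
  socket.write(buffer);
  socket.write('\r\n');
}

// The zero-length terminating chunk.
function finishRequest() {
  socket.write('0\r\n\r\n');
}

// Dump everything the server writes, headers included.
socket.on('data', (data) => process.stdout.write(data));
```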

But it says directly in the manual:

> This API delivers partial transcriptions as they come, as well as partial intents, entities and traits run on partial transcriptions. Intermediate speech recognition and understanding results are sent in standard HTTP/1.1 chunks.

I need some hints --- how do I get this streaming to work? (This feature is the only reason we're trying to use wit.ai!)

JoeCooper commented 1 year ago

If I send without chunks, it replies with chunks. Is this intended behavior?

On ticket #2461 it's written, "we support bidirectional streaming, meaning if you stream the data to the server, you can get live transcription back while the user is speaking". Does this rely on some characteristic of the socket?

dzlandis commented 1 year ago

I would also like to know how this can be done! I was never able to figure it out with the information provided in my original issue (#2461). I really wish that WitAI just used websockets; it would make this way easier!

JoeCooper commented 1 year ago

@dzlandis this will be rude, but so is radio silence. I solved this by switching to symbl.ai.

Their interface uses web sockets. Their browser client has a little derp of its own but if I need to I can just implement my own client.

It's working great; strong recommend.

patapizza commented 1 year ago

Hi @JoeCooper, sorry for the delay.

To get streamed responses, you need:

For faster processing, we highly recommend using signed-integer 16-bit samples, a 16 kHz sample rate, and little-endian byte order, for the raw, flac, or wav file formats.
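
In header form, that recommendation translates to something like the following (the parameter syntax follows the HTTP docs; the exact values are illustrative):

```js
// 16-bit signed little-endian PCM at 16 kHz, per the recommendation above.
// Parameter syntax per the wit.ai HTTP docs; token and rate are illustrative.
const headers = {
  Authorization: 'Bearer XXXX',
  'Content-Type': 'audio/raw;encoding=signed-integer;bits=16;rate=16000;endian=little',
  'Transfer-Encoding': 'chunked',
};
```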

dzlandis commented 1 year ago

@patapizza please provide an example, and do not direct us to the node-js library, as that code does not work as intended when it comes to bidirectional speech.

JoeCooper commented 1 year ago

@dzlandis he already gave a vague non sequitur from the manual, what more do you need ;)

dzlandis commented 11 months ago

@patapizza I'm giving this another try.

> Hi @JoeCooper, sorry for the delay.
>
> To get streamed responses, you need:
>
> For faster processing, we highly recommend using signed-integer 16-bit samples, a 16 kHz sample rate, and little-endian byte order, for the raw, flac, or wav file formats.

This is not a solution to the problem. I and others have already tried this, and it does not work as expected. Please provide an example of how this can work correctly, or mark in the documentation that it is no longer supported or currently unavailable. I would like this issue to be reopened, as this is essentially a non-existent feature as it stands right now, and the documentation is misleading. No problems have been resolved.

Your documentation currently states: https://wit.ai/docs/http/20230215/#post__dictation_link

> We accept chunked data, which is a good way to reduce latency. In addition to the Content-type header, you must set a Transfer-encoding header to chunked.

Please provide a solution, or acknowledge that you are aware this is currently not working!