ibmresearchuk / node-red-bluemix-nodes

A collection of node-red nodes for the IBM Bluemix platform
Apache License 2.0

Speech to Text crashing Node-Red when receiving Wav audio stream #39

Closed isramos closed 8 years ago

isramos commented 8 years ago

I'm generating an audio stream from a Raspberry Pi into a file, then using Node-RED I'm monitoring that file with the "tail in" node and sending it via a websocket to a Node-RED instance on Bluemix. I can see my stream arriving at Bluemix's Node-RED just fine, but when I connect that stream to Watson's Speech to Text node, my Node-RED crashes. I imagine Watson S2T does not like the WAV stream format.

I record using: arecord -vv -r 16000 -f S16_LE -c1 -D plughw:1 -d 3600 ./micStream/capture.wav

and below is the cf log for my app:

2016-02-18T23:26:05.25-0600 [App/0]      OUT 19 Feb 05:26:05 - [red] Uncaught Exception:
2016-02-18T23:26:05.25-0600 [App/0]      OUT 19 Feb 05:26:05 - Error: Invalid URI "RIFF$��WAVEfmt%20�%3E}data��"
2016-02-18T23:26:05.25-0600 [App/0]      OUT     at Request.init (/home/vcap/app/node_modules/node-red-bluemix-nodes/node_modules/request/request.js:413:31)
2016-02-18T23:26:05.25-0600 [App/0]      OUT     at new Request (/home/vcap/app/node_modules/node-red-bluemix-nodes/node_modules/request/request.js:264:8)
2016-02-18T23:26:05.25-0600 [App/0]      OUT     at request (/home/vcap/app/node_modules/node-red-bluemix-nodes/node_modules/request/index.js:50:10)
2016-02-18T23:26:05.25-0600 [App/0]      OUT     at stream_url (/home/vcap/app/node_modules/node-red-bluemix-nodes/watson/s2t.js:147:9)
2016-02-18T23:26:05.25-0600 [App/0]      OUT     at /home/vcap/app/node_modules/node-red-bluemix-nodes/watson/s2t.js:156:9
2016-02-18T23:26:05.25-0600 [App/0]      OUT     at /home/vcap/app/node_modules/node-red-bluemix-nodes/node_modules/temp/lib/temp.js:252:7
2016-02-18T23:26:05.25-0600 [App/0]      OUT     at Object.oncomplete (fs.js:108:15)
2016-02-18T23:26:05.30-0600 [RTR/1]      OUT node-redxxx.mybluemix.net - [19/02/2016:05:25:09 +0000] "GET /mic HTTP/1.1" 101 0 0 "-" "-" 192.155.222.111:49260 x_forwarded_for:"74.192.222.111, 192.155.237.119" x_forwarded_proto:"http" vcap_request_id:5476b179-892b-4363-425b-a713e86aa2b0 response_time:56.049387974 app_id:1b7f997a-3d4a-4797-a31c-73d30955e7c8 x_global_transaction_id:"1936539453"
2016-02-18T23:26:05.30-0600 [App/0]      ERR 
2016-02-18T23:26:05.40-0600 [API/3]      OUT App instance exited with guid 1b7f997a-3d4a-4797-a31c-73d30955e7c8 payload: {"cc_partition"=>"default", "droplet"=>"1b7f997a-3d4a-4797-a31c-73d30955e7c8", "version"=>"5cca8298-6cb9-4e26-9540-3409f13188fc", "instance"=>"36924296bab443b59c67c6d5fbb3ffb9", "index"=>0, "reason"=>"CRASHED", "exit_status"=>1, "exit_description"=>"app instance exited", "crash_timestamp"=>1455859565}
2016-02-18T23:26:21.44-0600 [DEA/174]    OUT Starting app instance (index 0) with guid 1b7f997a-3d4a-4797-a31c-73d30955e7c8
2016-02-18T23:26:35.98-0600 [App/0]      OUT Detected 1024 MB available memory, 512 MB limit per process (WEB_MEMORY)
2016-02-18T23:26:35.98-0600 [App/0]      OUT Recommending WEB_CONCURRENCY=2
2016-02-18T23:26:37.17-0600 [App/0]      OUT Welcome to Node-RED
2016-02-18T23:26:37.17-0600 [App/0]      OUT ===================
2016-02-18T23:26:37.17-0600 [App/0]      OUT 19 Feb 05:26:37 - [info] Node-RED version: v0.11.2
2016-02-18T23:26:37.17-0600 [App/0]      OUT 19 Feb 05:26:37 - [info] Node.js  version: v0.10.40
2016-02-18T23:26:37.17-0600 [App/0]      OUT 19 Feb 05:26:37 - [info] Loading palette nodes
2016-02-18T23:26:39.49-0600 [App/0]      OUT iotf-service-staging credentials not obtained...
2016-02-18T23:26:39.49-0600 [App/0]      OUT neither iotf-service nor iotf-service-staging credentials were obtained...
2016-02-18T23:26:39.82-0600 [App/0]      OUT 19 Feb 05:26:39 - [info] Settings file  : /home/vcap/app/bluemix-settings.js
2016-02-18T23:26:39.84-0600 [App/0]      OUT 19 Feb 05:26:39 - [info] Server now running at http://127.0.0.1:62927/red/
2016-02-18T23:26:42.96-0600 [App/0]      OUT 19 Feb 05:26:42 - [info] Starting flows
2016-02-18T23:26:43.03-0600 [App/0]      OUT 19 Feb 05:26:43 - [info] [inject:Tick every 5 secs] repeat = 5000
2016-02-18T23:26:43.05-0600 [App/0]      OUT 19 Feb 05:26:43 - [info] Started flows
philippe-gregoire commented 8 years ago

Nothing definitive, but it looks like the content of your stream (starting with the RIFF$ marker) is somehow being interpreted as a URI (one that would point to the source of a stream), and not as an audio stream.

philippe-gregoire commented 8 years ago

Actually, you are probably passing your stream data as a String type object. The supported msg.payload types are: String (URL to audio) or Buffer (raw audio bytes). You would need to convert yours to a Buffer type. In Node-RED, the Buffer module is accessible from function nodes, so you would need to add an additional step using e.g. new Buffer(yourStreamFromRasPi, 'hex');

isramos commented 8 years ago

Thanks. That's what I figured, that some type of data conversion was needed. So I added a function node like this:

const buf = new Buffer( msg.payload, 'hex');
msg.payload = buf.toString();
return msg;

However, that did not work; it looks like my msg.payload is always empty. I'm not sure what the proper use of Buffer is. Is there any way you could provide a sample?

It would be great if there were a document explaining streaming, similar to the doc about passing a pre-recorded audio file: https://github.com/watson-developer-cloud/node-red-labs/tree/master/basic_examples/speech_to_text

philippe-gregoire commented 8 years ago

It will probably depend on the encoding of your audio 'stream' data as it arrives into Node-RED. It may not be hex, but base64. It may also already be binary, in which case you would not even need Buffer.
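
A minimal function-node sketch of that decision, where base64 is purely a guess; check what your websocket actually delivers:

// hypothetical function node: convert only if the payload is not
// already binary ('base64' is an assumed encoding, per the above)
if (!Buffer.isBuffer(msg.payload)) {
    msg.payload = new Buffer(msg.payload, 'base64');
}
return msg;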

Note that, technically, I think the data is not really a stream; it will be treated as one big chunk of data, i.e. with a beginning (the RIFF/WAV header) and an end when the original file is finished. I would have to dig more into the tail node's behavior to be 100% sure. The thing is that a Node-RED payload cannot be a stream; it has to start and end.

Now, thinking about this, what I suspect is that the data you send from the RPi does not have the appropriate MIME type set in the HTTP header. So it gets interpreted as the default text/plain, and Node puts it in a String object. We had a similar issue with the text-to-speech flow, where we had to set the HTTP headers explicitly to get it rendered properly. See this lab: https://github.com/watson-developer-cloud/node-red-labs/tree/master/basic_examples/text_to_speech Typically:

// Set the content type to audio wave
var attch = 'attachment; filename=' + encodeURIComponent('RaspiMike.wav');
msg.headers = { 'Content-Type': 'audio/wav', 'Content-Disposition': attch };
return msg;

Philippe

jthomas commented 8 years ago

Thanks for reporting this. I'll look into it next week and try to reproduce.

isramos commented 8 years ago

I was able to get rid of the "Error: Invalid URI" error by converting msg.payload from a String to a Buffer:

var buf1 = new Buffer(msg.payload, 'hex');
msg.payload = buf1;
return msg;

Now it seems my stream is not being recognized as a WAV. The error seems to be in cb(fileType(buf).ext), line 130 of https://github.com/node-red/node-red-bluemix-nodes/blob/master/watson/s2t.js

Here's the cf dump

2016-02-19T21:40:59.46-0600 [App/0]      OUT 20 Feb 03:40:59 - [red] Uncaught Exception:
2016-02-19T21:40:59.46-0600 [App/0]      OUT 20 Feb 03:40:59 - TypeError: Cannot read property 'ext' of null
2016-02-19T21:40:59.46-0600 [App/0]      OUT     at is_wav_file (/home/vcap/app/node_modules/node-red-bluemix-nodes/watson/s2t.js:110:38)
2016-02-19T21:40:59.46-0600 [App/0]      OUT     at wav_sample_rate (/home/vcap/app/node_modules/node-red-bluemix-nodes/watson/s2t.js:114:14)
2016-02-19T21:40:59.46-0600 [App/0]      OUT     at find_sample_rate (/home/vcap/app/node_modules/node-red-bluemix-nodes/watson/s2t.js:125:27)
2016-02-19T21:40:59.46-0600 [App/0]      OUT     at /home/vcap/app/node_modules/node-red-bluemix-nodes/watson/s2t.js:134:11
2016-02-19T21:40:59.46-0600 [App/0]      OUT     at Object.oncomplete (fs.js:108:15)
2016-02-19T21:40:59.49-0600 [App/0]      ERR 

I imagine the error might be because the data itself is not being recognized as WAV by the fileType utility, or it may be because metadata indicating it's a WAV is missing. I'll keep trying to solve this...
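
One way to test that theory, assuming msg.payload is a Buffer by this point, is a throwaway function node like this sketch:

// sanity check: a real WAV buffer starts with the ASCII bytes 'RIFF'
// (the marker visible in the crash log above), which is what the
// fileType utility keys off
node.warn(msg.payload.slice(0, 4).toString('ascii')); // expect "RIFF"
return msg;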

knolleary commented 8 years ago

The data will already have been corrupted as soon as it was encoded as a String. Converting back to a Buffer object won't undo the corruption.

You need to work backwards to identify at what point it is being converted to a String rather than kept as a Buffer/binary object end-to-end.
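
One way to do that (a sketch, not a fix): drop a throwaway function node like the one below after each hop in the flow and watch the debug sidebar to see where the Buffer disappears.

// hypothetical checkpoint node: reports whether the payload is still binary
node.warn(Buffer.isBuffer(msg.payload) ? 'still a Buffer' : 'now a ' + typeof msg.payload);
return msg;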

isramos commented 8 years ago

Still no success in getting a Raspberry Pi + mic to run something like this demo: https://speech-to-text-demo.mybluemix.net/

My findings: with a function node doing

const buf2 = { payload: new Buffer( msg.payload ) };
return buf2;

the message I get back is:

{ "payload": { "error": "unable to transcode data stream audio/wav -> audio/x-float-array " }, "_session": { "type": "websocket", "id": "4fb12e0e.b04ed" }, "_msgid": "f110b9a8.0eef48" }

jthomas commented 8 years ago

Can you share your flow with me, and I'll try to run it locally?

Looking at the "Tail" node, it automatically converts incoming data into a String before returning it, which stops you accessing the data as a Buffer. However, if you use the "exec" node with the same command, tail -f /path/to/audio.wav, it does handle Buffer data correctly and you get the streaming audio bytes. Those should be able to be passed over the websocket.

dceejay commented 8 years ago

Should I take this as a hint to add a binary mode to the tail node? Or should it be using the file read node instead of tail?

jthomas commented 8 years ago

It would be a nice addition for the "tail" node to handle binary data; the code from the "exec" node provides an example of this. The "File Read" node won't handle streaming data, unless I've missed something.

dceejay commented 8 years ago

Well, yes - but as you pointed out earlier, a wav file is not exactly streaming... (but yes, I will look at tail)

isramos commented 8 years ago

I'm very close to cracking the code! Hint: send small files, not the tail of a file. Will post more details tomorrow...

dceejay commented 8 years ago

Have added a binary mode to the tail node... but as it's in core, it will need the next release to make it to Bluemix.

GardenerOfEden commented 8 years ago

Hi, I've been working on a very similar problem, except I am running the whole thing locally (i.e. the Node-RED runtime is local to the machine where the audio capture happens). My only remote element is the Watson service itself. It took me a lot of fiddling, but I now have continuous capture into a single file, with tail following this file and streaming data to Watson over a persistent websocket, with interim results coming back on the fly and generating a text message with any recognized speech as it appears. I realise this might not solve the main issue above, because it doesn't concern getting local audio capture streaming into Node-RED running on Bluemix, but for what it's worth, here's my approach. Hope it helps.

My thanks to @isramos for your extensive notes; I found all of the above discussion invaluable!

[flow screenshot: watson-test-flow]

The 'capture wav' exec node (which ignores its incoming payload) simply starts the local audio capture with: arecord -D <alsa_device> -f S16_LE -r16000 [capture_filename]. The 'start' function node sends the following configuration to be JSON'd into the websocket:

msg.payload = {
    'action': 'start',
    'content-type': 'audio/l16;rate=16000;channels=1',
    'continuous': true,
    'inactivity_timeout': -1,
    'interim_results': true,
    'max_alternatives': 1
};
return msg;

Once we have hit 'start' (arecord is now running, and the websocket should reply to the action: start command with state: listening), activating the 'tail wav' node uses an exec node as @jthomas suggested, starting with zero characters of existing data and following the file: tail -c 0 -f [capture_filename]. Crucially, the exec node is set to use spawn() instead of exec(), because this keeps outputting chunks in Node-RED (as raw Buffer messages) as data is added to the file, rather than waiting for the tail process to finish. With the content-type set to match the arecord format, all is well and we start getting results messages back from the websocket. My 'trans?' function node looks for the final result and pulls out the first alternative transcript (there is only one, because of my max_alternatives setting above). This output message is the event which I can then pass elsewhere in my flows (here I'm just showing the debug node 'transcription').
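
For reference, a rough sketch of what a 'trans?' node like that might contain; the exact message shape (results/alternatives/final) is an assumption based on the Watson websocket interface described here:

// sketch of a 'trans?'-style node (message shape is assumed)
var data = (typeof msg.payload === 'string') ? JSON.parse(msg.payload) : msg.payload;
if (data.results && data.results[0] && data.results[0].final) {
    msg.payload = data.results[0].alternatives[0].transcript;
    return msg; // forward only the final transcript
}
return null; // swallow interim results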

Two other details. Note that on 'stop' I obliterate the background arecord and tail processes by calling ps -e | grep tail | awk '{print $1}' (in 'find tail' and 'find arecord') and passing the result (if it is a number) to a kill command. This would be neater as a subflow, but it works for now. Secondly, note that I keep the session alive by sending action: no-op every 25s, just before hitting the session expiration timeout. This is not very intelligent, because it injects on start and always sends even if the session is not active for other reasons, but it works for now and keeps my session valid for long periods of time even if no speech is recognised. Which brings me on to my current final issue...
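
The keep-alive itself is tiny; following the same pattern as the 'start' node above, it could be as simple as this sketch wired to a repeating 25s inject node:

// unconditional keep-alive, as described above
msg.payload = { 'action': 'no-op' };
return msg;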

I'm wondering how to deal with the 60-minute TTL expiry of tokens acquired from the service. Currently I have to get one manually (using the 'get Token' exec node: curl -u [cred_username]:[cred_password] "https://stream.watsonplatform.net/authorization/api/v1/token?url=https://stream.watsonplatform.net/speech-to-text/api"), copy the returned value to the clipboard, and manually edit my websocket-client config node URL to set the watson-token query parameter (along with setting the recognition model): wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize?watson-token=[TOKEN]&model=en-UK_BroadbandModel&x-watson-learning-opt-out=1. Isn't there some way to dynamically feed this parameter into the websocket config in Node-RED? If my session dies, then within an hour I have to manually reconfigure the token again :/

isramos commented 8 years ago

I also got it all working using the Websocket node. My approach was to close the wav file before sending it: I send wav files containing 500ms of audio each. Works like a charm.

However, that approach is not working with the "speech-to-text" node. I believe the "speech-to-text" node is NOT configured for streaming mode: when I say the word "independent", it generates 3 separate transcriptions, "in", "deep", "end". Several "one-shots". @jthomas: I suggest the team look into adding streaming support to the "speech-to-text" node. Per the Watson S2T docs, all you'd have to do is add the "continuous" query parameter to the recognize method.

Back to Websockets: I'm also having the issue with the 60-minute TTL expiry, @GardenerOfEden. I added a comment to an open issue advocating for the ability to pass options to the Websocket node's URL param: https://github.com/node-red/node-red/issues/797#issuecomment-189093956

jthomas commented 8 years ago

Great to hear you got this working, I'll close this issue.

nickmarx12345678 commented 7 years ago

@isramos

My approach was to close the wav file before sending it.

Any chance you could clarify further? I think I'm having a similar issue streaming a WAV to Watson from S3.