qasim9872 / aws-transcribe

A client for Amazon Transcribe using the websockets API
https://www.npmjs.com/package/aws-transcribe
MIT License
10 stars, 15 forks

Connection timed out #5

Open orgads opened 4 years ago

orgads commented 4 years ago

I'm using the sample code (with a valid user of course), and getting the following output (I enabled DEBUG=aws-transcribe:*):

2020-06-04T16:56:44.432Z aws-transcribe:C:\Projects\jstest\node_modules\aws-transcribe\dist\StreamingClient.js opened connection to aws transcribe
transcribe connection opened
test.ts:17
Error: Your request timed out because no new audio was received for 15 seconds.
events.js:315
transcribe connection closed
qasim9872 commented 4 years ago

Hey, can you share a snippet of how you're creating the StreamingClient instance and the way the audio is being piped into it?

I tested using the example file and that is working for me.

One additional thing to check is whether you have sox installed: the node-record-lpcm16 package used in the example relies on sox internally to stream the user's audio.

orgads commented 4 years ago
fs.createReadStream('test.wav').pipe(transcribeStream);
orgads commented 4 years ago

Ok, I got it to work with throttle. Is it mandatory? This is strange.

It also misses about 500ms from the beginning, and about 2.5s from the end. Any idea why?

This is how it looks now (requires npm install throttle @types/throttle):

import * as Throttle from 'throttle'
// ...
fs.createReadStream('test.wav').pipe(new Throttle(32000)).pipe(transcribeStream);
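The 32000 passed to Throttle is a rate in bytes per second. It presumably matches the real-time byte rate of the audio: assuming test.wav is 16 kHz, 16-bit, mono PCM (the thread does not state the format), that works out to 16000 × 2 × 1 = 32000 bytes per second. A small sketch of the arithmetic, using a hypothetical helper name:

```typescript
// Hypothetical helper, not part of aws-transcribe or throttle:
// bytes per second needed to stream raw PCM audio in real time.
function pcmBytesPerSecond(sampleRateHz: number, bitsPerSample: number, channels: number): number {
    return sampleRateHz * (bitsPerSample / 8) * channels;
}

// Assumed format (not confirmed in the thread): 16 kHz, 16-bit, mono.
const bytesPerSecond = pcmBytesPerSecond(16000, 16, 1); // 32000
```

If the file uses a different sample rate or channel count, the throttle rate would need to change accordingly.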
orgads commented 4 years ago

Ok, the missing part in the beginning is because the stream started before the connection was opened. I moved the pipe to the 'open' event and it was fixed.
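A minimal sketch of the "pipe only after open" fix described above, written against plain Node.js streams so it does not depend on the library's internals. The helper name pipeWhenOpen and the default event name "open" are assumptions for illustration; with aws-transcribe the event would presumably be StreamingClient.EVENTS.OPEN.

```typescript
import { Readable, Writable } from "stream";

// Hypothetical helper: defer creating and piping the source until `dest`
// emits `openEvent`, so no audio is sent before the connection is ready.
function pipeWhenOpen<T extends Writable>(
    createSource: () => Readable,
    dest: T,
    openEvent: string = "open",
): T {
    dest.once(openEvent, () => {
        // The source is created lazily so the file is not read ahead of time.
        createSource().pipe(dest);
    });
    return dest;
}

// Possible usage with the names from this thread (assumed, untested):
// pipeWhenOpen(
//     () => fs.createReadStream('test.wav').pipe(new Throttle(32000)),
//     transcribeStream,
//     StreamingClient.EVENTS.OPEN,
// );
```

Creating the read stream inside the callback, rather than up front, keeps the file from being consumed before the destination is able to accept it.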

Still investigating the missing end.

qasim9872 commented 4 years ago

Nice. For the missing part at the end, my guess is that the amount left in the buffer is below the threshold you specified, so it's not being forwarded?

Could you also share the final script and we can add that to the examples? Would be good to have that in there for reference.

orgads commented 4 years ago

https://github.com/qasim9872/aws-transcribe/pull/9 solves the early send issue. I still haven't found a solution for the end.

orgads commented 4 years ago

I pushed an example: https://github.com/qasim9872/aws-transcribe/pull/11

I found that if I pad my file with 100K of zeros it recognizes everything. Can you please have a look? It's too late for me now.

The example is linked above, and the file is here: test.zip
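The zero-padding workaround could also be applied at the stream level instead of editing the file. A minimal sketch, assuming the audio is raw signed PCM (where zero bytes are silence); the helper names are hypothetical, and nothing here confirms how much padding the service actually needs.

```typescript
import { Readable } from "stream";

// Hypothetical helper: yield every chunk of `source`, then `paddingBytes`
// zero bytes, which read as silence in signed PCM audio.
async function* withTrailingSilence(
    source: AsyncIterable<Buffer>,
    paddingBytes: number,
): AsyncGenerator<Buffer> {
    for await (const chunk of source) {
        yield chunk;
    }
    if (paddingBytes > 0) {
        yield Buffer.alloc(paddingBytes); // Buffer.alloc zero-fills by default
    }
}

// Wrap the generator as a byte-mode Readable so it can be piped onward.
function padWithSilence(source: Readable, paddingBytes: number): Readable {
    return Readable.from(withTrailingSilence(source, paddingBytes), { objectMode: false });
}

// Possible usage with the names from this thread (assumed, untested):
// padWithSilence(fs.createReadStream('test.wav'), 100 * 1024)
//     .pipe(new Throttle(32000))
//     .pipe(transcribeStream);
```

The 100 * 1024 figure only mirrors the "100K of zeros" reported above; it is not a documented requirement.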

orgads commented 4 years ago

And the throttling should probably be done in the library, and not in client code. There's no reason for the user to get into these details.

qasim9872 commented 4 years ago

I agree to a certain extent, but how would the package tell whether the client is already throttling the input stream? My original use case depends on a live audio stream that takes care of the throttling itself. Perhaps we can make it an optional feature, so those who want the package to handle throttling can enable it. Otherwise, we leave it up to the user.

Aung-Myint-Thein commented 3 years ago

In the file streaming example, I was getting the connection timeout error 15 seconds after the end of the file. However, I found a more elegant way to end the streaming of the file, as follows.

.on(StreamingClient.EVENTS.OPEN, () => console.log(`transcribe connection opened`))
.on('finish', () => {
   console.log(`transcribe finished. will destroy`);
   transcribeStream.destroy();
})
.on(StreamingClient.EVENTS.CLOSE, () => console.log(`transcribe connection closed`))

In Node.js streams, a writable stream emits 'finish' once end() has been called and all data has been flushed, so we can detect the end of streaming with on('finish', ...). A new event type for this should be added to the library's StreamingClient too.