josephluck / internote

:heart: Beautiful web-based note editor with a focus on distraction-free content creation.
https://internote.app

Implement speech to text #225

Open · josephluck opened this issue 5 years ago

josephluck commented 5 years ago

Flow:

The user hits record in internote and the recording streams in real time to an S3 bucket (a similar setup to attachments, where Cognito Federated Identity is used). Once the user presses submit, the transcribe lambda is invoked with the relevant S3 key. The lambda uses AWS Transcribe to transcribe the file into text, and the text is returned to the user.
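
A minimal sketch of the request the transcribe lambda might build from the submitted S3 key. The field names are a subset of the parameters for `startTranscriptionJob` in the AWS JavaScript SDK linked below; the payload shape (`TranscribeRequest`), the helper names, the bucket name in the example and the list of accepted formats are assumptions for illustration.

```typescript
// Hypothetical payload the submit button sends to the transcribe lambda.
interface TranscribeRequest {
  bucket: string; // bucket the recording was streamed to
  key: string; // S3 key of the finished recording
}

// Subset of the request accepted by TranscribeService.startTranscriptionJob
// in the AWS JavaScript SDK (v2).
interface StartTranscriptionJobParams {
  TranscriptionJobName: string;
  LanguageCode: string;
  MediaFormat: string;
  Media: { MediaFileUri: string };
}

// Formats this sketch accepts, keyed by file extension (assumption).
const SUPPORTED_FORMATS = ["mp3", "mp4", "wav", "flac", "ogg", "webm"];

// Job names are restricted to letters, digits, '.', '_' and '-' (max 200
// chars), so characters such as '/' from the S3 key are replaced.
function jobNameFromKey(key: string, now: number): string {
  const safe = key.replace(/[^0-9a-zA-Z._-]/g, "-");
  return `${safe}-${now}`.slice(-200);
}

function buildTranscriptionParams(
  req: TranscribeRequest,
  now: number = Date.now(),
): StartTranscriptionJobParams {
  // Derive the media format from the key's extension.
  const dot = req.key.lastIndexOf(".");
  const ext = dot === -1 ? "" : req.key.slice(dot + 1).toLowerCase();
  if (SUPPORTED_FORMATS.indexOf(ext) === -1) {
    throw new Error(`Unsupported media format: "${ext}"`);
  }
  return {
    TranscriptionJobName: jobNameFromKey(req.key, now),
    LanguageCode: "en-US", // US English (no UK model was available when this was written)
    MediaFormat: ext,
    Media: { MediaFileUri: `s3://${req.bucket}/${req.key}` },
  };
}
```

Inside the lambda the resulting object would then be passed to something like `new AWS.TranscribeService().startTranscriptionJob(params).promise()`. Keeping the params in a pure function means the key-to-job-name logic can be tested without AWS credentials.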

It would be worth trimming the audio at the head and tail (e.g. leading and trailing silence) inside the lambda, keeping the transcribed audio as short as possible to reduce costs.
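
One way to trim the head and tail, sketched under the assumption that the lambda has already decoded the recording to mono PCM samples in the range [-1, 1] (decoding, e.g. with ffmpeg, is not shown). `trimSilence` and its default threshold of 0.02 are hypothetical placeholders to tune against real recordings.

```typescript
// Returns the span between the first and last sample whose absolute amplitude
// reaches `threshold`, widened by `paddingSamples` on each side.
// An all-silent input yields an empty array.
function trimSilence(
  samples: Float32Array,
  threshold = 0.02,
  paddingSamples = 0,
): Float32Array {
  let start = 0;
  while (start < samples.length && Math.abs(samples[start]) < threshold) start++;
  if (start === samples.length) return new Float32Array(0);

  let end = samples.length - 1;
  while (end > start && Math.abs(samples[end]) < threshold) end--;

  const from = Math.max(0, start - paddingSamples);
  const to = Math.min(samples.length, end + 1 + paddingSamples);
  return samples.subarray(from, to); // a view, no copy
}
```

A per-sample threshold is crude; a sturdier variant would compare the RMS of short windows instead, and a little padding avoids clipping the first and last syllables.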

Set a maximum of 5 minutes to keep the requests small?
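
If the 5-minute cap is enforced server-side, the lambda can reject long recordings before starting a job. This sketch assumes uncompressed PCM audio, so duration follows directly from the byte count of the audio data (excluding any file header); that count could come from the `ContentLength` returned by S3 `headObject`. Compressed formats such as WebM would need the duration read from container metadata instead. `MAX_DURATION_SECONDS`, `PcmFormat` and the helper names are hypothetical.

```typescript
const MAX_DURATION_SECONDS = 5 * 60; // the proposed 5-minute cap

interface PcmFormat {
  sampleRate: number; // samples per second, per channel
  channels: number;
  bitsPerSample: number;
}

// Duration in seconds of raw PCM data of the given size.
function pcmDurationSeconds(byteLength: number, f: PcmFormat): number {
  const bytesPerSecond = f.sampleRate * f.channels * (f.bitsPerSample / 8);
  return byteLength / bytesPerSecond;
}

// Throws if the recording exceeds the cap; otherwise returns its duration.
function assertWithinLimit(byteLength: number, f: PcmFormat): number {
  const seconds = pcmDurationSeconds(byteLength, f);
  if (seconds > MAX_DURATION_SECONDS) {
    throw new Error(`Recording is ${seconds.toFixed(1)}s; the limit is ${MAX_DURATION_SECONDS}s`);
  }
  return seconds;
}
```

For 16 kHz, 16-bit mono audio that is 16,000 × 2 = 32,000 bytes per second, so the 5-minute cap corresponds to 9,600,000 bytes.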


Interaction

The user will press the transcribe button in the toolbar, which raises a toolbar tray and starts recording. The tray has a cancel button and a submit button: pressing cancel kills the stream and removes the file from S3, and pressing submit kicks off the transcription process.
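
The tray behaviour can be modelled as a small state machine, including the duration-limit case discussed below where recording stops but the user can still submit or cancel. This is a sketch only: the state, action and effect names are hypothetical, and the reducer just returns which side effects (stopping the stream, deleting the S3 object, calling the lambda) the UI layer should perform, so the transitions can be tested without a browser or AWS.

```typescript
type TrayState =
  | { kind: "closed" }
  | { kind: "recording"; key: string }
  | { kind: "stopped"; key: string } // duration limit hit, awaiting submit/cancel
  | { kind: "transcribing"; key: string };

type TrayAction =
  | { type: "open"; key: string } // transcribe button pressed
  | { type: "limitReached" }
  | { type: "cancel" }
  | { type: "submit" }
  | { type: "transcribed" }; // lambda returned the text

// Side effects for the UI layer to perform (illustrative names only).
type Effect = "startStream" | "stopStream" | "deleteFromS3" | "callTranscribeLambda";

function trayReducer(state: TrayState, action: TrayAction): [TrayState, Effect[]] {
  switch (action.type) {
    case "open":
      if (state.kind === "closed") return [{ kind: "recording", key: action.key }, ["startStream"]];
      break;
    case "limitReached":
      if (state.kind === "recording") return [{ kind: "stopped", key: state.key }, ["stopStream"]];
      break;
    case "cancel":
      // Cancel kills the stream (if still live) and removes the file from S3.
      if (state.kind === "recording") return [{ kind: "closed" }, ["stopStream", "deleteFromS3"]];
      if (state.kind === "stopped") return [{ kind: "closed" }, ["deleteFromS3"]];
      break;
    case "submit":
      // Submit stops the stream (if still live) and hits the transcribe lambda.
      if (state.kind === "recording") {
        return [{ kind: "transcribing", key: state.key }, ["stopStream", "callTranscribeLambda"]];
      }
      if (state.kind === "stopped") {
        return [{ kind: "transcribing", key: state.key }, ["callTranscribeLambda"]];
      }
      break;
    case "transcribed":
      if (state.kind === "transcribing") return [{ kind: "closed" }, []];
      break;
  }
  // Any other combination is ignored (e.g. cancel while transcribing).
  return [state, []];
}
```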

It's important to give the user the feeling that audio is being recorded, so perhaps intercept the incoming volume and kick off an animation when it is above 30% or so. The animation should be something like a waveform.
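
In the browser, the Web Audio API's `AnalyserNode.getFloatTimeDomainData()` fills a `Float32Array` with the current frame of samples in [-1, 1], which is enough to drive both the ">30%" trigger and a simple waveform. The sketch treats "volume" as the peak absolute sample of a frame, which is an assumption (RMS is a smoother alternative); `frameLevel`, `shouldAnimate` and `waveformBars` are hypothetical helpers.

```typescript
// Level of one frame as a 0..1 fraction: the peak absolute sample value.
function frameLevel(samples: Float32Array): number {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    const a = Math.abs(samples[i]);
    if (a > peak) peak = a;
  }
  return Math.min(1, peak);
}

const ANIMATION_THRESHOLD = 0.3; // the ">30%" from the issue

function shouldAnimate(samples: Float32Array, threshold = ANIMATION_THRESHOLD): boolean {
  return frameLevel(samples) > threshold;
}

// Splits a frame into `bars` equal buckets and returns each bucket's peak,
// suitable as bar heights for a waveform. Leftover samples are dropped.
function waveformBars(samples: Float32Array, bars: number): number[] {
  const size = Math.floor(samples.length / bars);
  const heights: number[] = [];
  for (let b = 0; b < bars; b++) {
    heights.push(frameLevel(samples.subarray(b * size, (b + 1) * size)));
  }
  return heights;
}
```

In practice these would run inside a `requestAnimationFrame` loop, reading a fresh frame from the analyser each tick and using the bar heights to draw the waveform.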

If there is a limit on the duration of the transcription, it would also be good to show that somewhere in the UI so the user does not end up recording something that is incompatible. Once the limit is hit, recording should stop and it should be clear to the user what to do next (submit or cancel).
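
For surfacing the limit in the UI, a countdown label plus a flag telling the recorder to stop is enough. This sketch assumes the 5-minute cap suggested earlier and that elapsed time is measured in milliseconds from when recording started; `limitStatus` is a hypothetical helper.

```typescript
const LIMIT_MS = 5 * 60 * 1000; // assumed 5-minute cap

interface LimitStatus {
  remainingLabel: string; // e.g. "4:32 left"
  limitReached: boolean; // true once no time remains
}

function limitStatus(elapsedMs: number, limitMs = LIMIT_MS): LimitStatus {
  const remaining = Math.max(0, limitMs - elapsedMs);
  // Round up so the label only shows 0:00 when time has actually run out.
  const totalSeconds = Math.ceil(remaining / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return {
    remainingLabel: `${minutes}:${seconds < 10 ? "0" : ""}${seconds} left`,
    limitReached: remaining === 0,
  };
}
```

When `limitReached` becomes true, the UI would stop the recorder and keep the submit and cancel buttons visible so the next step is obvious.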

It would probably be nice if the user could embed the audio file alongside the text?

In the future it would also be great to stream the transcription in real time, though I'm unclear how this could be done effectively with AWS Lambda (without real-time sockets, etc.).


AWS

AWS Transcribe is pretty decent. Sadly there is no UK English version yet, but the US version is still fairly good.

Console: https://eu-west-1.console.aws.amazon.com/transcribe/home?region=eu-west-1#realTimeTranscription

Docs: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/TranscribeService.html

Example serverless project: https://github.com/serverless/examples/blob/master/aws-node-simple-transcribe-s3/handler.js
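
Once a batch job finishes, `getTranscriptionJob` reports `TranscriptionJobStatus` and a `Transcript.TranscriptFileUri` pointing at a JSON file. As far as we understand that file's format, the text lives under `results.transcripts[].transcript`; the sketch below extracts it so the lambda can return plain text to the user. The `TranscribeOutput` interface covers only the fields used here (word-level `items` and timings are ignored), and `extractTranscript` is a hypothetical name.

```typescript
// Minimal, assumed subset of the transcript JSON written by Amazon Transcribe.
interface TranscribeOutput {
  jobName: string;
  status: string;
  results: { transcripts: { transcript: string }[] };
}

// Parses the transcript file and joins all transcript segments into one string.
function extractTranscript(json: string): string {
  const parsed = JSON.parse(json) as TranscribeOutput;
  if (parsed.status !== "COMPLETED") {
    throw new Error(`Transcription job ${parsed.jobName} is ${parsed.status}`);
  }
  return parsed.results.transcripts
    .map((t) => t.transcript)
    .join(" ")
    .trim();
}
```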