Support transcription longer than 30 seconds

seanoliver / audioflare

An all-in-one AI audio playground using Cloudflare AI Workers to transcribe, analyze, summarize, and translate any audio file.

https://audioflare.seanoliver.dev/

MIT License

378 stars, 29 forks

Support transcription longer than 30 seconds #4

Open seanoliver opened 8 months ago

iGerman00 commented 7 months ago

Great work on this project! I'm curious about the status of this issue. Is there any update or plan?

I have some suggestions that might help, though sadly I don't know how to implement them:

1. Divide the audio file into small segments on the client side and send them sequentially with some buffer time.
2. Use only the JSON data for easy processing and merging of the chunks, and recreate the other formats on the client side.
3. Use two workers: one for handling the audio file and one for running the AI models.
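
As a rough illustration of the first two suggestions, here is a minimal TypeScript sketch of the bookkeeping involved: planning chunk boundaries under a length limit, then merging per-chunk JSON results back onto one timeline. The names `ChunkRange`, `Segment`, `planChunks`, and `mergeSegments` are hypothetical, and the segment shape is an assumption rather than the actual response format of the model; the 30-second figure comes from the issue title.

```typescript
// Hypothetical shapes for illustration only; not taken from audioflare
// or from the transcription model's real response format.
interface ChunkRange {
  startSec: number;
  endSec: number;
}

interface Segment {
  startSec: number; // relative to the start of the chunk it came from
  endSec: number;
  text: string;
}

// Split a recording of `totalSec` seconds into consecutive ranges,
// each no longer than `maxChunkSec`.
function planChunks(totalSec: number, maxChunkSec: number): ChunkRange[] {
  if (maxChunkSec <= 0) {
    throw new Error("maxChunkSec must be positive");
  }
  const ranges: ChunkRange[] = [];
  for (let start = 0; start < totalSec; start += maxChunkSec) {
    ranges.push({ startSec: start, endSec: Math.min(start + maxChunkSec, totalSec) });
  }
  return ranges;
}

// Merge per-chunk segments into one list by shifting each segment's
// timestamps by the start offset of the chunk it belongs to.
function mergeSegments(ranges: ChunkRange[], perChunk: Segment[][]): Segment[] {
  if (ranges.length !== perChunk.length) {
    throw new Error("expected one segment list per chunk range");
  }
  const merged: Segment[] = [];
  ranges.forEach((range, i) => {
    const segments = perChunk[i] ?? [];
    for (const seg of segments) {
      merged.push({
        startSec: seg.startSec + range.startSec,
        endSec: seg.endSec + range.startSec,
        text: seg.text,
      });
    }
  });
  return merged;
}

// Example: a 75-second file under a 30-second limit yields three ranges.
const exampleRanges = planChunks(75, 30);
```

Cutting at fixed boundaries can split a word in two. Overlapping the ranges slightly and de-duplicating the text is one possible refinement, not shown here.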

seanoliver commented 6 months ago

Hi @iGerman00! Thanks for the kind words and the suggestions. Also, apologies for the delay in getting back to you. I was a little backed up with other projects in the run-up to the holiday season.

I've been planning to implement your first suggestion, as it seems like the most straightforward and flexible option as AI workers grow to support other models with other limits. I'm going to take a shot at a basic implementation over the next week or so and I'll keep you posted (I'm traveling in Asia, so my coding time is not as plentiful as it usually is). Very open to feedback along the way. Thanks again!
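
One way the "send them sequentially with some buffer time" part might look is sketched below. This is not the project's implementation: `transcribeSequentially` and `TranscribeFn` are hypothetical names, and the `transcribe` callback stands in for whatever request the app actually makes for a single chunk. Keeping the callback and the pause as parameters is one way to stay flexible if other models come with other limits.

```typescript
// Hypothetical callback type: stands in for the real per-chunk request.
type TranscribeFn<C, R> = (chunk: C, index: number) => Promise<R>;

// Resolve after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Send chunks one at a time, in order, pausing `pauseMs` between requests.
// Results come back in the same order as the input chunks.
async function transcribeSequentially<C, R>(
  chunks: C[],
  transcribe: TranscribeFn<C, R>,
  pauseMs: number
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < chunks.length; i++) {
    if (i > 0 && pauseMs > 0) {
      await sleep(pauseMs); // buffer time between consecutive requests
    }
    results.push(await transcribe(chunks[i] as C, i));
  }
  return results;
}
```

Because each request is awaited before the next one starts, a failure on one chunk rejects the whole call. Retrying or skipping failed chunks would be a separate design decision.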

iGerman00 commented 6 months ago

Glad to hear it, thanks for the response. I'd love to use something like this for a personal live transcription service, or at least to be able to drop in an hour-long file, wait a bit, and get it back transcribed, so I'm really excited for the future of the project. Self-hosting a completely roll-your-own Whisper service is kind of a pain.