sloganking / desk-talk

A desktop transcription software
MIT License

Should we remove ffmpeg? #3

Open sloganking opened 11 months ago

sloganking commented 11 months ago

ffmpeg currently converts the recorded wav files to mp3 files. This is because wav files are uncompressed, and the OpenAI API has a hard limit: uploaded files must be under 25 MB.

https://platform.openai.com/docs/guides/speech-to-text/longer-inputs

By default, the Whisper API only supports files that are less than 25 MB. If you have an audio file that is longer than that, you will need to break it up into chunks of 25 MB's or less or used a compressed audio format.
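
To put a rough number on that limit, here is a small sketch of how quickly an uncompressed recording reaches it. It assumes plain PCM wav with the canonical 44-byte header, and it assumes "25 MB" means 25 * 1024 * 1024 bytes (the docs do not say which definition they use). The function and constant names are made up for illustration and are not part of desk-talk.

```rust
/// Assumed reading of the documented 25 MB limit (could also be 25_000_000).
const UPLOAD_LIMIT_BYTES: u64 = 25 * 1024 * 1024;

/// Size of the canonical PCM wav header; real files may carry extra chunks.
const WAV_HEADER_BYTES: u64 = 44;

/// Bytes of audio data produced per second of uncompressed PCM.
fn pcm_bytes_per_second(sample_rate: u32, channels: u16, bits_per_sample: u16) -> u64 {
    sample_rate as u64 * channels as u64 * (bits_per_sample as u64 / 8)
}

/// Approximate size of a wav file holding `seconds` of audio.
fn wav_size_bytes(sample_rate: u32, channels: u16, bits_per_sample: u16, seconds: u64) -> u64 {
    WAV_HEADER_BYTES + pcm_bytes_per_second(sample_rate, channels, bits_per_sample) * seconds
}

/// Longest whole number of seconds that still fits within the assumed limit.
fn max_wav_seconds(sample_rate: u32, channels: u16, bits_per_sample: u16) -> u64 {
    (UPLOAD_LIMIT_BYTES - WAV_HEADER_BYTES)
        / pcm_bytes_per_second(sample_rate, channels, bits_per_sample)
}
```

Under these assumptions, 16-bit stereo at 44.1 kHz hits the limit after about two and a half minutes, while 16-bit mono at 16 kHz lasts a bit under 14 minutes.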

If I knew how to save audio recordings to mp3 files in Rust, we could get around this and drop ffmpeg. But I've only seen an example of writing to wav files so far. See:

https://github.com/RustAudio/cpal/blob/master/examples/record_wav.rs

jonassmedegaard commented 5 months ago

Since the audio files contain speech, it is far more efficient to compress them with Ogg/Opus than with mp3, and this seems to be supported by OpenAI: the API appears to accept the Ogg container format, and the code itself supports whatever ffmpeg supports, which (depending on how it is compiled) includes the Opus codec.

So I would suggest looking at the opus crate, and at the crates that depend on it, for inspiration on how to save as Opus, with far lighter dependencies than the ffmpeg giant.
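
As a rough illustration of why the codec choice matters for the 25 MB limit, the sketch below estimates how much audio fits at a constant bitrate. The bitrates in the usage note (128 kbit/s for mp3, 24 kbit/s for Opus speech) are illustrative assumptions, not measurements from this project. The estimate ignores container overhead, and the limit is again assumed to be 25 * 1024 * 1024 bytes. The names are hypothetical.

```rust
/// Assumed reading of the documented 25 MB limit (could also be 25_000_000).
const UPLOAD_LIMIT_BYTES: u64 = 25 * 1024 * 1024;

/// Whole seconds of audio that fit in `limit_bytes` at a constant bitrate,
/// ignoring container overhead. Returns None if the bitrate rounds to zero
/// bytes per second.
fn max_seconds_at_bitrate(limit_bytes: u64, bitrate_kbps: u64) -> Option<u64> {
    // kbit/s -> bytes/s (1 kbit = 1000 bits, 8 bits per byte).
    let bytes_per_second = bitrate_kbps * 1000 / 8;
    limit_bytes.checked_div(bytes_per_second)
}
```

With these assumed bitrates, `max_seconds_at_bitrate(UPLOAD_LIMIT_BYTES, 128)` gives roughly 27 minutes of audio, while `max_seconds_at_bitrate(UPLOAD_LIMIT_BYTES, 24)` gives well over two hours.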