natrys / whisper.el

Speech-to-Text interface for Emacs using OpenAI's whisper model and whisper.cpp as inference engine.

How do I set it to record ogg format #13

Closed OrionRandD closed 10 months ago

OrionRandD commented 10 months ago

How do I set it to record in ogg format, instead of wav? wav takes lots of space... Is there a variable to set it to ogg?

natrys commented 10 months ago

Not really up to us, I think. The whisper model was trained on raw PCM data (which is what the wav file contains). Audio codecs compress that data to save space, but the compressed form at rest is unrecognisable to anything but the codec itself, including the whisper model and the whisper.cpp inference engine. The whisper.cpp authors explicitly say that it only works on raw PCM data (.wav files).

However, whisper.el places the temp file in /tmp/. If you are worried about it filling your tmpfs, then maybe we could add post-advice (or a hook) to delete the temporary file immediately after a successful transcription. But really, the /tmp/ folder is cleaned by your OS upon restart, so why bother? Or, if you want to archive the recording, that post-advice function could first compress it with ffmpeg to, say, .ogg. It's not quite the responsibility of whisper.el, but we could show that use case in a wiki page.

OrionRandD commented 10 months ago

Thanks for the info. I will just run: ffmpeg -i /tmp/$whatever.wav /tmp/$whatever.ogg if I want to archive anything interesting.
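
For anyone who wants to script the archiving step discussed above, here is a minimal sketch built around the same ffmpeg command. The function names (ogg_path_for, archive_wav) and the default archive directory are hypothetical, not part of whisper.el. It assumes ffmpeg is on PATH and was built with an Ogg-capable encoder. The WAV is removed only if the conversion succeeds.

```shell
#!/bin/sh

# Hypothetical helper: print the archive path for a WAV file,
# i.e. the same base name with an .ogg extension inside the target directory.
ogg_path_for() {
    wav=$1
    dir=$2
    base=$(basename "$wav" .wav)
    printf '%s/%s.ogg\n' "$dir" "$base"
}

# Hypothetical helper: compress a WAV recording to .ogg, then delete the WAV.
# The default archive directory is an assumption; pass your own as $2.
archive_wav() {
    wav=$1
    dir=${2:-$HOME/whisper-archive}
    if [ ! -f "$wav" ]; then
        echo "no such file: $wav" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    out=$(ogg_path_for "$wav" "$dir")
    # -n refuses to overwrite an existing archive; ffmpeg infers the
    # output container from the .ogg extension.
    # rm runs only when ffmpeg exits successfully.
    ffmpeg -n -loglevel error -i "$wav" "$out" && rm -- "$wav"
}
```

After sourcing the file, something like archive_wav /tmp/$whatever.wav ~/recordings would leave only the .ogg copy behind. Because the original is deleted only after ffmpeg reports success, a failed conversion never costs you the recording.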