natrys / whisper.el

Speech-to-Text interface for Emacs using OpenAI's whisper model and whisper.cpp as inference engine.

Save recorded files? #19

Closed yizhexu closed 7 months ago

yizhexu commented 7 months ago

Hi, I am interested in keeping the recorded files ("/tmp/emacs-whisper.wav") instead of overwriting them. I would like to have the option of using them later for further model training, e.g. to improve recognition by learning my accent.

In addition, I also want to add an Org property to the transcribed string pointing to the audio file it was generated from.

Can you give me some pointers on how to do this? I started by setting an alternative value for whisper--temp-file, like this:

(defconst whisper-recording-file (concat (concat org-directory "/recording") (make-temp-file (format-time-string "%Y_%m_%d_") nil ".wav")))

;;location of the temporary audio file.
(setq whisper--temp-file whisper-recording-file)

But when I try to record it just says "error in process sentinel: FFmpeg failed to record audio".

natrys commented 7 months ago

For the first problem, you can just copy whisper--temp-file to wherever you want once the transcription is done. So the problem boils down to how to run some custom Elisp logic once some other Elisp function has finished.

Typically library authors provide "hooks" where you can register your function implementing custom logic, which then gets run at predefined points. Here we provide whisper-pre-process-hook and whisper-post-process-hook which could be used to register various bits of custom logic (see the readme).
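As a rough illustration of the hook mechanism in general (this is not whisper.el's actual internals), a library declares a hook variable and calls run-hooks at a fixed point, while user code registers functions with add-hook. Every name prefixed with my-demo- below is hypothetical:

```elisp
;; Hypothetical library side: a hook variable and a function that runs it.
(defvar my-demo-finished-hook nil
  "Hook run after `my-demo-do-work' finishes (illustrative only).")

(defvar my-demo-log nil
  "Events recorded by the demo, newest first.")

(defun my-demo-do-work ()
  "Pretend to do some work, then run `my-demo-finished-hook'."
  (push 'work-done my-demo-log)
  (run-hooks 'my-demo-finished-hook))

;; User side: register custom logic without touching the library code.
(defun my-demo-after-work ()
  "Custom logic that runs whenever the hook fires."
  (push 'custom-logic-ran my-demo-log))

(add-hook 'my-demo-finished-hook #'my-demo-after-work)

(my-demo-do-work)
(reverse my-demo-log) ; → (work-done custom-logic-ran)
```

The library never needs to know about my-demo-after-work; it only promises to run whatever is on the hook at that point.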

(As an aside, if library authors don't provide such hooks, you can generally use Emacs' advice system; for example, :after advice runs an advice function after the advised function returns. Unfortunately this doesn't quite work here, because whisper-run is async and returns immediately, so the advice function would run immediately too, before the transcription is done.)
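The timing problem with advice can be reproduced with a small sketch. It assumes a hypothetical function my-demo-async-run that uses a zero-delay timer as a stand-in for the external process; none of these names come from whisper.el:

```elisp
(defvar my-demo-events nil
  "Events recorded by the demo, newest first.")

(defun my-demo-async-run ()
  "Schedule the real work for later and return immediately.
A zero-delay timer stands in for an external process here."
  (run-at-time 0 nil (lambda () (push 'work-finished my-demo-events)))
  'started)

(defun my-demo-after-advice (&rest _args)
  "Advice function; it runs as soon as `my-demo-async-run' returns."
  (push 'advice-ran my-demo-events))

(advice-add 'my-demo-async-run :after #'my-demo-after-advice)

(my-demo-async-run)
;; The advice has already run here, but the timer has not fired yet.
(sleep-for 0.1) ; give the pending timer a chance to run
(reverse my-demo-events)
```

The final form should evaluate to (advice-ran work-finished): the advice fires before the scheduled work completes, which is why a hook run at the end of the asynchronous work is the better fit.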

While you can do the saving in whisper-post-process-hook, the second problem is slightly trickier. Every function in that hook runs with the current buffer set to a temporary buffer that contains only the transcribed output (because this hook is meant for post-processing text only). However, you can recover the original point location from the internal variable whisper--marker, which is a marker object.
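To see why a marker is enough to get back to the right place, here is a minimal self-contained sketch. It uses two temporary buffers and a local marker in place of whisper--marker; the function name my-demo-marker-roundtrip is made up for illustration:

```elisp
(defun my-demo-marker-roundtrip ()
  "Return the marker's position and the text after it, seen from another buffer."
  (with-temp-buffer                       ; plays the role of the user's buffer
    (insert "before after")
    (let ((marker (copy-marker 8)))       ; just before the word "after"
      (goto-char (point-min))
      (insert ">> ")                      ; an earlier insertion shifts the marker
      (prog1
          (with-temp-buffer               ; plays the role of the output buffer
            (insert "transcribed text")
            ;; Hop back to the buffer the marker belongs to.
            (with-current-buffer (marker-buffer marker)
              (goto-char marker)
              (list (marker-position marker)
                    (buffer-substring (point) (point-max)))))
        (set-marker marker nil)))))       ; detach the marker when done

(my-demo-marker-roundtrip) ; → (11 "after")
```

A marker remembers both its buffer and a position that moves along with edits, so code running in an unrelated buffer can still find the original location.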

Code is probably easier to follow than words. I think you want something like this in your config:

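(defvar my-save-whisper-audio t
  "When non-nil, archive each whisper recording after transcription.")

(defun my-save-whisper-audio-clip ()
  "Copy the latest recording under `org-directory' and record it in Org."
  (when my-save-whisper-audio
    ;; `file-name-concat' requires Emacs 28 or later.
    (let* ((archive-name (format-time-string "%Y%m%d%H%M%S.wav"))
           (archive-file (file-name-concat org-directory "recording" archive-name)))

      ;; Create the archive directory if needed, then copy the recording.
      (make-directory (file-name-directory archive-file) t)
      (copy-file whisper--temp-file archive-file)

      ;; The hook runs in the temporary output buffer, so switch to the
      ;; buffer where recording started before setting the Org property.
      (with-current-buffer (marker-buffer whisper--marker)
        (goto-char whisper--marker)
        (when (derived-mode-p 'org-mode)
          (org-set-property "source" archive-file))))))

;; Depth 100 places this function at the end of the hook.
(add-hook 'whisper-post-process-hook #'my-save-whisper-audio-clip 100)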

yizhexu commented 7 months ago

This worked great for me! Thank you for the explanations you provided.