savbell / whisper-writer

💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
GNU General Public License v3.0
351 stars, 54 forks

Improvements for Wayland and many more #60

Closed: dariox1337 closed this pull request 2 months ago

dariox1337 commented 2 months ago

First of all, sorry for a PR that bundles unrelated changes, but since the code base is small, later changes naturally depend on earlier ones. If you want, you can cherry-pick the earlier changes and discard the rest. Also, about half of this pull request is the final commit, because it contains roughly 500 lines of key code mappings.

Now, speaking of changes:

  1. Added an option to load models from local folders. Thus, the program can work fully offline.
  2. Faster-whisper supports int8 quantization for CPU inference, so I added support for it.
  3. pynput's Wayland support is spotty. I rewrote keyboard simulation to support multiple backends, then added ydotool and dotool in addition to pynput. I started with ydotool but found it slow, so I then added dotool.
  4. Added a ConfigManager so the config no longer has to be passed around: each class can query ConfigManager directly. It also provides a method that prints to the console depending on a config setting.
  5. The original code busy-waited while recording audio. These lines in result_thread.py:
            while self.is_running:
                self.mutex.lock()
                current_recording_state = self.is_recording
                self.mutex.unlock()

    pegged my CPU mercilessly. I rewrote this to wait on an event instead, and CPU usage is now negligible. In later commits I cleaned up the implementation further.

  6. Fixed the window title color on systems with a dark theme: the default title color is light gray, which made the text almost illegible on the white background.
  7. Added an option to set a minimum audio duration. This is mostly useful for hold_to_record: if you accidentally press the shortcut and release it quickly, the recording is very short, and it can simply be discarded instead of transcribed.
  8. Rewrote how audio data is passed around: instead of using temp files, which kept piling up, the data is now passed in memory.
  9. While implementing the next change, I noticed that improper initialization of Qt windows caused threading bugs, so the main app now inherits from QObject. Also, settings_window now asks the main app to handle restarting and closing, because some cleanup work has to be done first.
  10. Rewrote KeyListener to support multiple backends and implemented evdev in addition to pynput, because pynput's Wayland support is spotty; it often failed to register keys for me.
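Items 1 and 2 come down to what gets passed to faster-whisper's `WhisperModel`, whose `model_size_or_path` argument accepts either a model name or a path to a local model folder, and whose `compute_type` argument accepts `"int8"`. The helper below is a hypothetical sketch, not code from this PR: `whisper_model_kwargs` and its `model_path` parameter are illustrative names, and it only builds the keyword arguments so it runs without faster-whisper installed.

```python
import os
from typing import Any, Dict, Optional


def whisper_model_kwargs(model_name: str, model_path: Optional[str] = None,
                         device: str = "cpu", compute_type: str = "int8") -> Dict[str, Any]:
    """Hypothetical helper: build kwargs for faster_whisper.WhisperModel."""
    if model_path:
        # A local folder means no download is needed, so this works offline.
        if not os.path.isdir(model_path):
            raise FileNotFoundError(f"local model folder not found: {model_path}")
        source = model_path
    else:
        # A plain name such as "base" is fetched on first use.
        source = model_name
    # "int8" selects 8-bit quantized inference, which faster-whisper supports on CPU.
    return {"model_size_or_path": source, "device": device, "compute_type": compute_type}
```

With faster-whisper installed, the result would be used as `WhisperModel(**whisper_model_kwargs("base", model_path="/path/to/model"))`.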
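For item 3, one way to structure several keyboard-simulation backends is a small abstract interface with one subclass per tool. This is a sketch, not the PR's implementation: the class names and the `make_backend` factory are hypothetical, and it assumes ydotool's `type` subcommand and dotool's newline-separated `type` action on stdin. A pynput backend would implement the same `type_text` method.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class TypingBackend(ABC):
    """Hypothetical interface: one subclass per input-simulation tool."""

    @abstractmethod
    def type_text(self, text: str) -> None:
        """Type `text` into the currently focused window."""


class YdotoolBackend(TypingBackend):
    def command(self, text: str) -> List[str]:
        # ydotool's `type` subcommand types its argument (needs ydotoold running).
        return ["ydotool", "type", text]

    def type_text(self, text: str) -> None:
        subprocess.run(self.command(text), check=True)


class DotoolBackend(TypingBackend):
    def payload(self, text: str) -> str:
        # dotool reads one action per line from stdin, e.g. "type hello".
        # A newline would end the action early, so this sketch flattens them.
        return "type " + text.replace("\n", " ") + "\n"

    def type_text(self, text: str) -> None:
        subprocess.run(["dotool"], input=self.payload(text), text=True, check=True)


class DummyBackend(TypingBackend):
    """Records text instead of typing it; handy for tests."""

    def __init__(self) -> None:
        self.typed: List[str] = []

    def type_text(self, text: str) -> None:
        self.typed.append(text)


BACKENDS: Dict[str, Type[TypingBackend]] = {
    "ydotool": YdotoolBackend,
    "dotool": DotoolBackend,
    "dummy": DummyBackend,
}


def make_backend(name: str) -> TypingBackend:
    """Instantiate the backend named in the config, e.g. "dotool"."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown input backend: {name!r}") from None
```

Keeping the tool choice behind one method means the rest of the app never needs to know which tool is doing the typing, and a slow backend can be swapped for a faster one through configuration alone.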
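Item 4 describes a ConfigManager that classes query directly instead of receiving a config object. Here is a minimal sketch of that pattern, assuming a nested dict config; the method names and the `misc.print_to_terminal` key are hypothetical, not taken from WhisperWriter:

```python
from typing import Any, Dict, Optional


class ConfigManager:
    """Hypothetical process-wide config holder, queried via class methods."""

    _config: Optional[Dict[str, Any]] = None

    @classmethod
    def initialize(cls, config: Dict[str, Any]) -> None:
        # Called once at startup; later calls replace the stored config.
        cls._config = dict(config)

    @classmethod
    def get(cls, section: str, key: str, default: Any = None) -> Any:
        if cls._config is None:
            raise RuntimeError("ConfigManager.initialize() has not been called")
        return cls._config.get(section, {}).get(key, default)

    @classmethod
    def console_print(cls, message: str) -> bool:
        """Print only when the config enables terminal output; report whether it did."""
        if cls.get("misc", "print_to_terminal", False):
            print(message)
            return True
        return False
```

Any class can then call `ConfigManager.get("model_options", "compute_type")` without a config argument being threaded through its constructor.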
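The fix in item 5 replaces polling a mutex-guarded flag with a blocking wait. The sketch below shows the general technique with `threading.Event`. It assumes plain Python threads rather than the Qt primitives used in result_thread.py, and `RecordingWorker` with its chunk counter is a hypothetical stand-in for the real audio loop:

```python
import threading
import time


class RecordingWorker:
    """Hypothetical recording loop that blocks on Events instead of polling."""

    def __init__(self) -> None:
        self.recording = threading.Event()  # set while audio should be captured
        self.stopped = threading.Event()    # set once the worker should exit
        self.chunks_recorded = 0

    def run(self) -> None:
        while True:
            # Blocks without spinning until start_recording() or shutdown() sets the event.
            self.recording.wait()
            if self.stopped.is_set():
                break
            # Placeholder for a blocking read of one chunk from the audio stream.
            time.sleep(0.01)
            self.chunks_recorded += 1

    def start_recording(self) -> None:
        self.recording.set()

    def stop_recording(self) -> None:
        self.recording.clear()

    def shutdown(self) -> None:
        self.stopped.set()
        self.recording.set()  # wake the thread so it can see the stop flag
```

`Event.wait()` parks the thread until another thread calls `set()`, so an idle worker uses no CPU, whereas a loop that repeatedly locks and checks a flag keeps a core busy.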
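The minimum-duration check in item 7 reduces to comparing the clip length with a threshold. A sketch, assuming 16 kHz mono audio (the sample rate Whisper models expect) and a hypothetical 0.5 s default; the function name is illustrative:

```python
def should_transcribe(num_samples: int, sample_rate: int = 16_000,
                      min_duration: float = 0.5) -> bool:
    """Return False for clips shorter than `min_duration` seconds (e.g. accidental taps)."""
    duration = num_samples / sample_rate
    return duration >= min_duration
```

A quarter-second tap (4,000 samples at 16 kHz) is dropped before it ever reaches the model.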
savbell commented 2 months ago

Thanks so much for the PR! I'm glad you've been able to use WhisperWriter and make these improvements.

I tested the changes locally and noticed a couple of bugs. Specifically, transcription through the API no longer worked because it expected an audio file, and the visibility function for the settings window was also broken. I made some changes to fix these and updated the docs. Things should be good to go now! :)
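Once audio is passed in memory rather than through temp files, an API that expects an audio file needs the samples wrapped in something file-like. One way to do this with only the standard library is to encode a WAV into `io.BytesIO`. This is a sketch, not necessarily the fix made here; it assumes mono float samples in [-1.0, 1.0] at 16 kHz, and `to_wav_file` is a hypothetical name:

```python
import io
import wave
from array import array
from typing import Sequence


def to_wav_file(samples: Sequence[float], sample_rate: int = 16_000) -> io.BytesIO:
    """Encode mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV held in memory."""
    # Clamp, then scale to signed 16-bit integers.
    pcm = array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:  # wave leaves a caller-supplied buffer open
        wav.setnchannels(1)
        wav.setsampwidth(2)            # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    buf.seek(0)                        # rewind so the consumer reads from the start
    # Upload clients often infer the format from a file name; BytesIO has none by default.
    buf.name = "audio.wav"
    return buf
```

With the `openai` Python package (v1 or later), such a buffer can be passed as the `file` argument of `client.audio.transcriptions.create(...)`; the `.wav` name gives the upload a recognizable format.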