savbell / whisper-writer

💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
GNU General Public License v3.0

Hotkey doesn't work on MacOS #13

Open avi-cenna opened 8 months ago

avi-cenna commented 8 months ago

I have tried the default hotkey as well as several custom ones, but no matter what I do, nothing pops up when I press the hotkey.

On an unrelated note, this script requires sudo on Mac; I'm not sure whether there's a way to avoid this.

felixlu07 commented 7 months ago

Hey @avi-cenna, did you manage to fix this? I have the same issue. It works really well on my Windows machine, but now that I'm trying to run it on my Mac, I'm running into the hotkey issue. Nothing I do gets it to recognize the keys I set up in config.json. I tried another library as well, but to no avail.

felixlu07 commented 7 months ago

Anyway, I finally got it to work. I didn't do a full process of elimination to pin down the root cause, but after switching from the keyboard library to pynput, the hotkey works.
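
pynput reports individual key press and release events rather than whole shortcuts, so detecting a hotkey comes down to tracking which keys are currently held. The sketch below uses a hypothetical ChordDetector class (not part of whisper-writer or pynput) with plain strings standing in for pynput key objects. Unlike a bare "are all keys held?" check, it fires once per press, so OS key auto-repeat while the chord is held does not start recording again. pynput also ships its own keyboard.HotKey helper for this job.

```python
class ChordDetector:
    """Hypothetical, library-free chord tracker; keys are plain strings
    standing in for pynput Key/KeyCode objects."""

    def __init__(self, chord, on_activate):
        self.chord = frozenset(chord)   # keys that must all be held
        self.on_activate = on_activate  # called once per full press
        self.held = set()               # keys currently held down
        self.active = False             # True while the chord is held

    def press(self, key):
        self.held.add(key)
        # Fire only on the transition into "all chord keys held", so
        # auto-repeat events while the chord stays down are ignored.
        if not self.active and self.chord <= self.held:
            self.active = True
            self.on_activate()

    def release(self, key):
        self.held.discard(key)
        # Re-arm as soon as any key of the chord is let go.
        if not self.chord <= self.held:
            self.active = False


fired = []
detector = ChordDetector({'cmd', '7'}, lambda: fired.append('go'))
for key in ['cmd', '7', '7', '7']:  # '7' auto-repeats while held
    detector.press(key)
detector.release('7')
detector.press('7')                 # a second, deliberate press
print(fired)  # ['go', 'go']
```

In a real listener, press() and release() would be called from pynput's on_press and on_release callbacks.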

I'm currently using macOS Sonoma 14.1.2 (23B92). With the original code, macOS objects to creating a new window from a thread other than the main thread: on macOS, UI operations must run on the main thread, and the AppKit framework enforces this restriction.
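
One way to catch this class of bug before it ever reaches a Mac is to make UI entry points check that they are running on the main thread. This is a minimal sketch; main_thread_only and show_status_window are hypothetical names, not whisper-writer code. The check uses only the standard threading module, so it raises the same error on Windows and Linux, where Tk would otherwise often appear to work.

```python
import functools
import threading


def main_thread_only(func):
    """Raise immediately if func is called from any thread but the main one."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if threading.current_thread() is not threading.main_thread():
            raise RuntimeError(
                f'{func.__name__}() must be called from the main thread')
        return func(*args, **kwargs)
    return wrapper


@main_thread_only
def show_status_window():
    return 'window shown'  # stand-in for real Tkinter work


print(show_status_window())  # fine: we are on the main thread

errors = []

def hotkey_callback():
    # Simulates the original bug: UI work started from a listener thread.
    try:
        show_status_window()
    except RuntimeError as exc:
        errors.append(str(exc))

t = threading.Thread(target=hotkey_callback)
t.start()
t.join()
print(errors)  # ['show_status_window() must be called from the main thread']
```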

The error most likely comes from the StatusWindow class, which uses Tkinter (the stack trace points into libtk8.6.dylib). Tkinter is not thread-safe, so all Tkinter operations should happen on the main thread.
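
The usual way to respect that rule is to let worker threads hand work to the main thread through a queue, and have the main thread drain the queue periodically. The sketch below is a hypothetical MainThreadDispatcher (not a Tkinter or whisper-writer API) that shows the general form of this pattern, including sending a result back to the waiting worker. A plain while loop stands in for Tk's event loop so the example runs without a display.

```python
import queue
import threading


class MainThreadDispatcher:
    """Hypothetical helper: any thread may request a call, but only the
    thread that runs pump() (the main thread) executes it."""

    def __init__(self):
        self._calls = queue.Queue()

    def call(self, fn, *args, timeout=5.0):
        """Run fn(*args) on the main thread and wait for its result."""
        reply = queue.Queue(maxsize=1)
        self._calls.put((fn, args, reply))
        ok, value = reply.get(timeout=timeout)  # raises queue.Empty if never pumped
        if not ok:
            raise value
        return value

    def pump(self):
        """Execute all pending calls; must be invoked from the main thread."""
        while True:
            try:
                fn, args, reply = self._calls.get_nowait()
            except queue.Empty:
                return
            try:
                reply.put((True, fn(*args)))
            except Exception as exc:
                reply.put((False, exc))


def update_label(text):
    # Stand-in for label.config(text=...): report which thread ran it.
    return threading.current_thread().name, text


dispatcher = MainThreadDispatcher()
results = []

def recording_worker():
    results.append(dispatcher.call(update_label, 'Recording...'))

worker = threading.Thread(target=recording_worker)
worker.start()
while worker.is_alive():  # stand-in for Tk's event loop
    dispatcher.pump()
    worker.join(timeout=0.01)
print(results)  # [('MainThread', 'Recording...')]
```

With Tkinter, pump() would be rescheduled with root.after(100, ...), just as StatusWindow.check_queue does. When workers only need to send one-way status updates, a plain queue of messages is enough; call() is only useful when the worker has to wait for something the UI produces.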

To resolve this, make sure StatusWindow is created and manipulated only from the main thread. Below is how I reworked main.py to address both issues above (the hotkey library and the main-thread requirement). It should now work on macOS. The activation_key entry in config.json keeps the same format; for example, you can set it to cmd+7.
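
If you would rather rely on pynput's built-in hotkey handling than track keys by hand, the activation_key string has to be converted to pynput's syntax, in which special keys go in angle brackets (for example '<ctrl>+<alt>+h'). The helper below, to_pynput_hotkey, is hypothetical, and its SPECIAL_KEYS set is an assumed subset of pynput's keyboard.Key names. Normalizing the string once also turns a typo such as 'ctrl+spce' into an immediate error, rather than a hotkey that silently never fires.

```python
# Key names to wrap in angle brackets: an assumed subset of pynput's
# keyboard.Key members; extend as needed.
SPECIAL_KEYS = {'cmd', 'ctrl', 'shift', 'alt', 'space', 'enter', 'tab', 'esc'}


def to_pynput_hotkey(activation_key):
    """Convert e.g. 'cmd+7' from config.json into pynput syntax '<cmd>+7'."""
    parts = []
    for token in activation_key.split('+'):
        token = token.strip().lower()
        if token in SPECIAL_KEYS:
            parts.append(f'<{token}>')
        elif len(token) == 1:
            parts.append(token)  # ordinary character key
        else:
            raise ValueError(f'unsupported key in activation_key: {token!r}')
    return '+'.join(parts)


print(to_pynput_hotkey('cmd+7'))       # <cmd>+7
print(to_pynput_hotkey('ctrl+space'))  # <ctrl>+<space>
```

The resulting string could be passed to pynput's keyboard.GlobalHotKeys({hotkey: callback}), which does its own chord tracking. That is an alternative to the hand-rolled key set in the code that follows, not what this fix uses.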

```python
import json
import os
import queue
import threading
import tkinter as tk

from pynput import keyboard
import pyautogui
from transcription import record_and_transcribe


class StatusWindow(tk.Tk):
    """Status window that is created and updated only on the main thread.

    Other threads never call Tk directly; they put (status, message) tuples
    on status_queue, and check_queue() polls it via Tk's after() timer.
    """

    def __init__(self, status_queue):
        super().__init__()
        self.status_queue = status_queue
        self.label = tk.Label(self, text="Status")
        self.label.pack()
        self.withdraw()  # stay hidden until a recording starts
        self.after(100, self.check_queue)

    def check_queue(self):
        try:
            while True:  # drain every pending message
                status, message = self.status_queue.get_nowait()
                if status == 'cancel':
                    # Hide instead of destroy(): destroying the root window ends
                    # mainloop(), so the window would only appear for the first dictation.
                    self.withdraw()
                else:
                    self.label.config(text=message)
                    self.deiconify()
        except queue.Empty:
            pass
        self.after(100, self.check_queue)


class ResultThread(threading.Thread):
    """Thread that keeps its target's return value and can be cancelled."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.result = None
        self.stop_transcription = False

    def run(self):
        # cancel_flag lets the target check whether it should stop early.
        self.result = self._target(*self._args, cancel_flag=lambda: self.stop_transcription, **self._kwargs)

    def stop(self):
        self.stop_transcription = True


def load_config_with_defaults():
    default_config = {
        'use_api': True,
        'api_options': {
            'model': 'whisper-1',
            'language': None,
            'temperature': 0.0,
            'initial_prompt': None
        },
        'local_model_options': {
            'model': 'base',
            'device': None,
            'language': None,
            'temperature': 0.0,
            'initial_prompt': None,
            'condition_on_previous_text': True,
            'verbose': False
        },
        'activation_key': 'ctrl+space',
        'silence_duration': 900,
        'writing_key_press_delay': 0.008,
        'remove_trailing_period': True,
        'add_trailing_space': False,
        'remove_capitalization': False,
        'print_to_terminal': True,
    }

    config_path = os.path.join('src', 'config.json')
    if os.path.isfile(config_path):
        with open(config_path, 'r') as config_file:
            user_config = json.load(config_file)
            for key, value in user_config.items():
                if key in default_config and value is not None:
                    default_config[key] = value

    return default_config


def clear_status_queue():
    while not status_queue.empty():
        try:
            status_queue.get_nowait()
        except queue.Empty:
            break


# Held while a recording is in progress, so repeated hotkey events
# (e.g. key auto-repeat) don't start overlapping recordings.
recording_lock = threading.Lock()


def on_shortcut():
    """Record, transcribe, and type the result. Runs on a worker thread."""
    if not recording_lock.acquire(blocking=False):
        return  # a recording is already running
    try:
        clear_status_queue()
        status_queue.put(('recording', 'Recording...'))
        recording_thread = ResultThread(target=record_and_transcribe,
                                        args=(status_queue,),
                                        kwargs={'config': config})
        recording_thread.start()
        recording_thread.join()

        # Ask the main thread to hide the window; never call Tk from here.
        status_queue.put(('cancel', ''))

        transcribed_text = recording_thread.result
        if transcribed_text:
            pyautogui.write(transcribed_text, interval=config['writing_key_press_delay'])
    finally:
        recording_lock.release()


def format_keystrokes(key_string):
    return '+'.join(word.capitalize() for word in key_string.split('+'))


config = load_config_with_defaults()
method = 'OpenAI\'s API' if config['use_api'] else 'a local model'
status_queue = queue.Queue()

special_keys = {
    'cmd': keyboard.Key.cmd,
    'ctrl': keyboard.Key.ctrl,
    'shift': keyboard.Key.shift,
    'alt': keyboard.Key.alt,
    'space': keyboard.Key.space,  # needed for the default ctrl+space
    # Add other special keys as needed
}

# Define the hotkey combination
hotkey = config['activation_key'].split('+')
# Convert to the format used by pynput
hotkey = [special_keys[k.lower()] if k.lower() in special_keys else k for k in hotkey]

# The set of keys currently pressed
current_keys = set()
# Create the StatusWindow on the main thread
status_window = StatusWindow(status_queue)


def on_press(key):
    # Runs on pynput's listener thread: record the key and never block here.
    if hasattr(key, 'char') and key.char:
        current_keys.add(key.char)
    else:
        current_keys.add(key)

    # Once every hotkey key is held, do the slow work on a separate thread
    # so the listener keeps receiving events.
    if all(k in current_keys for k in hotkey):
        threading.Thread(target=on_shortcut, daemon=True).start()


def on_release(key):
    # Remove the released key from the set
    if hasattr(key, 'char') and key.char:
        current_keys.discard(key.char)
    else:
        current_keys.discard(key)


# The listener runs on its own thread; Tk keeps the main thread.
if __name__ == "__main__":
    try:
        # Start the keyboard listener
        listener = keyboard.Listener(on_press=on_press, on_release=on_release)
        listener.start()

        print(f'Script activated. Whisper is set to run using {method}.')
        print(f'Press {format_keystrokes(config["activation_key"])} to start recording and transcribing.')
        # IS_TRUSTED is macOS-specific and may not exist on other platforms.
        print(f"Listener is trusted: {getattr(listener, 'IS_TRUSTED', 'n/a')}")

        # Run the Tkinter main loop on the main thread, as AppKit requires
        status_window.mainloop()

        # The window was closed: stop the listener instead of waiting on it forever
        listener.stop()
    except KeyboardInterrupt:
        print('\nExiting the script...')
        os._exit(0)  # Use os._exit to exit immediately without cleanup
```

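
load_config_with_defaults in the code above merges only top-level keys, so a config.json that sets just "api_options": {"language": "en"} replaces the whole api_options dict and loses the default model. A recursive merge avoids that. merge_config below is a hypothetical helper, sketched under the assumption that the original policy should be kept: unknown keys and null values in the user config are ignored.

```python
import copy


def merge_config(defaults, user):
    """Overlay user settings onto defaults, recursing into nested dicts.

    Like load_config_with_defaults, it ignores unknown keys and None
    values in the user config. Neither input is mutated.
    """
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if key not in merged or value is None:
            continue
        if isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {'api_options': {'model': 'whisper-1', 'language': None},
            'activation_key': 'ctrl+space'}
user = {'api_options': {'language': 'en'}, 'activation_key': 'cmd+7'}
print(merge_config(defaults, user))
# {'api_options': {'model': 'whisper-1', 'language': 'en'}, 'activation_key': 'cmd+7'}
```

For example, the key loop inside load_config_with_defaults could be replaced with default_config = merge_config(default_config, user_config).
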
savbell commented 6 months ago

Thank you @felixlu07 for providing your solution! Would you be willing to open a PR with your changes so others can also use your fix? I just approved PR #10, which already moved us from pyautogui to pynput, but we could still use your threading fixes. Thanks for your help! :)