haesleinhuepf / faster-whisper-stt

Simple and basic python function for transcribing audio in realtime based on faster-whisper
BSD 3-Clause "New" or "Revised" License

object oriented programming #8

Closed haesleinhuepf closed 2 weeks ago

haesleinhuepf commented 2 weeks ago

Modify the content of utilities.py so that all functions sit in a "Listener" class. Also make sure that the new class is used properly in continuous_listing.ipynb. Instead of calling individual functions, instantiate an object of the new class and then call methods on that object.

git-bob think about this.

github-actions[bot] commented 2 weeks ago

This message was generated by git-bob (version: 0.2.3, model: gpt-4o-2024-08-06), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

To address the issue, I've refactored the utilities.py file to encapsulate all the functions within a Listener class. The updated code now looks like this:

Updated utilities.py

import sounddevice as sd
import numpy as np
import queue
from scipy.io.wavfile import write
from faster_whisper import WhisperModel

class Listener:
    def __init__(self):
        self.audio_queue = queue.Queue()
        self.sample_rate = 16000
        self.audio_data = []
        self.stream = sd.InputStream(callback=self.audio_callback, samplerate=self.sample_rate, channels=1, dtype='int16')

    def audio_callback(self, indata, frames, time, status):
        """This function is called by sounddevice during audio recording."""
        self.audio_queue.put(indata.copy())

    def start_recording(self, b=None):
        """Starts audio recording."""
        self.audio_queue.queue.clear()  # Clears the queue before recording
        self.stream.start()

    def interrupt_recording(self, b=None, my_text=None):
        """Stops audio recording and starts transcription."""
        self.stream.stop()

        if len(self.audio_data) > 20:
            self.audio_data = self.audio_data[-20:]

        while not self.audio_queue.empty():
            self.audio_data.append(self.audio_queue.get())
        self.start_recording()
        if self.audio_data:
            audio_np = np.concatenate(self.audio_data, axis=0)
            write('output.wav', self.sample_rate, audio_np.astype(np.int16))  # Writes WAV file
            self.transcribe_audio('output.wav', my_text)

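    def transcribe_audio(self, audio_path, my_text=None):
        """Transcribes the recorded audio and returns the text."""
        # Load the model once and reuse it; reloading on every call is slow.
        if getattr(self, "_model", None) is None:
            self._model = WhisperModel("small", device="cpu")
        segments, _ = self._model.transcribe(audio_path, language="de", word_timestamps=True)
        transcription = " ".join(segment.text for segment in segments)
        # my_text defaults to None in interrupt_recording, so guard the widget update.
        if my_text is not None:
            my_text.value = transcription
        return transcription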

    def reset_transcription(self, b, text_output):
        """Resets the transcription."""
        text_output.value = ""
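
One detail in interrupt_recording is worth noting: the buffer is trimmed to the last 20 chunks before the newly queued chunks are appended, so it can briefly hold more than 20. If a strict limit is wanted, a bounded deque is one possible alternative. The following is only a sketch under that assumption; drain_into and MAX_CHUNKS are hypothetical names that are not part of utilities.py, and plain integers stand in for audio chunks.

```python
import queue
from collections import deque

MAX_CHUNKS = 20  # mirrors the 20-chunk limit used in interrupt_recording


def drain_into(buffer, audio_queue):
    """Move every queued chunk into the bounded buffer; the oldest chunks fall out."""
    while True:
        try:
            buffer.append(audio_queue.get_nowait())
        except queue.Empty:
            break
    return buffer


# Hypothetical demo data: integers stand in for recorded audio chunks.
audio_queue = queue.Queue()
for chunk_id in range(25):
    audio_queue.put(chunk_id)

recent = drain_into(deque(maxlen=MAX_CHUNKS), audio_queue)
print(len(recent), recent[0], recent[-1])  # 20 5 24
```

If this were adopted in the class, converting the deque with list(recent) before concatenating the chunks would be the cautious choice.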

Updated continuous_listening.ipynb

Replace the imports of individual functions from utilities, and the calls to those functions, with a Listener instance whose methods are called instead:

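from time import sleep
from ipywidgets import Textarea  # assumption: Textarea is the ipywidgets widget
from utilities import Listener

listener = Listener()
my_text = Textarea()

listener.start_recording()
for i in range(100):
    sleep(0.5)
    listener.interrupt_recording(my_text=my_text)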

This refactoring encapsulates all functionality within the Listener class and updates continuous_listening.ipynb to work with the new design.
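
To try the calling pattern without a microphone or a Whisper model, a stand-in with the same method names could be used. This is a hypothetical sketch only: StubListener is not part of the repository, it returns canned phrases instead of transcribing audio, and a SimpleNamespace replaces the text widget, since the methods only need an object with a .value attribute.

```python
from types import SimpleNamespace


class StubListener:
    """Hypothetical stand-in mirroring the Listener method names; no audio involved."""

    def __init__(self, phrases):
        self._phrases = iter(phrases)
        self.recording = False

    def start_recording(self, b=None):
        self.recording = True

    def interrupt_recording(self, b=None, my_text=None):
        # Pretend to stop, "transcribe" the next canned phrase, and resume.
        self.recording = False
        text = next(self._phrases, "")
        self.start_recording()
        if my_text is not None:
            my_text.value = text

    def reset_transcription(self, b, text_output):
        text_output.value = ""


listener = StubListener(["hallo", "hallo welt"])
my_text = SimpleNamespace(value="")  # any object with a .value attribute works here

listener.start_recording()
seen = []
for _ in range(2):
    listener.interrupt_recording(my_text=my_text)
    seen.append(my_text.value)

listener.reset_transcription(None, my_text)
print(seen, repr(my_text.value))  # ['hallo', 'hallo welt'] ''
```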

haesleinhuepf commented 2 weeks ago

Yes, wonderful! Go ahead git-bob implement this!