linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

High Memory Usage and "Killed: 9" Error. #75

Closed mcgreenwood closed 1 year ago

mcgreenwood commented 1 year ago

I have a 20-second audio file, but while trying to transcribe it with the tiny model, the Python kernel's memory usage rises to 128 GB and the process gets killed.

Any ideas what could cause this problem?

By the way, it worked with one file that was longer, but with none of the other files.

Jeronymous commented 1 year ago

This sounds incredible (a memory overflow with 20 seconds of audio and the tiny model)!

I have no idea how that's possible.

Can you share the audio and the command you launched?

mcgreenwood commented 1 year ago

It happens with all the files I try to transcribe. They have a 22050 Hz sample rate. I already tried different sample rates to reduce the memory usage.

Here's the Python code I use.

import os
import json
import sys
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample
import whisper_timestamped as whisper
import time
sys.path.append('whisper-timestamped/')

file_name = "TXMIkzyF9xo"
json_path = f"output/transcriptions/json/{file_name}.json"
sample_rate, audio = wavfile.read(f"output/raw/{file_name}.wav")
audio = np.array(audio, dtype=np.float32) / 2**15  # convert 16-bit PCM samples to float32 in [-1, 1]
audio_duration = len(audio) / sample_rate
print(f"Duration of the audio file: {audio_duration} seconds")

# Downsample the audio to 8000 Hz
target_sample_rate = 8000
audio = resample(audio, int(len(audio) * target_sample_rate / sample_rate))
sample_rate = target_sample_rate
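# (Note: when whisper.transcribe is given a raw NumPy array, it assumes 16 kHz audio,
# so passing an 8 kHz signal below is a sample-rate mismatch.)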

transcription_loaded = False
if os.path.exists(json_path) and os.path.getsize(json_path) > 0:
    try:
        # Load the transcription from the existing JSON file
        with open(json_path, "r") as f:
            transcription = json.load(f)
            transcription_loaded = True
    except json.JSONDecodeError:
        # JSON file is malformed, proceed to transcribe the audio
        pass

if not transcription_loaded:
    # model = whisper.load_model("NbAiLab/whisper-large-v2-nob", device="cpu")
    model = whisper.load_model("tiny", device="cpu")
    result = whisper.transcribe(model, audio, language="de")
    transcription = result 

    # Save the JSON file
    with open(json_path, "w") as f:
        json.dump(transcription, f, indent=2)

mcgreenwood commented 1 year ago

Ah... I must have accidentally dropped this line of code: audio = whisper.load_audio(f'output/raw/{file_name}.wav'). Sorry for bothering you.

Works perfectly now. Great work, by the way!
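
For reference, a minimal sketch of the corrected loading path, using the same file layout, model, and calls as the snippet above (whisper.load_audio decodes the file and resamples it to the 16 kHz signal the model expects):

import json
import whisper_timestamped as whisper

file_name = "TXMIkzyF9xo"
json_path = f"output/transcriptions/json/{file_name}.json"

# load_audio decodes the file and resamples it to 16 kHz mono float32
audio = whisper.load_audio(f"output/raw/{file_name}.wav")

model = whisper.load_model("tiny", device="cpu")
result = whisper.transcribe(model, audio, language="de")

with open(json_path, "w") as f:
    json.dump(result, f, indent=2)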

Jeronymous commented 1 year ago

Ah, so you mean the memory overflow was occurring before transcribing with whisper, just in the way you loaded and resampled the signal?

mcgreenwood commented 1 year ago

I don't know the exact reason, but after adding that line the problem disappeared. Maybe because no WAV was loaded correctly, something kept buffering until memory overflowed? It might be good to check in the 'transcribe' function whether the WAV was loaded correctly.
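
As an illustration of that idea (not something the library currently provides; check_audio and its parameters are hypothetical), a small sanity check on the array before calling transcribe could look like this:

import numpy as np

def check_audio(audio, expected_duration=None, sample_rate=16000):
    # Whisper interprets a raw array as 16 kHz mono audio,
    # so it should be a non-empty 1-D NumPy array
    if not isinstance(audio, np.ndarray) or audio.ndim != 1 or audio.size == 0:
        raise ValueError("audio was not loaded correctly (expected a non-empty 1-D NumPy array)")
    duration = audio.size / sample_rate
    if expected_duration is not None and abs(duration - expected_duration) > 1.0:
        raise ValueError(
            f"audio duration is {duration:.1f} s, expected about {expected_duration:.1f} s; "
            "the sample rate is probably wrong"
        )
    return duration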