Closed: mcgreenwood closed this issue 1 year ago
This sounds incredible (memory overflow with 20 sec audio and tiny model)!
I have no idea how that's possible.
Can you share the audio and the command that you run?
It happens with all files I try to transcribe. They have a 22,050 Hz sample rate. I already tried different sample rates to reduce the memory usage.
Here's the Python code I use.
import os
import json
import sys
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample
import whisper_timestamped as whisper
import time
sys.path.append('whisper-timestamped/')
file_name = "TXMIkzyF9xo"
json_path = f"output/transcriptions/json/{file_name}.json"
sample_rate, audio = wavfile.read(f"output/raw/{file_name}.wav")
audio = np.array(audio, dtype=np.float32) / 2**15
audio_duration = len(audio) / sample_rate
print(f"Duration of the audio file: {audio_duration} seconds")
# Downsample the audio to 8000 Hz
target_sample_rate = 8000
audio = resample(audio, int(len(audio) * target_sample_rate / sample_rate))
sample_rate = target_sample_rate
transcription_loaded = False
if os.path.exists(json_path) and os.path.getsize(json_path) > 0:
    try:
        # Load the transcription from the existing JSON file
        with open(json_path, "r") as f:
            transcription = json.load(f)
        transcription_loaded = True
    except json.JSONDecodeError:
        # JSON file is malformed, proceed to transcribe the audio
        pass
if not transcription_loaded:
    # model = whisper.load_model("NbAiLab/whisper-large-v2-nob", device="cpu")
    model = whisper.load_model("tiny", device="cpu")
    result = whisper.transcribe(model, audio, language="de")
    transcription = result
    # Save the JSON file
    with open(json_path, "w") as f:
        json.dump(transcription, f, indent=2)
Ah... I must have accidentally dropped this line of code:
audio = whisper.load_audio(f'output/raw/{file_name}.wav')
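For reference, a minimal sketch of the corrected loading and transcription steps with that line back in, assuming the same file layout and model as in the script above (whisper.load_audio decodes the file with ffmpeg and returns a 16 kHz mono float32 array, so no manual resampling is needed):

import whisper_timestamped as whisper

file_name = "TXMIkzyF9xo"

# Decode the wav and resample it to the 16 kHz mono float32 format the model expects
audio = whisper.load_audio(f"output/raw/{file_name}.wav")

model = whisper.load_model("tiny", device="cpu")
result = whisper.transcribe(model, audio, language="de")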
Sorry for the bother...
It works perfectly now. Great work, by the way!
Ah, so you mean the memory overflow was occurring before transcribing with Whisper, just in the way you loaded and resampled the signal?
I don't know the exact reason, but after adding this line the problem disappeared. Maybe because no wav was actually loaded, it kept buffering until the memory overflowed? It might be good to check in the 'transcribe' function whether the wav was loaded correctly.
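For illustration, a minimal sketch of what such a check could look like (check_audio is a hypothetical helper, not part of whisper_timestamped's API; it assumes the audio is expected as a mono float32 array normalized to roughly [-1, 1], which is what whisper.load_audio returns):

import numpy as np

def check_audio(audio):
    # Hypothetical sanity check: reject arrays that do not look like
    # audio loaded with whisper.load_audio (mono, float32, normalized)
    audio = np.asarray(audio)
    if audio.ndim != 1:
        raise ValueError(f"expected mono audio (1-D array), got shape {audio.shape}")
    if audio.dtype != np.float32:
        raise ValueError(f"expected float32 samples, got {audio.dtype}")
    if audio.size == 0:
        raise ValueError("empty audio array; was the wav file loaded correctly?")
    if np.abs(audio).max() > 1.5:
        raise ValueError("samples are not normalized to [-1, 1]; was whisper.load_audio used?")
    return audio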
I have a 20-second audio file, but while trying to transcribe it with the tiny model, the Python kernel's memory usage rises to 128 GB and the process gets killed.
Any ideas what could cause this problem?
By the way, it worked with one file that was longer, but with none of the other files.