Closed kanjieater closed 1 year ago
You can try lowering --refine_ts_num (default: 100), or disable refinement entirely with --refine_ts_num 0.
Thanks - I'll give it a try. Could you explain more about how that parameter affects the model so I can tune it accurately? If I disable it with 0, what will be the impact?
So it seems refine_ts_num doesn't have a significant effect on memory usage. But there does appear to be a surge in memory usage when loading the model with the default whisper function, and this surge elevates the baseline memory usage. It should be fixed in 0b423391e115abcb8b8fdbb581b75f5b1fc746d3. Also added --sync_empty, which can reduce memory usage during inference as well.
Thank you for the quick response. I tried your suggestion and the latest version. Unfortunately, there was no change, as the memory still filled up quickly:
8737 Killed stable-ts "$FOLDER/audio.mp3" --language Japanese --output_dir "$FOLDER/" --model large-v2 -o "$FOLDER/captions.ass" --sync_empty
The memory starts lower for a time, then it crashes around that peak. It's not an immediate crash, but it happens within 3 minutes.
My apologies, I misread the issue. I was assuming we were talking about GPU memory. The previous solution only works for GPU memory.
It is expected that stable-ts has higher CPU memory usage than official whisper and other implementations because it stores significantly more data (in RAM) for stabilizing the timestamps. The spike and crash you're seeing might be due to stable-ts trying to generate a timestamp mask for the entire audio track at once. So this spike is likely before inference (--verbose should tell you if there is no text output to the console before it crashes). If this is the case, --suppress_silence False should drastically lower the RAM usage.
I didn't see any output when running with the --verbose flag.
19625 Killed stable-ts "$FOLDER/audio.mp3" --language Japanese --output_dir "$FOLDER/" --model large-v2 -o "$FOLDER/captions.ass" --sync_empty --verbose
I will try removing the sync_empty flag and running again to see if verbose shows anything (I accidentally left it in). I'll try running with --suppress_silence False as well.
Update:
Verbose didn't output anything unfortunately
20378 Killed stable-ts "$FOLDER/audio.mp3" --language Japanese --output_dir "$FOLDER/" --model large-v2 -o "$FOLDER/captions.ass" --verbose
I also ran it with --suppress_silence false and got the same result:
22053 Killed stable-ts "$FOLDER/audio.mp3" --language Japanese --output_dir "$FOLDER/" --model large-v2 -o "$FOLDER/captions.ass" --suppress_silence false --overwrite
Memory usage and CPU usage spike at the same time when the Out of Memory error occurs.
Just to be clear, my specs are: i9-13900KS, 4070 Ti, 32GB DDR5 RAM
All of this is stable and working well. It runs inside of WSL2 on Win11 (which has access to CPU, GPU and RAM - works fine for whisper and whisperx as far as resources). I've allocated additional memory as well:
Would you like me to send you the 1GB file somewhere so you could see if you can reproduce as well? I can run it successfully for smaller files.
I just started a run on a 6 hour wav file that is 700MB. The progress bar started very quickly. The progress bar never showed for my 19hr 1GB file, which always crashed.
Update: The 6 hour wav completed w/o issue.
If you still see a spike even with --suppress_silence false, then the spike is likely from whisper.log_mel_spectrogram, which is part of how whisper loads the audio by default. Passing a 19hr long array into whisper.log_mel_spectrogram causes a 23GB spike on my end. I suggest splitting that audio track into shorter tracks.
import whisper
mel = whisper.log_mel_spectrogram('audio.mp3')
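For a sense of why a 19hr input is so expensive at this step, here is a rough back-of-the-envelope sketch. The constants (16 kHz sample rate, 400-point FFT, hop of 160 samples, 80 mel bins) match Whisper's audio front end for large-v2. The list of intermediate buffers and the assumption that they are all alive at the same time are guesses made for illustration, not a profile of the real function.

```python
# Rough estimate of the RAM needed to build a log-mel spectrogram in one pass.
# ASSUMPTION: waveform, complex STFT, magnitudes and mel output all coexist
# in memory at the peak. The real peak depends on the torch implementation.

SAMPLE_RATE = 16_000  # samples per second
N_FFT = 400           # FFT window size
HOP_LENGTH = 160      # samples between frames (100 frames per second)
N_MELS = 80           # mel bins

GIB = 1024 ** 3


def estimate_mel_memory_gib(hours):
    """Return approximate buffer sizes in GiB for `hours` of audio."""
    n_samples = int(hours * 3600 * SAMPLE_RATE)
    n_frames = n_samples // HOP_LENGTH
    n_bins = N_FFT // 2 + 1  # 201 frequency bins
    sizes = {
        "waveform_float32": n_samples * 4 / GIB,
        "stft_complex64": n_frames * n_bins * 8 / GIB,
        "magnitudes_float32": n_frames * n_bins * 4 / GIB,
        "mel_float32": n_frames * N_MELS * 4 / GIB,
    }
    sizes["total"] = sum(sizes.values())
    return sizes


if __name__ == "__main__":
    for hours in (6, 19):
        total = estimate_mel_memory_gib(hours)["total"]
        print(f"{hours}h of audio: about {total:.1f} GiB")
```

Under these assumptions a 19hr track lands a little above 21 GiB, the same order of magnitude as the 23GB spike reported above, while a 6hr track stays under 7 GiB. That is consistent with the 6 hour file completing and the 19 hour file being killed.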
You are correct. The input file is too large when Whisper starts, so I either need more RAM or for Whisper to fix it upstream. Thank you for your help with this.
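A minimal sketch of the splitting workaround, assuming ffmpeg is installed and using its segment muxer. The helper names (build_split_command, chunk_offsets), the 2 hour chunk length and the output file pattern are hypothetical choices for illustration and are not part of stable-ts or whisper.

```python
import math
from typing import List


def build_split_command(src: str, out_pattern: str, chunk_seconds: int) -> List[str]:
    """Build an ffmpeg command that cuts `src` into fixed-length pieces.

    Uses stream copy (-c copy), so nothing is re-encoded. Cuts then fall on
    packet boundaries, so real chunk lengths can differ slightly from
    `chunk_seconds`.
    """
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",
        "-segment_time", str(chunk_seconds),
        "-c", "copy",
        out_pattern,
    ]


def chunk_offsets(total_seconds: float, chunk_seconds: int) -> List[int]:
    """Nominal start time (in seconds) of each chunk in the original track."""
    n_chunks = math.ceil(total_seconds / chunk_seconds)
    return [i * chunk_seconds for i in range(n_chunks)]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run to execute it.
    cmd = build_split_command("audio.mp3", "chunk_%03d.mp3", 2 * 3600)
    print(" ".join(cmd))
    print(chunk_offsets(19 * 3600, 2 * 3600))
```

Each chunk can then be transcribed on its own, and the timestamps of chunk i shifted by its offset to line up with the original track. Since stream copy does not cut at exact times, it is worth checking the real chunk durations (for example with ffprobe) before relying on the nominal offsets.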
A 19 hour file around 1GB in size gets killed with an OOM error. I'm running with 13GB available.
It happens when I run with this command. It works fine for a smaller input mp3 & whisper and whisperX both manage to run this without OOM errors.
stable-ts "$FOLDER/audio.mp3" --language Japanese --output_dir "$FOLDER/" --model large-v2 -o "$FOLDER/captions.ass"
Are there any fixes or workarounds available? I'm guessing I could use a less accurate model (though I was hoping not to).
Update: I also tried it with 20GB available and --model medium set. It resulted in the same thing.