Open ghost opened 1 year ago
Install nvidia-smi, since you are using an Nvidia RTX graphics card. It will help you check the memory load.
Check that you have nvidia-smi: `which nvidia-smi`
If you get a path, you can then use `watch -n 1 nvidia-smi`,
which refreshes the command output every second.
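If you want the numbers programmatically rather than watching a terminal, the same query can be scripted. A minimal sketch in Python, assuming the standard `nvidia-smi --query-gpu` CSV flags (the live call requires an NVIDIA driver, so only the parser runs unconditionally here):

```python
import subprocess

def gpu_memory_mib(smi_csv: str) -> list[tuple[int, int]]:
    """Parse (used, total) MiB pairs from nvidia-smi CSV output."""
    pairs = []
    for line in smi_csv.strip().splitlines():
        used, total = (field.strip().split()[0] for field in line.split(","))
        pairs.append((int(used), int(total)))
    return pairs

def query_gpu_memory() -> list[tuple[int, int]]:
    """Ask the driver directly; requires nvidia-smi on PATH."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return gpu_memory_mib(out)

# The CSV that command emits looks like "11832 MiB, 24576 MiB", one line per GPU:
print(gpu_memory_mib("11832 MiB, 24576 MiB"))  # [(11832, 24576)]
```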
Start your command in another terminal.
The model should load into memory. The GPU should exhale air like an elephant, with a memory load near 12GB. Also, your parameter `--align_model WAV2VEC2_ASR_LARGE_LV60K_960H` is unnecessary,
as it is the default model. Check whisperx/alignment.py line 25.
I'm also using WhisperX with CUDA. Here are my parameters: `--model=large-v2 --device=cuda --device_index=0 --batch_size=16 --compute_type=float32`
device and device_index are important. Reduce batch_size if memory is overloaded. Since you are using a GPU, don't hesitate to try the best precision possible with float32.
If you want to check all of the parameters, look at whisperx/transcribe.py,
but you should first try `whisperx --help`.
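The "reduce batch_size if memory is overloaded" advice can also be automated. A minimal sketch, where `transcribe` is a stand-in for a call like WhisperX's `model.transcribe(audio, batch_size=...)` (PyTorch surfaces a CUDA OOM as a `RuntimeError` whose message contains "out of memory"):

```python
def transcribe_with_backoff(transcribe, audio, batch_size=16, min_batch=1):
    """Halve the batch size and retry whenever the GPU runs out of memory."""
    while True:
        try:
            return transcribe(audio, batch_size=batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err) or batch_size <= min_batch:
                raise  # a real error, or nothing left to shrink
            batch_size //= 2

# Hypothetical stand-in: pretends the GPU can only handle batches of 4.
def fake_transcribe(audio, batch_size):
    if batch_size > 4:
        raise RuntimeError("CUDA out of memory")
    return {"batch_size": batch_size}

print(transcribe_with_backoff(fake_transcribe, "audio.wav"))  # {'batch_size': 4}
```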
Thank you so much for your excellent explanation.
@davidlandais Is there a fix when the speech ends, but subs keep on going till the next speech?
I am not certain that I fully understand your request. Based on what you're telling me, here's what I imagine: you have a 2-minute video. During the first 10 seconds, a man (or woman) speaks for 8 seconds. At the 10th second, someone responds. And you're wondering why, for 2 seconds after the first person has finished speaking, the subtitle continues to display.

Honestly, I don't know. I don't have this problem; on the contrary, the subtitles disappear too quickly for my taste. If you can send me the audio file you're using (https://filebin.net/), I can try to run it through my system.

Otherwise, I don't think it's much of a problem. It helps you follow on to the next subtitle line: even if it stays for 2 seconds, at least you can make the mental connection between the previous subtitle and the new line. Looking forward to hearing from you.
I think what you've explained is the reason for this. It connects two sentences, and this usually happens when two speakers are speaking simultaneously. This is a really easy fix, and sometimes it does not even need editing. Another easy-to-fix issue is that, occasionally, the subtitle shows up before the sound: for example, there are 5 seconds of subtitle, but the speech starts at the 4th second.
However, sometimes there is a subtitle at a point in the video where no one is speaking, and this causes 4 to 5 subtitles to be wrongly placed, so I have to find the speeches to set them right. After finding the first speech, this is easy to fix too.
I am guessing there is no way to prevent these, because they do not happen when speakers are speaking clear English and there is no background noise to alter the words.
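If the lingering subtitles bother you, the timestamps can be tightened in post-processing before the SRT is written. A minimal sketch, assuming WhisperX-style segment dicts with `start`/`end` in seconds; the 6-second cap and 50ms gap are arbitrary choices for illustration, not WhisperX defaults:

```python
def tighten_segments(segments, max_duration=6.0, gap=0.05):
    """Clip each segment's end so a subtitle can't linger into the next one."""
    fixed = []
    for i, seg in enumerate(segments):
        # Cap the on-screen duration...
        end = min(seg["end"], seg["start"] + max_duration)
        # ...and never run past the start of the next segment.
        if i + 1 < len(segments):
            end = min(end, segments[i + 1]["start"] - gap)
        fixed.append({**seg, "end": round(max(end, seg["start"]), 3)})
    return fixed

subs = [
    {"start": 0.0, "end": 9.5, "text": "first speaker"},   # 9.5s on screen; the cap trims it
    {"start": 10.0, "end": 12.0, "text": "second speaker"},
]
print(tighten_segments(subs))
```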
Hello to both of you, do you know or have a method for transforming the JSON and dictionary outputs into an understandable format, such as SRT? I find the raw output too much to work with.
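WhisperX's CLI can already write SRT directly with `--output_format srt`. If you are working from the JSON instead, the segment list maps to SRT with a few lines. A minimal sketch, assuming the usual `{"segments": [{"start", "end", "text"}, ...]}` shape:

```python
import json

def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT HH:MM:SS,mmm timestamp."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def json_to_srt(result: dict) -> str:
    """Render a {"segments": [...]} dict as SRT text."""
    blocks = []
    for i, seg in enumerate(result["segments"], start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"

# Usage with a saved result (the path is hypothetical):
# with open("audio.json") as f:
#     print(json_to_srt(json.load(f)))
```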
Hello, this is not an issue.
I am currently using this line to run whisperx: `whisperx --model large-v2 --language en --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --batch_size 4 --output_format srt`
I want to know if I can make it more accurate for better results. Any recommended additions to the command line, with explanations, please?
I have a 3080 10GB. I think I am using the GPU, but I do not know. If there is a flag to make it run on the GPU, please let me know.
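As the earlier comment notes, `--device cuda --device_index 0` puts the run on the GPU. If you are unsure whether the GPU can be used at all, a crude sketch of a check is whether the NVIDIA tools are even on your PATH (`torch.cuda.is_available()` is the more reliable check once PyTorch is installed):

```python
import shutil

def pick_device() -> str:
    """Guess "cuda" if the NVIDIA driver tools are installed, else "cpu"."""
    return "cuda" if shutil.which("nvidia-smi") else "cpu"

print(pick_device())
# Then pass the result on the command line, e.g.:
#   whisperx audio.wav --model large-v2 --device cuda --device_index 0
```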