Open behroozazarkhalili opened 4 months ago
What code are you using? Which GPU?
Hi,
from whisperplus.pipelines.whisper_diarize import ASRDiarizationPipeline
from whisperplus import download_youtube_to_mp3, format_speech_to_dialogue
audio_path = download_youtube_to_mp3("https://www.youtube.com/watch?v=6sUwRiIncKU")
device = "cuda" # cpu or mps
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization-3.1",
    use_auth_token=False,
    chunk_length_s=30,
    device=device,
)
output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)
RTX 3090; however, it is not related to the GPU. It seems that the returned result is empty in some places.
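Since the failure seems to come from empty transcription results, a defensive filter on the pipeline output can surface the problem instead of letting the dialogue formatting crash. This is a sketch under assumptions: the segment shape (a list of dicts with `speaker`/`text`/`timestamp` keys) mirrors typical ASR-plus-diarization output, and `drop_empty_segments` is a hypothetical helper, not part of whisperplus.

```python
# Hypothetical guard against empty transcription segments. The shape of
# the pipeline output (a list of {"speaker", "text", "timestamp"} dicts)
# is an assumption based on typical ASR+diarization pipelines.
def drop_empty_segments(segments):
    """Return only segments whose transcribed text is non-empty."""
    return [s for s in segments if s.get("text") and s["text"].strip()]

example = [
    {"speaker": "SPEAKER_00", "text": "Hello there.", "timestamp": (0.0, 2.1)},
    {"speaker": "SPEAKER_01", "text": "", "timestamp": (2.1, 6.0)},    # silence
    {"speaker": "SPEAKER_00", "text": None, "timestamp": (6.0, 8.0)},  # failed chunk
]
filtered = drop_empty_segments(example)  # only the first segment survives
```

Running the filtered list through `format_speech_to_dialogue` (or logging what was dropped) would show whether empty segments are indeed the trigger.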
@kadirnar, any update on this?
I will test it today.
@kadirnar any news? got the same problem on rtx4090 in runpod
@kadirnar, Could you please kindly update us regarding this issue as it makes it impossible to use this great package?
Sorry for my late answer; I have been very busy.
@KolosDan @behroozazarkhalili
Have you done all the installations correctly? Have you used hf_token?
If the installation steps from the README were followed correctly, we should be good.
@kadirnar Generally, long inputs and long silent sections trigger this error. Here is an example that reproduces it: https://youtu.be/SbZ5ONmdwgM?si=0QcdTMcamqyngqzU
For context: I download the input (an example video URL) with -f 140, convert it to MP3 using ffmpeg, and then send it to the worker. It then processes for 20 minutes and fails.
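The preprocessing described above (fetch the audio-only stream 140, transcode to MP3) can be sketched as command construction. The tool names are the stock yt-dlp and ffmpeg CLIs; the file paths and the 192k bitrate are illustrative assumptions, not values from this thread.

```python
# Build (but do not run) the two commands from the workflow above:
# 1. yt-dlp fetches format 140 (audio-only m4a) from YouTube.
# 2. ffmpeg transcodes the m4a to MP3.
def build_commands(url, m4a_path="input.m4a", mp3_path="input.mp3"):
    download = ["yt-dlp", "-f", "140", "-o", m4a_path, url]
    convert = ["ffmpeg", "-y", "-i", m4a_path,
               "-codec:a", "libmp3lame", "-b:a", "192k", mp3_path]
    return download, convert

download_cmd, convert_cmd = build_commands("https://youtu.be/SbZ5ONmdwgM")
```

Each list can then be executed with `subprocess.run(cmd, check=True)` so a failed download or transcode raises before the 20-minute pipeline run starts.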
@kadirnar Yes, it is not related to the installation, though. I highlighted the root cause of the error in the code I already submitted.
Thank you for the detailed explanation. I'll try again this evening. I spin up a Runpod GPU for each trial, so it takes a little while. I will solve this problem today. Thank you for your interest.
Hi. I just wanted to add that I ran into the same issue.
In my case I have tested with both:
device = "cpu"
and
device = "mps"
The code I used is exactly the Speaker Diarization example from the repo. The installation, requirements, models, HF token, and pyannote permissions are all fine.
I cloned the repo just minutes ago.
I tested it with A40 on the Runpod platform and it works.
Could you test it with a longer example?
How did you do the installation? Can you write them all? @IzzyHibbert
Please disregard it; the problem was flash-attention not being installed properly. Thanks.
@Terisback @behroozazarkhalili
Can you write your installation steps?
@kadirnar
FROM runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
RUN apt-get update -y && apt-get install -y ffmpeg
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
RUN pip install whisperplus git+https://github.com/huggingface/transformers
RUN pip3 install torch torchvision torchaudio
COPY handler.py handler.py
COPY start.sh start.sh
RUN chmod +x start.sh
CMD /start.sh
requirements.txt
runpod==1.6.2
pyannote.audio>=3.1.0
pyannote.core>=5.0.0
pyannote.database>=5.0.1
pyannote.metrics>=3.2.1
pyannote.pipeline>=3.0.1
speechbrain
huggingface_hub[cli]
moviepy>=1.0.3
yt_dlp
Requests>=2.31.0
accelerate
bitsandbytes
hqq
ffmpeg
ffmpeg-python
pre-commit
fire
start.sh
#!/usr/bin/env bash
set -e
huggingface-cli login --token "$HF_TOKEN"
# Use libtcmalloc for better memory management
TCMALLOC="$(ldconfig -p | grep -Po "libtcmalloc.so.\d" | head -n 1)"
export LD_PRELOAD="${TCMALLOC}"
# Serve the API and don't shutdown the container
if [ "$SERVE_API_LOCALLY" == "true" ]; then
echo "Starting RunPod Handler"
python3 -u /handler.py --rp_serve_api --rp_api_host=0.0.0.0
else
echo "Starting RunPod Handler"
python3 -u /handler.py
fi
handler.py
import os

import runpod
from runpod.serverless.utils import rp_download
from whisperplus.pipelines.whisper_diarize import ASRDiarizationPipeline

def handler(job):
    job_input = job["input"]
    file = rp_download.file(job_input["url"])
    audio_path = file["file_path"]
    pipeline = ASRDiarizationPipeline.from_pretrained(
        asr_model="openai/whisper-large-v3",
        diarizer_model="pyannote/speaker-diarization-3.1",
        use_auth_token=os.getenv("HF_TOKEN"),  # needs the `os` import above
        chunk_length_s=30,
        device="cuda",
    )
    diarization = pipeline(audio_path)  # was `pipeline(segment)`; `segment` is undefined
    return diarization

if __name__ == "__main__":
    runpod.serverless.start({"handler": handler})
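One cheap hardening step for a handler like the one above: validate the job payload before any model loading, so a malformed request fails in seconds instead of after a long GPU run. `validate_job` is a hypothetical helper, not part of the runpod SDK.

```python
# Hypothetical input validation for a RunPod-style handler: fail fast on
# a missing "url" before any GPU work starts.
def validate_job(job):
    job_input = job.get("input") or {}
    url = job_input.get("url")
    if not url:
        return {"error": "missing 'url' in job input"}
    return {"url": url}

ok = validate_job({"input": {"url": "https://example.com/a.mp3"}})
bad = validate_job({"input": {}})
```

Returning an `{"error": ...}` dict follows the common serverless-handler convention of reporting failures in the response body rather than raising.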
@Terisback, I don't know the Runpod code. Can you just run the sample code?
I also followed the installation steps exactly as described in the repository. I copied and pasted the sample code exactly as shown above and tried to run it on an RTX 3090, but I got the same error. I think the issue is related to the part of the code I quoted above. For now, I'm going to use another package, since this one isn't working well.
@kadirnar I have received the following error when running the diarization pipeline:
I ran the diarization example exactly. The error seems related to the code below in the diarization pipeline.
The pipeline is also too slow: it takes 10 minutes for a 5-minute audio file. Can you provide a detailed example of configuring hqq for diarization?
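On the hqq question: recent transformers versions expose an `HqqConfig` that can be passed when loading a model. Below is a minimal, untested sketch of quantizing just the Whisper ASR model this way; it assumes transformers with the `hqq` package installed, and the `nbits`/`group_size` values are illustrative. Whether and how `ASRDiarizationPipeline` accepts a model loaded like this is not confirmed anywhere in this thread.

```python
# Untested sketch: 4-bit HQQ quantization of the Whisper ASR model via
# transformers' HqqConfig (requires the `hqq` package and a CUDA GPU).
import torch
from transformers import AutoModelForSpeechSeq2Seq, HqqConfig

quant_config = HqqConfig(nbits=4, group_size=64)  # values are illustrative

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3",
    torch_dtype=torch.float16,
    quantization_config=quant_config,
    device_map="cuda",
)
```

Note that hqq only shrinks the Whisper side; the pyannote diarization model runs unquantized, so it may still dominate runtime on long files.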