
ValueError: attempt to get argmin of an empty sequence in Diarization! #116

Open behroozazarkhalili opened 4 months ago

behroozazarkhalili commented 4 months ago

@kadirnar I have received the following error when running the diarization pipeline:

ValueError: attempt to get argmin of an empty sequence

I ran the diarization example exactly as given. The error seems to be related to the code below from the diarization pipeline.

# align the diarizer timestamps and the ASR timestamps
for segment in new_segments:
    # get the diarizer end timestamp
    end_time = segment["segment"]["end"]
    # find the ASR end timestamp that is closest to the diarizer's end timestamp and cut the transcript to here
    upto_idx = np.argmin(np.abs(end_timestamps - end_time))
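
For reference, a minimal defensive sketch of that loop (not the upstream code; the helper name and the way `end_timestamps` is consumed are simplified assumptions): when the ASR stage yields no end timestamps, `np.argmin` over the empty array raises exactly this `ValueError`, so the loop can bail out first.

import numpy as np

def align_segments(new_segments, end_timestamps):
    """Hypothetical guarded version of the alignment loop above."""
    end_timestamps = np.asarray(end_timestamps, dtype=float)
    aligned = []
    for segment in new_segments:
        end_time = segment["segment"]["end"]
        if end_timestamps.size == 0:
            # No ASR end timestamps left (e.g. after a long silent
            # stretch): stop aligning instead of crashing in np.argmin.
            break
        upto_idx = int(np.argmin(np.abs(end_timestamps - end_time)))
        aligned.append((segment, upto_idx))
        # Drop the timestamps consumed so far (simplified assumption).
        end_timestamps = end_timestamps[upto_idx + 1:]
    return aligned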

The pipeline is also quite slow: it takes 10 minutes for a 5-minute audio file. Can you provide a detailed example of setting up hqq for diarization?
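
For what it's worth, a rough sketch of what 4-bit hqq quantization of the Whisper model could look like, assuming the `HqqConfig` API from recent transformers releases; whether `ASRDiarizationPipeline` can take a pre-quantized model or a `quantization_config` is an assumption here, not something confirmed in this thread:

import torch
from transformers import AutoModelForSpeechSeq2Seq, HqqConfig

# Assumption: transformers' HqqConfig (recent versions) for 4-bit weights.
quant_config = HqqConfig(nbits=4, group_size=64)

# Load the ASR model quantized with hqq; how to hook this into
# ASRDiarizationPipeline is left as the open question above.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3",
    torch_dtype=torch.float16,
    quantization_config=quant_config,
    device_map="auto",
)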

kadirnar commented 4 months ago

What code are you using? Which GPU?

behroozazarkhalili commented 4 months ago

> What code are you using? Which GPU?

Hi,

from whisperplus.pipelines.whisper_diarize import ASRDiarizationPipeline
from whisperplus import download_youtube_to_mp3, format_speech_to_dialogue

audio_path = download_youtube_to_mp3("https://www.youtube.com/watch?v=6sUwRiIncKU")

device = "cuda"  # cpu or mps
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization-3.1",
    use_auth_token=False,
    chunk_length_s=30,
    device=device,
)

output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)

RTX 3090; however, it is not related to the GPU. It seems that the returned result is empty in some places.
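
One hypothetical way to check that, continuing from the snippet above and assuming `ASRDiarizationPipeline` exposes its Whisper pipeline as `asr_pipeline` (not verified against the whisperplus source):

asr_out = pipeline.asr_pipeline(audio_path, return_timestamps=True)
for chunk in asr_out["chunks"]:
    start, end = chunk["timestamp"]
    if end is None:
        # A chunk with no end timestamp would leave end_timestamps
        # empty during alignment and trigger the argmin ValueError.
        print("chunk with missing end timestamp:", chunk)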

behroozazarkhalili commented 4 months ago

@kadirnar, any update on this?

kadirnar commented 4 months ago

I will test it today.

KolosDan commented 4 months ago

@kadirnar any news? I got the same problem on an RTX 4090 on RunPod.

behroozazarkhalili commented 4 months ago

@kadirnar, could you please update us on this issue? It makes this otherwise great package impossible to use.

kadirnar commented 4 months ago

I am sorry for the late answer; I have been very busy.

@KolosDan @behroozazarkhalili

[screenshot]

Have you done all the installations correctly? Have you used hf_token?
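
For context, `pyannote/speaker-diarization-3.1` is a gated model on the Hugging Face Hub: you have to accept its terms and pass a real token, so `use_auth_token=False` in the sample code above cannot download it. A minimal sketch, assuming the token is exported as `HF_TOKEN`:

import os

from whisperplus.pipelines.whisper_diarize import ASRDiarizationPipeline

# pyannote/speaker-diarization-3.1 is gated: accept its terms on the Hub
# and pass a valid token instead of use_auth_token=False.
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization-3.1",
    use_auth_token=os.getenv("HF_TOKEN"),  # assumes HF_TOKEN is set
    chunk_length_s=30,
    device="cuda",
)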

Terisback commented 4 months ago

> Have you done all the installations correctly?

If the README installation section is correct, we're good.

@kadirnar generally, long inputs and long silent stretches trigger this error. I found an example that triggers it: https://youtu.be/SbZ5ONmdwgM?si=0QcdTMcamqyngqzU

For context: I download the input (the example video URL above, with -f 140), convert it to MP3 using ffmpeg, and then send it to the worker. It then processes for about 20 minutes and fails.
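
A rough Python reproduction of that preprocessing, assuming `yt-dlp` and `ffmpeg` are on PATH (format 140 is YouTube's audio-only m4a stream):

import subprocess

# Download the audio-only stream of the example video above.
subprocess.run(
    ["yt-dlp", "-f", "140", "-o", "input.m4a",
     "https://youtu.be/SbZ5ONmdwgM"],
    check=True,
)
# Convert the m4a audio to MP3 with ffmpeg.
subprocess.run(["ffmpeg", "-y", "-i", "input.m4a", "input.mp3"], check=True)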

Error content:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py", line 134, in run_job
    handler_return = handler(job)
  File "/handler.py", line 22, in handler
    diarization = pipeline(audio_path)
  File "/usr/local/lib/python3.10/dist-packages/whisperplus/pipelines/whisper_diarize.py", line 171, in __call__
    upto_idx = np.argmin(np.abs(end_timestamps - end_time))
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py", line 1325, in argmin
    return _wrapfunc(a, 'argmin', axis=axis, out=out, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py", line 59, in _wrapfunc
    return bound(*args, **kwds)
ValueError: attempt to get argmin of an empty sequence

(hostname: xxx, worker_id: xxx, runpod_version: 1.6.2)
Source code of runpod worker:

import os

import runpod
from runpod.serverless.utils import rp_download
from whisperplus.pipelines.whisper_diarize import ASRDiarizationPipeline
from whisperplus import download_youtube_to_mp3, format_speech_to_dialogue

def handler(job):
    job_input = job["input"]
    file = rp_download.file(job_input["url"])
    audio_path = file['file_path']

    pipeline = ASRDiarizationPipeline.from_pretrained(
        asr_model="openai/whisper-large-v3",
        diarizer_model="pyannote/speaker-diarization-3.1",
        use_auth_token=os.getenv('HF_TOKEN'),
        chunk_length_s=30,
        device="cuda",
    )

    diarization = pipeline(audio_path)
    return diarization

if __name__ == "__main__":
    runpod.serverless.start({"handler": handler})

behroozazarkhalili commented 4 months ago

@kadirnar Yes, it is not related to the installation. I highlighted the root cause of the error in the code I already submitted.

kadirnar commented 4 months ago

Thank you for the detailed explanation. I'll try again this evening. I spin up a RunPod GPU for each trial, so it takes a little while. I will solve this problem today. Thank you for your interest.

IzzyHibbert commented 4 months ago

Hi. I just wanted to add that I ran into the same issue.

In my case I have tested with both device = "cpu" and device = "mps".

The code I used is exactly the repo's Speaker Diarization example. The installation, requirements, models, HF token, and pyannote permissions are all fine.

I cloned the repo just minutes ago.

[screenshot of the error, 2024-07-23]

kadirnar commented 4 months ago

> What code are you using? Which GPU?
>
> [...]
>
> RTX 3090; however, it is not related to the GPU. It seems that the returned result is empty in some places.

I tested it with an A40 on the RunPod platform and it works.

[screenshot]

Terisback commented 4 months ago

> @kadirnar generally, long inputs and long silent stretches trigger this error. I found an example that triggers it: https://youtu.be/SbZ5ONmdwgM?si=0QcdTMcamqyngqzU

Could you test it with a longer example?

kadirnar commented 4 months ago

How did you do the installation? Can you list all the steps? @IzzyHibbert

IzzyHibbert commented 4 months ago

> How did you do the installation? Can you list all the steps? @IzzyHibbert

Please disregard it. It was an issue with flash-attention not being properly installed. Thanks!
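
For anyone else landing here, a quick sanity check that flash-attn imports cleanly against the installed torch/CUDA build (the version prints are just illustrative):

import torch

print("torch", torch.__version__, "CUDA", torch.version.cuda)
try:
    import flash_attn
    print("flash-attn", flash_attn.__version__)
except ImportError as err:
    # A broken flash-attn install typically fails at import with an ABI
    # or missing-module error rather than at pipeline construction.
    print("flash-attn not usable:", err)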

kadirnar commented 4 months ago

@Terisback @behroozazarkhalili

Can you write your installation steps?

Terisback commented 4 months ago

@kadirnar

Dockerfile

FROM runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04

RUN apt-get update -y && apt-get install -y ffmpeg

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
RUN pip install whisperplus git+https://github.com/huggingface/transformers
RUN pip3 install torch torchvision torchaudio

COPY handler.py handler.py
COPY start.sh start.sh
RUN chmod +x start.sh

CMD /start.sh

requirements.txt

runpod==1.6.2
pyannote.audio>=3.1.0
pyannote.core>=5.0.0
pyannote.database>=5.0.1
pyannote.metrics>=3.2.1
pyannote.pipeline>=3.0.1
speechbrain
huggingface_hub[cli]
moviepy>=1.0.3
yt_dlp
Requests>=2.31.0
accelerate
bitsandbytes
hqq
ffmpeg
ffmpeg-python
pre-commit
fire

start.sh

#!/usr/bin/env bash

set -e

huggingface-cli login --token $HF_TOKEN

# Use libtcmalloc for better memory management
TCMALLOC="$(ldconfig -p | grep -Po "libtcmalloc.so.\d" | head -n 1)"
export LD_PRELOAD="${TCMALLOC}"

# Serve the API and don't shutdown the container
if [ "$SERVE_API_LOCALLY" == "true" ]; then
    echo "Starting RunPod Handler"
    python3 -u /handler.py --rp_serve_api --rp_api_host=0.0.0.0
else
    echo "Starting RunPod Handler"
    python3 -u /handler.py
fi

handler.py

import os

import runpod
from runpod.serverless.utils import rp_download
from whisperplus.pipelines.whisper_diarize import ASRDiarizationPipeline

def handler(job):
    job_input = job["input"]
    file = rp_download.file(job_input["url"])

    # Local path of the downloaded audio file
    audio_path = file['file_path']

    pipeline = ASRDiarizationPipeline.from_pretrained(
        asr_model="openai/whisper-large-v3",
        diarizer_model="pyannote/speaker-diarization-3.1",
        use_auth_token=os.getenv('HF_TOKEN'),
        chunk_length_s=30,
        device="cuda",
    )

    # Run ASR + diarization on the downloaded audio
    diarization = pipeline(audio_path)
    return diarization

if __name__ == "__main__":
    runpod.serverless.start({"handler": handler})

kadirnar commented 4 months ago

@Terisback, I don't know the RunPod code. Can you just run the sample code?

behroozazarkhalili commented 4 months ago

> @Terisback @behroozazarkhalili
>
> Can you write your installation steps?
>
> [...]
>
> I tested it with an A40 on the RunPod platform and it works. [screenshot]

I also followed the installation steps exactly as described in the repository's README, copied the sample code exactly as shown above, and ran it on an RTX 3090, but I got the same error. I think the issue is in the part of the code I quoted above. For now, I'm going to use another package, since this one isn't working for me.