pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Why is pyannote not using my GPU or CPU? So slow too. #1702

Open CrackerHax opened 5 months ago

CrackerHax commented 5 months ago

Tested versions

latest version 3.1

System information

Windows 11, AMD 5950X CPU, Ubuntu 20.04, Python 3.9, latest pyannote 3.1

Issue description

CPU at 10%, both NVIDIA RTX 3080s at 0%. I only see the model taking up GPU memory, and it is very small. Using the sample code provided in the README.

Minimal reproduction example (MRE)

Use the sample speaker diarization README code

import torch
import pyannote.audio

diarization = pyannote.audio.Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="<my token>").to(torch.device("cuda"))

diarization_output = diarization(filename)
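As a quick sanity check (not part of the original report), one way to confirm the pipeline actually landed on the GPU is to compare allocated CUDA memory before and after moving it. A minimal sketch, assuming a single GPU; variable names are illustrative:

import torch
from pyannote.audio import Pipeline

# Confirm PyTorch can see the GPU at all
print(torch.cuda.is_available())        # should print True
print(torch.cuda.memory_allocated(0))   # bytes allocated before loading

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="<my token>")
pipeline.to(torch.device("cuda"))

# If the models were really moved, allocated memory should jump here
print(torch.cuda.memory_allocated(0))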


freshpearYoon commented 5 months ago

Hi, I am facing the same issue. Did you solve the problem?

CrackerHax commented 5 months ago

Hi, I am facing the same issue. Did you solve the problem?

No I didn't. I just ended up letting it take forever. It eventually does its job without fully utilizing my GPU(s).

amas0 commented 4 months ago

Just wanted to chime in and say that I'm seeing similar-ish issues. I do see utilization on my GPU, so it might not be the same thing, but I have general performance issues as well. I may have localized the issue to passing an audio file path to the pipeline directly.

TL;DR -- try preprocessing your audio into a waveform and running it that way. I dropped my processing from 50 seconds down to 12 seconds by doing so.

I had a 3 minute clip I used as a test here. Passing it into the pipeline as a path, e.g.


from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(...)
pipeline('audio.mp3')

took about 50 seconds overall. Doing some profiling, I found that the code was spending a ton of time in the function pyannote.audio.core.io.Audio.crop.

Specifically this snippet here:

https://github.com/pyannote/pyannote-audio/blob/6a972c0c4e95de04637d7221208736c64c8b972a/pyannote/audio/core/io.py#L341-L354

where passing an audio file directly follows the else: block and spends a lot of time seemingly doing file I/O, loading the file for get_torchaudio_info. The docstring of that function claims that it should cache the output of torchaudio.info, but I couldn't quite grok where it was caching it. Manually implementing a cache there lowered my time to run down to 33 seconds or so. A big chunk, but not everything. I poked a bit more to see what was going on, and it still seemed that the remaining performance issues were all in that one block of audio file processing.
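For illustration only, a minimal sketch of the kind of manual cache described above, memoizing the metadata lookup with functools.lru_cache; the helper name cached_torchaudio_info is hypothetical, not pyannote's API:

import functools

import torchaudio


@functools.lru_cache(maxsize=None)
def cached_torchaudio_info(path: str):
    # Memoize torchaudio.info so repeated crops of the same file
    # do not re-read the file header every time.
    return torchaudio.info(path)


info = cached_torchaudio_info("audio.mp3")
print(info.sample_rate, info.num_frames)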

Instead of running it completely to ground, since it seemed to mostly be a problem with audio file processing, I tried preprocessing via a torchaudio load, as recommended on the HF page:

import torchaudio

waveform, sample_rate = torchaudio.load("audio.wav")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})

This resulted in my 6 min clip being diarized in 12 seconds, and I saw solid GPU utilization along the way.
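Putting the suggestions in this thread together, a minimal end-to-end sketch of decoding the audio up front and running the pipeline on the GPU; the file name, token placeholder, and variable names are illustrative:

import torch
import torchaudio
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="<my token>")
pipeline.to(torch.device("cuda"))

# Decode the audio once, up front, instead of handing the pipeline a path
waveform, sample_rate = torchaudio.load("audio.wav")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})

for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")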

melMass commented 4 months ago

Probably not directly related, but the way pyannote is currently packaged overwrites your environment's torch and installs the CPU-only version instead... I spent way too long tracking this down to this project, so that might be it...

You can quickly check with: python -c "import torch;print(torch.cuda.is_available())"
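If that prints False, it can also help to check which torch build actually got installed. A small sketch of that check (on a CPU-only wheel, torch.version.cuda is None):

import torch

print(torch.__version__)            # e.g. ends in "+cpu" for CPU-only wheels
print(torch.version.cuda)           # None on a CPU-only build
print(torch.cuda.is_available())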

replic1111111 commented 2 months ago

You need to install the CUDA libraries and reinstall pyannote. There are two separate CUDA-related NVIDIA libraries that need to be installed before it properly detects CUDA and uses the GPU.