Open chubin opened 5 months ago
Would you mind sharing a link to a Google Colab that one can just click and run to reproduce the issue?
Unfortunately, I have no access to Google Colab from my Google Account (I can create a new account if needed), but as you can see the code is trivial.
I noticed that the problem disappears when I load the audio file using Audio:
from pyannote.audio import Audio
io = Audio(mono='downmix', sample_rate=16000)
waveform, sample_rate = io("audio.mp3")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})
instead of loading audio.wav directly. The wav file (audio.wav) has the same sample rate (16000) though.
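The sample rate and channel count of the WAV file can be double-checked without any third-party dependency by reading its header with Python's standard `wave` module. This is only a sketch: `describe_wav` is a hypothetical helper name (not part of pyannote), and it assumes an uncompressed PCM WAV file such as the `audio.wav` mentioned above.

```python
import wave


def describe_wav(path):
    """Return basic header fields of an uncompressed PCM WAV file.

    Hypothetical helper for illustration; not part of pyannote.audio.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,                        # e.g. 16000
            "channels": wav.getnchannels(),             # 1 = mono, 2 = stereo
            "sample_width_bytes": wav.getsampwidth(),   # 2 = 16-bit samples
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    import sys

    # Usage: python describe_wav.py audio.wav
    if len(sys.argv) > 1:
        print(describe_wav(sys.argv[1]))
```

If the reported sample rate is already 16000 and the file is mono, resampling and downmixing are probably not what makes direct loading slow, which would narrow the search to how the file is read.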
The code might be "trivial" but the whole point of sharing a Google Colab is for pyannote maintainers to avoid wasting time on problems that are not reproducible.
For instance, two files with two different extensions (.wav and .mp3) are mentioned here. It is not clear which one works and which one fails.
Preparing a Google Colab will definitely increase your chances of having someone look at your issue. It might also happen that the mere preparation of the Google Colab makes you realize that the problem is on your side (I am not saying that this is the case here but it happened in the past).
+1 for this issue
Thanks for the note, @chubin. I have used your solution with
io = Audio(mono='downmix', sample_rate=16000)
waveform, sample_rate = io("audio.mp3")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})
and got much faster inference 👍
Wow, after updating from 2.x to 3.x I had performance issues. Now it's better than with the old code. I really don't get what caused that, but...
Thanks
Tested versions
System information
Ubuntu 22.04, NVIDIA RTX A6000
Issue description
I am not sure if it is a bug, so please feel free to close it if it is expected behavior.
I am trying to diarize a large recording (approximately 60 minutes), and the diarization process takes 8.5 minutes:
Here is my code:
It uses the GPU during diarization, but with a low utilization level (~10%), and it uses 1 core of the CPU (100%) all the time.
When doing the diarization with whisperx, though, it takes just a minute, and GPU utilization is at full capacity. However, the quality of diarization is slightly worse in this case (approximately 5% of text is attributed to wrong/non-existent speakers).
Pyannote diarization quality is just brilliant, but it takes an order of magnitude more time.
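To put the two timings side by side, the reported numbers can be expressed as a real-time factor (processing time divided by audio duration). The sketch below uses only the approximate figures quoted in this issue (about 60 minutes of audio, 8.5 minutes with pyannote, about 1 minute with whisperx); `real_time_factor` is a hypothetical helper, not part of either library.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Approximate figures taken from the issue description.
AUDIO_SECONDS = 60 * 60                                    # ~60-minute recording
pyannote_rtf = real_time_factor(8.5 * 60, AUDIO_SECONDS)   # 8.5 minutes
whisperx_rtf = real_time_factor(1 * 60, AUDIO_SECONDS)     # ~1 minute

# How many times slower the pyannote run was than the whisperx run.
slowdown = pyannote_rtf / whisperx_rtf

print(f"pyannote RTF: {pyannote_rtf:.3f}")
print(f"whisperx RTF: {whisperx_rtf:.3f}")
print(f"slowdown:     {slowdown:.1f}x")
```

With these rounded inputs the gap comes out at roughly 8.5x, which is consistent with the "order of magnitude" estimate.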
I suppose that I am doing something wrong, but I don't know what exactly.
Could you please point me in the right direction, or confirm that this behavior is expected?
GPU utilization when using pyannote alone
GPU utilization when using whisperX
Minimal reproduction example (MRE)
(not applicable)