m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
12.61k stars 1.33k forks

whisperx.DiarizationPipeline takes a long time to load #924

Open smallpize opened 6 days ago

smallpize commented 6 days ago

Recently I called `whisperx.DiarizationPipeline(use_auth_token=hf_token, device='cuda')` and it took an extremely long time to load. I then tried the speaker-diarization-3.1 example from Hugging Face, `Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token="TOKEN")`, and saw the same problem: CPU usage sat at 800% on an 8-core machine. This was on a server with an RTX 4090 and an AMD CPU (vendor ID AuthenticAMD). I looked through related issues but could not tell whether this is a CPU compatibility issue or a pyannote version issue. In the end, upgrading from pyannote.audio==3.1.1 to pyannote.audio==3.3.2 brought the loading time back to normal, although it still feels longer than in my earlier tests. CPU usage is still very high, but most of the warnings I saw before are gone.
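To put numbers on "a long time" and "800%", here is a minimal sketch for timing the load. It records wall-clock time and process CPU time around any loader callable; CPU time divided by wall time gives the average number of busy cores, so 800% means roughly eight cores were busy for the whole load. The helper names `time_load`, `cpu_utilization_percent` and `load_diarizer` are hypothetical and only for illustration; the `whisperx.DiarizationPipeline(...)` call is the one quoted above.

```python
import time
from typing import Any, Callable, Tuple


def time_load(loader: Callable[[], Any]) -> Tuple[Any, float, float]:
    """Run loader() and return (result, wall_seconds, cpu_seconds).

    cpu_seconds comes from time.process_time(), which sums user and
    system CPU time over all threads of this process, so it can be
    much larger than the wall-clock time on a multi-core machine.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = loader()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


def cpu_utilization_percent(wall_seconds: float, cpu_seconds: float) -> float:
    """Average CPU utilization in percent, where 100.0 is one busy core."""
    if wall_seconds <= 0:
        return 0.0
    return 100.0 * cpu_seconds / wall_seconds


def load_diarizer(hf_token: str, device: str = "cuda") -> Any:
    """Hypothetical wrapper around the call from this issue.

    whisperx is imported lazily so the timing helpers above can be
    used without whisperx installed.
    """
    import whisperx

    return whisperx.DiarizationPipeline(use_auth_token=hf_token, device=device)
```

Calling `time_load(lambda: load_diarizer(hf_token))` and passing the two timings to `cpu_utilization_percent` gives figures that can be compared between pyannote.audio 3.1.1 and 3.3.2 on the same machine.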