Closed · xmontero closed 1 week ago
You need to specify a Hugging Face access token in the parameters of the prediction. The Hugging Face account the access token belongs to must have accepted the gated-access terms for both https://huggingface.co/pyannote/segmentation-3.0 and https://huggingface.co/pyannote/speaker-diarization-3.1.
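A minimal sketch of what that looks like with the Replicate Python client. The parameter names used here (`audio_file`, `align_output`, `diarization`, `huggingface_access_token`) are my best guess at the model's input schema — verify them on the model page on replicate.com before relying on them:

```python
import os

def build_whisperx_input(audio_url, hf_token):
    """Assemble a prediction input with alignment and diarization enabled.

    Parameter names are assumptions based on the victor-upmeet/whisperx
    model page; check the actual input schema on replicate.com.
    """
    return {
        "audio_file": audio_url,
        "align_output": True,
        "diarization": True,
        # Token from https://hf.co/settings/tokens; the account it belongs to
        # must have accepted the terms of both gated pyannote repos.
        "huggingface_access_token": hf_token,
    }

payload = build_whisperx_input(
    "https://example.com/call.mp3",          # placeholder audio URL
    os.environ.get("HF_TOKEN", "hf_..."),    # read token from the environment
)

# With the replicate client installed, you would then run something like:
# import replicate
# output = replicate.run("victor-upmeet/whisperx:<version>", input=payload)
```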
Hi! I'm using Replicate to try out the diarization.
I'm testing with a 1 min 15 s file: an MP3 converted from an AMR phone-call recording.
When I run it with no language specified (to force detection), all default parameters, and alignment and diarization set to false, it works.
The file is in Catalan ("ca") and the language is properly detected.
When I run it with alignment and diarization set to true, it breaks. I get this output:
Logs:
No language specified, language will be first be detected for each audio file (increases inference time).
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.4.0. To apply the upgrade to your files permanently, run
python -m pytorch_lightning.utilities.upgrade_checkpoint ../root/.cache/torch/whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0+cu121. Bad things might happen unless you revert torch to 1.x.
Detected language: ca (0.92) in first 30s of audio...
Could not download 'pyannote/speaker-diarization-3.1' pipeline. It might be because the pipeline is private or gated so make sure to authenticate. Visit https://hf.co/settings/tokens to create your access token and retry with:

It breaks in both victor-upmeet/whisperx and victor-upmeet/whisperx-a40-large.
What do I have to do to use diarization?