Lixi20 opened this issue 3 months ago
It is not clear where the problem really is; maybe you could fix the formatting of your report...
If you mean the pipeline segments are wrong/misplaced, it might be due to many factors that make it hard for the pretrained pipeline to perform well out of the box: noisy audio, specific acoustic conditions that were not seen when the model was trained, etc. You might want to fine-tune the model on the type of data you target (and take a look at the available tutorial notebooks).
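Another thing that may be worth ruling out: compressed formats like MP3 can decode with timing offsets depending on the decoder. A quick check is to convert the file to WAV first and re-run the pipeline on the WAV. This is only a sketch; `ffmpeg_cmd` and `mp3_to_wav` are illustrative helper names, and it assumes `ffmpeg` is installed on your system:

```python
import subprocess

def ffmpeg_cmd(mp3_path, wav_path):
    # Build the ffmpeg command: resample to 16 kHz mono PCM WAV,
    # overwriting the output file if it already exists (-y).
    return ["ffmpeg", "-y", "-i", mp3_path, "-ar", "16000", "-ac", "1", wav_path]

def mp3_to_wav(mp3_path, wav_path):
    # Decode the MP3 to WAV; raises CalledProcessError if ffmpeg fails.
    subprocess.run(ffmpeg_cmd(mp3_path, wav_path), check=True)
```

If the timestamps on the WAV file match your reference times, the offset comes from MP3 decoding rather than from the pipeline itself.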
Tested versions
pyannote.audio = 3.3.1
System information
Ubuntu
Issue description
```python
from pyannote.audio import Pipeline
import torch

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="hf_KkqHxRTGcaXXXXXXXsZvlMCDgAmBuSGCmXE")

pipeline.to(torch.device("cuda"))

diarization = pipeline("/root/Audio/Test.mp3")

for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
```
```
start=0.6s stop=2.2s speaker_SPEAKER_00
start=3.5s stop=4.0s speaker_SPEAKER_00
```
Converted to SRT timestamps:

```
start=0.6s stop=2.2s -> 00:00:00,600 --> 00:00:02,200
start=3.5s stop=4.0s -> 00:00:03,500 --> 00:00:04,000
```

The timeline is wrong. The correct times are:

```
00:00:02,600 --> 00:00:04,486
00:00:05,439 --> 00:00:06,013
```
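For reference, this is roughly how I convert the `turn.start` / `turn.end` seconds to SRT timestamps (the helper name `to_srt_time` is mine, not part of pyannote):

```python
def to_srt_time(seconds):
    # Convert a float number of seconds to an SRT timestamp "HH:MM:SS,mmm".
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

# e.g. to_srt_time(0.6) gives "00:00:00,600"
```

So the conversion itself is straightforward; the start/end seconds coming out of the pipeline are what disagree with the reference times.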
Please help me!