pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

The timeline is wrong #1737

Open Lixi20 opened 3 months ago

Lixi20 commented 3 months ago

Tested versions

pyannote.audio = 3.3.1

System information

ubuntu

Issue description

```python
from pyannote.audio import Pipeline
import torch

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="hf_KkqHxRTGcaXXXXXXXsZvlMCDgAmBuSGCmXE")

pipeline.to(torch.device("cuda"))

diarization = pipeline("/root/Audio/Test.mp3")

# itertracks takes yield_label (with an underscore)
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
```

```
start=0.6s stop=2.2s speaker_SPEAKER_00
start=3.5s stop=4.0s speaker_SPEAKER_00
```

Converted to subtitle timestamps:

```
start=0.6s stop=2.2s -> 00:00:00,600 --> 00:00:02,200
start=3.5s stop=4.0s -> 00:00:03,500 --> 00:00:04,000
```

The timeline is wrong.

The right time is:

```
00:00:02,600 --> 00:00:04,486
00:00:05,439 --> 00:00:06,013
```
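For reference, converting pyannote's second-based `turn.start`/`turn.end` values into SRT-style `HH:MM:SS,mmm` notation is a mechanical step; a minimal sketch (the helper name `to_srt_timestamp` is illustrative, not part of pyannote):

```python
def to_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to an SRT-style HH:MM:SS,mmm timestamp."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

print(to_srt_timestamp(0.6))  # 00:00:00,600
print(to_srt_timestamp(2.2))  # 00:00:02,200
```

This shows the conversion itself is not the source of the discrepancy: 0.6 s maps exactly to 00:00:00,600, so the mismatch is in the segment boundaries the pipeline produces, not in the timestamp formatting.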

please help me!!!

FrenchKrab commented 3 months ago

It is not clear where the problem really is; could you fix the formatting of your report?

If you mean that the pipeline's segments are wrong or misplaced, that can be due to many factors that make it hard for the pretrained pipeline to perform well out of the box: noisy audio, specific acoustic conditions that were not seen when the model was trained, etc. You might want to fine-tune the model on the type of data you target (and take a look at the available tutorial notebooks).