pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Inference time taking too long - Speaker diarization #876

Closed: pedrohenriqp closed this issue 2 years ago

pedrohenriqp commented 2 years ago

Hi guys, I have a question about inference time when using the pipeline for speaker diarization. I trained a model on my own data, and I am using the pipeline to make predictions on a new .wav file. A 12-minute conversation audio takes about 14 minutes for inference.

Is that expected? If so, do you have any tips to speed up pipeline inference time?

I am using the code snippet below to run inference. Is there a better way to run inference on many files without needing to use for loops?

demofile_path = "audio/file_example.wav"
DEMO_FILE = {'uri': 'file_example', 'audio': demofile_path}
pipeline = SpeakerDiarization(sad_scores = "models/speech_activity_detection/train/SpeakerDiarization.MixHeadset.train/weights/0022.pt",
                              scd_scores = "models/speaker_change_detection/train/SpeakerDiarization.MixHeadset.train/weights/0001.pt",
                              embedding = "models/speaker_embedding/train/SpeakerDiarization.MixHeadset.train/weights/0010.pt", 
                              method= "affinity_propagation")

pipeline.load_params("pipelines/speaker_diarization/train/SpeakerDiarization.MixHeadset.development/params.yml")

diarization = pipeline(DEMO_FILE)

pip freeze | grep pyannote

pyannote.audio==1.1.1
pyannote.core==4.3      
pyannote.database==4.1.1
pyannote.metrics==3.2   
pyannote.pipeline==1.5.2
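On the many-files question: as far as I know there is no documented vectorized multi-file API in pyannote.audio 1.1, so a loop is the straightforward approach; the important thing is to load the pipeline once and reuse it, since model loading is often the dominant cost. A minimal sketch (the diarize_directory helper is my own, not a pyannote API; the write_rttm call assumes the pipeline returns a pyannote.core Annotation):

```python
from pathlib import Path

def diarize_directory(pipeline, wav_dir, rttm_dir=None):
    """Run one already-loaded pipeline over every .wav in a directory.

    `pipeline` is assumed to be a callable like the SpeakerDiarization
    pipeline above; reusing one instance avoids reloading the models
    for every file.
    """
    results = {}
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        item = {"uri": wav_path.stem, "audio": str(wav_path)}
        diarization = pipeline(item)
        results[wav_path.stem] = diarization
        if rttm_dir is not None:
            # pyannote.core annotations can be serialized to RTTM
            out = Path(rttm_dir) / f"{wav_path.stem}.rttm"
            with open(out, "w") as f:
                diarization.write_rttm(f)
    return results
```

This will not make a single file faster, but it removes the per-file setup cost when processing a batch.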

Thanks

Yagna24 commented 2 years ago

Yes @pedrohenriqp, on a Colab GPU, just running diarization = pipeline(audio.wav) takes more than 9 minutes. Please let me know if you come across any solution.
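If inference is this slow on a Colab GPU runtime, one quick sanity check is whether PyTorch can actually see the GPU at all (plain PyTorch, not a pyannote-specific API; this is a generic check, not something from the pipeline):

```python
import torch

def pick_device():
    # Prefer the GPU when PyTorch can actually see one; on Colab this
    # tells you whether the runtime is really GPU-backed or the work is
    # silently staying on the CPU.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(pick_device())
```

If this prints cpu on a Colab GPU runtime, the slowdown is a missing CUDA setup rather than the pipeline itself.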

ColorBuffer commented 2 years ago

Because this repo is written by a group of novice programmers (just elementary AI students). There are tons of useless polymorphism implementations and complicated class inheritance that make debugging very hard.

Anyway, I spent a week refactoring this repo and made an "ahead of real time" version that runs fully on GPU.

I actually wonder how they can call the module "Pipeline" when it is neither modular nor real-time. Most of the code is written in NumPy, and it is a good example of how to use SpeechBrain in your project, and nothing more.

lutharsanen commented 2 years ago

@WilledgeR, can your "ahead of real time" version be found somewhere? I also ran into the problem that speaker diarization does not use the available GPU resources.

Best,

Lutharsanen

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.