pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License
6.12k stars 764 forks

Is it possible to convert the model to ONNX and then use it in C++? #1322

leohuang2013 closed this issue 6 months ago

leohuang2013 commented 1 year ago

Is it possible to convert the model to ONNX and then use it in C++ for speaker diarization? Thanks.

github-actions[bot] commented 1 year ago

We found the following entry in the FAQ which you may find helpful:

Feel free to close this issue if you found an answer in the FAQ. Otherwise, please give us a little time to review.

This is an automated reply, generated by FAQtory

pengzhendong commented 1 year ago

https://github.com/pengzhendong/pyannote-onnx

kfsky commented 1 year ago

@pengzhendong

I am looking to implement the speaker-diarization of pyannote with ONNX. I've been referring to this link: https://github.com/pengzhendong/pyannote-onnx. However, the repository linked doesn't seem to have the speaker-diarization output implemented.

I want to make the necessary adjustments myself, but pyannote's speaker-diarization operates by loading multiple models. Considering this, I'm unsure how to proceed with the modifications. I would appreciate it if you could provide me with advice or instructions on the specific steps or methods to follow.

pengzhendong commented 1 year ago

@kfsky Could you provide a link for "pyannote's speaker-diarization operates by loading multiple models"?

kfsky commented 1 year ago

@pengzhendong I have been referring to this notebook: https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb. When executing the following section of the notebook, multiple models get downloaded:

```python
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@develop", use_auth_token=True)
```

Therefore, I believe these multiple models are necessary for the conversion to ONNX. Is my understanding incorrect?

pengzhendong commented 1 year ago

@kfsky There are two models:

  1. https://huggingface.co/pyannote/segmentation
  2. https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb

The first one is used to segment the audio (pyannote-onnx does the same thing). The second one is used to get the embeddings of the segments.
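To make the first stage concrete: the segmentation model emits per-frame speech probabilities, which then have to be turned into `(start, end)` regions before embeddings can be extracted. A minimal numpy sketch (the threshold and frame rate here are illustrative assumptions, not pyannote's actual hyper-parameters):

```python
# Minimal sketch: turn per-frame speech probabilities (as produced by a
# segmentation model) into (start, end) speech segments in seconds.
# threshold and frame_rate are illustrative assumptions.
import numpy as np

def frames_to_segments(probs, threshold=0.5, frame_rate=100.0):
    """probs: 1-D array of speech probabilities, one per frame."""
    active = probs > threshold
    # Find rising (+1) and falling (-1) edges of the active mask.
    edges = np.diff(active.astype(int), prepend=0, append=0)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return [(float(s) / frame_rate, float(e) / frame_rate)
            for s, e in zip(starts, ends)]

probs = np.array([0.1, 0.2, 0.9, 0.95, 0.8, 0.1, 0.7, 0.9, 0.2])
print(frames_to_segments(probs))  # → [(0.02, 0.05), (0.06, 0.08)]
```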

kfsky commented 1 year ago

@pengzhendong

> The second one is used to get the embeddings of the segments.

Could you possibly share some ideas on the steps to follow when incorporating the second model into pyannote-onnx?

pengzhendong commented 1 year ago

@kfsky Please refer to this file: https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/speaker_diarization.py
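The step that file implements after embedding extraction is grouping the per-segment embeddings into speakers. The real pipeline uses agglomerative clustering; the greedy cosine-similarity version below is a simplified sketch of the same idea, and the `0.7` threshold is an assumption:

```python
# Simplified sketch of the final diarization step: group segment embeddings
# into speakers. The real pipeline (speaker_diarization.py) uses agglomerative
# clustering; this greedy cosine-similarity version is illustrative only,
# and the 0.7 threshold is an assumption.
import numpy as np

def cluster_embeddings(embeddings, threshold=0.7):
    """embeddings: (num_segments, dim) array -> list of speaker ids."""
    centroids, labels = [], []
    for emb in embeddings:
        emb = emb / np.linalg.norm(emb)  # unit-normalize for cosine similarity
        sims = [float(emb @ c) for c in centroids]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))  # reuse the closest speaker
        else:
            centroids.append(emb)                # start a new speaker
            labels.append(len(centroids) - 1)
    return labels

rng = np.random.default_rng(0)
spk_a, spk_b = rng.normal(size=192), rng.normal(size=192)
segs = np.stack([spk_a, spk_b, spk_a + 0.1 * rng.normal(size=192)])
print(cluster_embeddings(segs))  # → [0, 1, 0]
```

In the full pipeline, each ONNX session (segmentation, then embedding) would be run from C++ with ONNX Runtime, and a clustering step like this glues their outputs together.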

mark95 commented 1 year ago

@kfsky Did you manage to export the whole diarization pipeline to ONNX?

stale[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

thewh1teagle commented 3 months ago

I'm also looking to convert the pyannote model to ONNX format and then use it from Rust with ort. Did anyone manage to use it in C++?