Closed leohuang2013 closed 6 months ago
@pengzhendong
I am looking to implement the speaker-diarization of pyannote with ONNX. I've been referring to this link: https://github.com/pengzhendong/pyannote-onnx. However, the repository linked doesn't seem to have the speaker-diarization output implemented.
I want to make the necessary adjustments myself, but pyannote's speaker-diarization operates by loading multiple models. Considering this, I'm unsure how to proceed with the modifications. I would appreciate it if you could provide me with advice or instructions on the specific steps or methods to follow.
@kfsky
Could you provide a link for the statement "pyannote's speaker-diarization operates by loading multiple models"?
@pengzhendong I have been referring to this notebook: https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb. When executing the following section of the notebook, multiple models get downloaded:
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@develop", use_auth_token=True)
Therefore, I believe these multiple models are necessary for the conversion to ONNX. Is my understanding incorrect?
@kfsky There are two models:
The first one is used to segment the audio (pyannote-onnx does the same thing).
The second one is used to get the embeddings of the segments.
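To make the segmentation step more concrete, here is a minimal sketch of how per-frame scores from a segmentation model could be turned into time segments. The function name `frames_to_segments`, the `frame_step` value, and the fixed `threshold` are assumptions for illustration only. The real model emits scores per speaker, and pyannote's own post-processing is more involved than a single threshold.

```python
from typing import List, Tuple


def frames_to_segments(
    scores: List[float],
    frame_step: float,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Turn per-frame activity scores into (start, end) segments in seconds.

    Hypothetical helper: `frame_step` (seconds per frame) and `threshold`
    are illustrative assumptions, not values taken from pyannote.
    """
    segments: List[Tuple[float, float]] = []
    start = None  # start time of the segment currently being built
    for i, score in enumerate(scores):
        active = score >= threshold
        if active and start is None:
            # Activity begins at this frame.
            start = i * frame_step
        elif not active and start is not None:
            # Activity ended just before this frame.
            segments.append((start, i * frame_step))
            start = None
    if start is not None:
        # Close a segment that runs to the end of the input.
        segments.append((start, len(scores) * frame_step))
    return segments


print(frames_to_segments([0.1, 0.9, 0.8, 0.2, 0.7], frame_step=0.5))
# → [(0.5, 1.5), (2.0, 2.5)]
```

Each resulting segment is what the second model would then receive to compute an embedding.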
@pengzhendong
> The second one is used to get the embeddings of the segments.
Could you possibly share some ideas on the steps to follow when incorporating the second model into pyannote-onnx?
@kfsky Please refer to this file: https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/speaker_diarization.py
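As a rough illustration of how the second model fits in, the sketch below chains the pieces: segments come in, each segment gets an embedding, and similar embeddings are grouped under one speaker label. Everything here is hypothetical. `embed` stands in for whatever runs the embedding model (for example an ONNX session), and the greedy cosine-similarity grouping with a `threshold` of 0.7 is a simplified substitute for the clustering step in the actual pipeline, not a copy of it.

```python
import math
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[float, float]   # (start, end) in seconds
Embedding = Sequence[float]     # one vector per segment


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings: List[Embedding], threshold: float = 0.7) -> List[int]:
    """Greedy grouping (an assumption, not pyannote's clustering).

    Each embedding joins the most similar existing speaker if the similarity
    reaches `threshold`; otherwise it starts a new speaker.
    """
    representatives: List[Embedding] = []  # first embedding seen per speaker
    labels: List[int] = []
    for emb in embeddings:
        best_label, best_sim = -1, threshold
        for label, rep in enumerate(representatives):
            sim = cosine_similarity(emb, rep)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label == -1:
            representatives.append(emb)
            best_label = len(representatives) - 1
        labels.append(best_label)
    return labels


def diarize(
    segments: List[Segment],
    embed: Callable[[Segment], Embedding],
    threshold: float = 0.7,
) -> List[Tuple[float, float, int]]:
    """Attach a speaker label to each segment using the hypothetical `embed`."""
    embeddings = [embed(segment) for segment in segments]
    labels = assign_speakers(embeddings, threshold)
    return [(start, end, label) for (start, end), label in zip(segments, labels)]
```

With a real setup, `embed` would crop the audio for the given segment and run the embedding model on it. Only the two neural models would need to be exported to ONNX; the grouping step is plain code that can be reimplemented in any language.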
@kfsky Did you manage to export the whole diarization pipeline to ONNX?
I'm also looking to convert the pyannote model to ONNX format and then use it from Rust with ort. Did anyone manage to use it in C++?
Is it possible to convert the model to ONNX and then use it in C++ for speaker diarization? Thanks.