Closed: zanjabil2502 closed this issue 1 year ago
Thank you for your issue. You might want to check the FAQ if you haven't done so already.
Feel free to close this issue if you found an answer in the FAQ.
If your issue is a feature request, please read this first and update your request accordingly, if needed.
If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:
Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).
Companies relying on pyannote.audio in production may contact me via email regarding:
This is an automated reply, generated by FAQtory
I am afraid I won't be able to help you, as I have very little experience with actual deployment and even less with concurrent processing.
📣 To people who have successfully deployed pyannote pipelines in production (and version 3.0 in particular): now would be the right time to chime in and help @zanjabil2502
Thanks for your response. Maybe I have some tips: to reduce GPU usage, you can change the batch size of self._segmentation to 16 or 8. This brings GPU usage down to under 1 GB (around 700–800 MB) while the real-time factor stays at 0.025.
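The batch-size tweak described above can be sketched as a small helper. Note that `_segmentation` and its `batch_size` attribute are internal pyannote details taken from this thread, so this is an assumption-laden sketch, not a supported API; stub classes stand in for the real pipeline here.

```python
class _SegmentationStub:
    """Stand-in for the pipeline's internal segmentation inference object."""
    batch_size = 32


class _PipelineStub:
    """Stand-in for a pyannote speaker-diarization pipeline."""
    _segmentation = _SegmentationStub()


def shrink_segmentation_batch(pipeline, batch_size: int = 8):
    """Lower the segmentation batch size to trade throughput for GPU memory.

    Hedged: relies on the internal `_segmentation.batch_size` attribute
    mentioned in this thread, which may change between pyannote versions.
    """
    pipeline._segmentation.batch_size = batch_size
    return pipeline


pipeline = shrink_segmentation_batch(_PipelineStub(), batch_size=8)
print(pipeline._segmentation.batch_size)  # 8
```

With the real pipeline, you would pass the object returned by `Pipeline.from_pretrained(...)` instead of the stub.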
I added this code in speaker_verification.py (in WeSpeakerPretrainedSpeakerEmbedding):

```python
import onnxruntime as ort

# Disable memory arenas and pattern/reuse optimizations to limit memory growth
sess_options = ort.SessionOptions()
sess_options.enable_mem_pattern = False
sess_options.enable_cpu_mem_arena = False
sess_options.enable_mem_reuse = False

self.session_ = ort.InferenceSession(
    self.embedding, sess_options=sess_options, providers=providers
)
self.session_.disable_fallback()
```
According to some forum posts, these options mitigate a CPU memory leak when using ONNX Runtime.
Also, in my experience, GPU usage with onnxruntime-gpu <= 1.12.1 is smaller than with onnxruntime-gpu==1.16.1.
Closing, as the latest version no longer relies on ONNX Runtime. Please update to pyannote.audio 3.1 and pyannote/speaker-diarization-3.1 (and open new issues if needed).
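For reference, upgrading means loading the 3.1 pipeline instead of 3.0. A minimal sketch, assuming pyannote.audio >= 3.1 is installed and you have a Hugging Face access token (the imports live inside the function so the snippet stays self-contained even without pyannote installed):

```python
def run_diarization(audio_path: str, hf_token: str):
    """Diarize a file with the ONNX-free pyannote/speaker-diarization-3.1 pipeline."""
    import torch
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=hf_token,
    )
    # Move the whole pipeline to the GPU; 3.1 runs in pure PyTorch,
    # so the ONNX Runtime memory behavior discussed above no longer applies.
    pipeline.to(torch.device("cuda"))
    return pipeline(audio_path)
```

Usage would look like `diarization = run_diarization("audio.wav", hf_token="hf_...")`.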
Apart from capping usage, were you ever able to actually resolve the underlying GPU memory leak? With pyannote.audio 3.3.2 and speaker-diarization-3.1, I'm still experiencing this issue on GPU with CUDA support.
Firstly, thank you for updating pyannote.audio, especially the v3.0 models. But I want to report something I found. I understand that the speaker embedding model is new and that you chose to run it with ONNX Runtime to make it lighter than the torch model. But there is a bug in ONNX Runtime, specifically when using the GPU: when I run concurrent processes, GPU memory does not decrease while the model is idle until I stop my program.
I have read some threads on the onnx-runtime repo, and many users have the same problem with ONNX Runtime. Do you have a solution for this?
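One workaround occasionally suggested in ONNX Runtime issue threads is to drop the session reference and force a garbage-collection pass when the model goes idle. This is a hedged sketch, not a confirmed fix: whether ORT's CUDA arena actually returns memory to the driver varies by onnxruntime version, and often the memory is only released at process exit. The `model.session_` attribute name follows the speaker_verification.py code quoted earlier in this thread.

```python
import gc


def release_onnx_session(model):
    """Drop the ORT InferenceSession reference and trigger garbage collection.

    Assumption-laden workaround: this only helps if no other references to the
    session remain; ORT may still hold its CUDA arena until the process exits.
    The session must be recreated before the next inference call.
    """
    model.session_ = None
    gc.collect()
    return model
```

After calling this, the session has to be rebuilt (e.g. via `ort.InferenceSession(...)`) before the model can serve requests again, so it only suits workloads with long idle gaps.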