Closed: zanjabil2502 closed this issue 1 year ago
Thank you for your issue. You might want to check the FAQ if you haven't done so already.
Feel free to close this issue if you found an answer in the FAQ.
If your issue is a feature request, please read this first and update your request accordingly, if needed.
If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:
Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).
Companies relying on pyannote.audio in production may contact me via email regarding:
This is an automated reply, generated by FAQtory
I am afraid I won't be able to help you, as I have very little experience with actual deployment and even less with concurrent processing.
📣 To people who have successfully deployed pyannote pipelines in production (and version 3.0 in particular): now would be the right time to chime in and help @zanjabil2502
Thanks for your response. Maybe I have some tips: to reduce GPU usage, you can change the batch size of self._segmentation to 16 or 8. This brings GPU usage down to under 1 GB (around 700–800 MB) while the real-time factor stays at 0.025.
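The batch-size tweak described above can be sketched as a small helper. Note that `_segmentation` and its `batch_size` attribute are internal pyannote details taken from this thread, so this is an assumption-laden sketch, not a supported API; stub classes stand in for the real pipeline here.

```python
class _SegmentationStub:
    """Stand-in for the pipeline's internal segmentation inference object."""
    batch_size = 32


class _PipelineStub:
    """Stand-in for a pyannote speaker-diarization pipeline."""
    _segmentation = _SegmentationStub()


def shrink_segmentation_batch(pipeline, batch_size: int = 8):
    """Lower the segmentation batch size to trade throughput for GPU memory.

    Hedged: relies on the internal `_segmentation.batch_size` attribute
    mentioned in this thread, which may change between pyannote versions.
    """
    pipeline._segmentation.batch_size = batch_size
    return pipeline


pipeline = shrink_segmentation_batch(_PipelineStub(), batch_size=8)
print(pipeline._segmentation.batch_size)  # 8
```

With the real pipeline, you would pass the object returned by `Pipeline.from_pretrained(...)` instead of the stub.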
I added this code in speaker_verification.py (in WeSpeakerPretrainedSpeakerEmbedding):

```python
import onnxruntime as ort

# Disable memory arenas and pattern/reuse optimizations to limit memory growth
sess_options = ort.SessionOptions()
sess_options.enable_mem_pattern = False
sess_options.enable_cpu_mem_arena = False
sess_options.enable_mem_reuse = False

self.session_ = ort.InferenceSession(
    self.embedding, sess_options=sess_options, providers=providers
)
self.session_.disable_fallback()
```
According to some forum posts, these options mitigate a CPU memory leak when using ONNX Runtime.
Also, in my experience, GPU usage with onnxruntime-gpu <= 1.12.1 is smaller than with onnxruntime-gpu==1.16.1.
Closing, as the latest version no longer relies on ONNX Runtime. Please update to pyannote.audio 3.1 and pyannote/speaker-diarization-3.1 (and open new issues if needed).
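For reference, upgrading means loading the 3.1 pipeline instead of 3.0. A minimal sketch, assuming pyannote.audio >= 3.1 is installed and you have a Hugging Face access token (the imports live inside the function so the snippet stays self-contained even without pyannote installed):

```python
def run_diarization(audio_path: str, hf_token: str):
    """Diarize a file with the ONNX-free pyannote/speaker-diarization-3.1 pipeline."""
    import torch
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=hf_token,
    )
    # Move the whole pipeline to the GPU; 3.1 runs in pure PyTorch,
    # so the ONNX Runtime memory behavior discussed above no longer applies.
    pipeline.to(torch.device("cuda"))
    return pipeline(audio_path)
```

Usage would look like `diarization = run_diarization("audio.wav", hf_token="hf_...")`.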
Apart from capping usage, were you ever able to actually resolve the underlying GPU memory leak? With pyannote.audio 3.3.2 and speaker-diarization-3.1, I'm still experiencing this issue on GPU with CUDA support.
Firstly, thank you for updating pyannote.audio, especially the v3.0 models. But I want to report something I found. I understand that the speaker embedding model is new and that you chose to run it with ONNX Runtime to make it lighter than the torch model. But there is a bug in ONNX Runtime, specifically when using the GPU: when I run concurrent processes, GPU memory does not decrease while the model is idle until I stop my program.
I have read some threads on the onnx-runtime repo, and many users have the same problem with ONNX Runtime. Do you have a solution for this?
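One workaround occasionally suggested in ONNX Runtime issue threads is to drop the session reference and force a garbage-collection pass when the model goes idle. This is a hedged sketch, not a confirmed fix: whether ORT's CUDA arena actually returns memory to the driver varies by onnxruntime version, and often the memory is only released at process exit. The `model.session_` attribute name follows the speaker_verification.py code quoted earlier in this thread.

```python
import gc


def release_onnx_session(model):
    """Drop the ORT InferenceSession reference and trigger garbage collection.

    Assumption-laden workaround: this only helps if no other references to the
    session remain; ORT may still hold its CUDA arena until the process exits.
    The session must be recreated before the next inference call.
    """
    model.session_ = None
    gc.collect()
    return model
```

After calling this, the session has to be rebuilt (e.g. via `ort.InferenceSession(...)`) before the model can serve requests again, so it only suits workloads with long idle gaps.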