grgalex / nvshare

Practical GPU Sharing Without Memory Size Constraints
Apache License 2.0

Question: How do we monitor per-pod/per-process GPU usage? #20

Open Bpmm9012 opened 3 weeks ago

Bpmm9012 commented 3 weeks ago

Thank you for your dedication to developing a GPU memory oversubscription solution; it has been immensely beneficial to our work.

I've run local tests with several processes, but the GPU utilization data reported by nvidia-smi is rather coarse. The README doesn't describe a finer-grained monitoring approach, such as Prometheus metrics.

Could you offer some suggestions for monitoring GPU usage by individual pods and processes?

grgalex commented 2 weeks ago

What you are asking for is out of the scope of nvshare.

However, we could still work something out for your specific (commercial) case.

Reach out to info@nvshare.com, and we can discuss in private.