grgalex / nvshare

Practical GPU Sharing Without Memory Size Constraints
Apache License 2.0

Question: How do we monitor pod/process GPU usage? #20

Open Bpmm9012 opened 5 months ago

Bpmm9012 commented 5 months ago

Thank you for your dedication to developing a GPU memory oversubscription solution, which has been immensely beneficial to our work.

I've run local tests with various processes; however, the GPU utilization data reported by nvidia-smi is rather coarse-grained. After reviewing the README, I couldn't find a finer-grained monitoring approach, such as Prometheus metrics.

Could you offer some suggestions for monitoring GPU usage by individual pods and processes?