Thank you for your dedication to developing a GPU memory oversubscription solution, which has been immensely beneficial to our work.
I've run some local tests involving multiple processes; however, the GPU utilization data obtained via nvidia-smi appears rather coarse-grained. Reviewing the README, I didn't find a finer-grained monitoring approach, such as Prometheus metrics.
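For context, this is roughly how I've been pulling the numbers locally. A minimal sketch, assuming the pynvml bindings are installed and the driver exposes per-process accounting; it only reports device-level utilization plus per-process memory, with no per-pod attribution:

```python
# Minimal sketch: device utilization and per-process GPU memory via NVML.
# Assumes the pynvml package is installed (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        print(f"GPU {i}: util={util.gpu}% mem_util={util.memory}%")
        # Per-process memory; usedGpuMemory may be None when accounting
        # is unavailable (e.g. inside some container/virtualization setups).
        for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
            used_mib = (proc.usedGpuMemory or 0) // (1024 * 1024)
            print(f"  pid={proc.pid} used_memory={used_mib} MiB")
finally:
    pynvml.nvmlShutdown()
```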
Could you offer some suggestions for monitoring GPU usage per pod and per process?