gpuopenanalytics / pynvml

Provide Python access to the NVML library for GPU diagnostics
BSD 3-Clause "New" or "Revised" License

MIG Support #30

Closed: PidgeyBE closed this issue 2 years ago

PidgeyBE commented 3 years ago

Hello,

pynvml is no longer in sync with NVML. More specifically, it is not possible to access the MIG APIs: https://docs.nvidia.com/deploy/nvml-api/group__nvmlMultiInstanceGPU.html#group__nvmlMultiInstanceGPU_1g15e07cc6230a2d90c5bc85de85261ef7

Would it be possible to add these?

BR, Pieterjan

quasiben commented 3 years ago

It's something we should add, but I don't know how soon it can get done. @kenhester, do you have any time to work on this? If not, no worries.

kenhester commented 3 years ago

I am working on syncing pynvml with NVML. I have synced the headers up to CUDA 11, and I will work on implementing the API changes, enhancements, and updates in a timely manner.

kenhester commented 3 years ago

To answer directly, MIG support is a target feature to implement.

kenhester commented 3 years ago

MIG functionality is being added to my branch. There are more functions to be implemented, but comments and issues are welcome.

zronaghi commented 3 years ago

cc @drobison00

rjzamora commented 3 years ago

Note that #32 includes some MIG support. However, I do not personally have the proper resources for testing.

zronaghi commented 3 years ago

Thanks @rjzamora, we can test once it's ready.

kenhester commented 3 years ago

The bindings are updated. Please confirm it addresses your issue.

rjzamora commented 3 years ago

The LocalCUDACluster bug will probably require a Dask-CUDA fix, since I believe MIG requires a slightly different NVML API.

pentschev commented 3 years ago

MIG support is in, so I think this can be closed. The LocalCUDACluster issue is being discussed in https://github.com/rapidsai/dask-cuda/issues/583, and it seems that all NVML features required for that are available in the latest PyNVML releases.
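For anyone landing here later, below is a rough sketch of how the MIG bindings might be used to list the MIG devices on one GPU. It assumes a pynvml release that exposes `nvmlDeviceGetMigMode`, `nvmlDeviceGetMaxMigDeviceCount`, and `nvmlDeviceGetMigDeviceHandleByIndex`. The helper name `list_mig_devices` and its `nvml` parameter are made up for illustration and are not part of pynvml. The module is passed in as an argument so the logic can be exercised without a GPU.

```python
def list_mig_devices(nvml, gpu_index=0):
    """Return the UUIDs of the MIG devices on one physical GPU.

    `nvml` is expected to be the imported pynvml module (hypothetical
    parameter, used here so the helper can be tested with a stand-in).
    Returns an empty list if the GPU has MIG disabled or unsupported.
    """
    handle = nvml.nvmlDeviceGetHandleByIndex(gpu_index)

    try:
        # pynvml returns the current and pending MIG modes as a pair.
        current_mode, _pending_mode = nvml.nvmlDeviceGetMigMode(handle)
    except nvml.NVMLError:
        # Assumption: an NVML error here means MIG is not supported.
        return []

    if current_mode != nvml.NVML_DEVICE_MIG_ENABLE:
        return []

    uuids = []
    for index in range(nvml.nvmlDeviceGetMaxMigDeviceCount(handle)):
        try:
            mig_handle = nvml.nvmlDeviceGetMigDeviceHandleByIndex(handle, index)
        except nvml.NVMLError:
            # Assumption: unpopulated MIG slots raise; skip them.
            continue
        uuids.append(nvml.nvmlDeviceGetUUID(mig_handle))
    return uuids


# Intended usage on a machine with pynvml installed and an NVIDIA driver:
#   import pynvml
#   pynvml.nvmlInit()
#   try:
#       print(list_mig_devices(pynvml, gpu_index=0))
#   finally:
#       pynvml.nvmlShutdown()
```

Skipping slots that raise is a guess at reasonable behavior rather than something confirmed in this thread, so callers may prefer to inspect the specific error instead.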