gpuopenanalytics / pynvml

Provide Python access to the NVML library for GPU diagnostics
BSD 3-Clause "New" or "Revised" License

Higher-level application #45

Open pentschev opened 1 year ago

pentschev commented 1 year ago

The PyNVML bindings are great for doing all GPU information management from Python, but they are almost entirely an identical copy of the C API. This can be a barrier for Python users, who must work out from the NVML API documentation what the API provides, which types need to be passed, and so on. We currently use PyNVML in both Distributed and Dask-CUDA, but the overlap between the two leads to code duplication.

I feel one way to reduce code duplication and lower the barrier for new users is to provide a "High-level PyNVML library" that takes care of users' basic needs. For example, I would imagine something like the following (but not limited to it) being available (implementation omitted for simplicity):

from typing import Optional, Union


class Handle:
    """A handle to a GPU device.

    Parameters
    ----------
    index: int, optional
        Integer representing the CUDA device index to get a handle to.
    uuid: bytes or str, optional
        UUID of a CUDA device to get a handle to.

    Raises
    ------
    ValueError
        If neither `index` nor `uuid` are specified or if both are specified.
    """
    def __init__(
        self, index: Optional[int] = None, uuid: Optional[Union[bytes, str]] = None
    ):
        ...  # implementation omitted

    @property
    def free_memory(self) -> int:
        """
        Free memory of the CUDA device.
        """

    @property
    def total_memory(self) -> int:
        """
        Total memory of the CUDA device.
        """

    @property
    def used_memory(self) -> int:
        """
        Used memory of the CUDA device.
        """

There would be more to cover than the above, such as getting the number of available GPUs in the system, whether a GPU currently has a context created, whether a handle refers to a MIG instance or a physical GPU, etc. Additionally, we would have simple tools that are generally useful, for example a small tool I wrote long ago to measure NVLink bandwidth and peak memory, and whatever else fits in the scope of a "High-level PyNVML library" that can make our users' lives easier.
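
As a rough illustration only, the `Handle` interface above could be backed by existing NVML bindings along the following lines. This is a minimal sketch, not a proposed implementation: the lazy `import pynvml`, the `_memory` helper, and the decision to treat `index` as the NVML device index (ignoring any `CUDA_VISIBLE_DEVICES` remapping) are all assumptions made for the example.

```python
from typing import Optional, Union


class Handle:
    """Hypothetical high-level wrapper around an NVML device handle."""

    def __init__(
        self, index: Optional[int] = None, uuid: Optional[Union[bytes, str]] = None
    ):
        # Validate arguments first so misuse is reported even without a GPU.
        if (index is None) == (uuid is None):
            raise ValueError("Specify exactly one of `index` or `uuid`.")

        # Assumption: import lazily so this module can be imported on
        # systems where the NVML bindings are not installed.
        import pynvml

        pynvml.nvmlInit()
        self._pynvml = pynvml
        if index is not None:
            # Assumption: `index` is used as the NVML index as-is.
            self._handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        else:
            if isinstance(uuid, str):
                uuid = uuid.encode()
            self._handle = pynvml.nvmlDeviceGetHandleByUUID(uuid)

    def _memory(self):
        # Hypothetical helper: NVML reports total, free and used bytes together.
        return self._pynvml.nvmlDeviceGetMemoryInfo(self._handle)

    @property
    def free_memory(self) -> int:
        """Free memory of the device, in bytes."""
        return int(self._memory().free)

    @property
    def total_memory(self) -> int:
        """Total memory of the device, in bytes."""
        return int(self._memory().total)

    @property
    def used_memory(self) -> int:
        """Used memory of the device, in bytes."""
        return int(self._memory().used)
```

On a machine with a working NVIDIA driver, `Handle(index=0).free_memory` would query the first device; calling `Handle()` with no arguments raises `ValueError` before NVML is touched.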

So to begin this discussion I would like to know how people like @rjzamora and @kenhester feel about this idea. Would this be something that would fit in the scope of this project? Are there any impediments to adding such a library within the scope of this project/repository?

Also cc @quasiben for vis.

rjzamora commented 1 year ago

I strongly agree that it would be valuable to have a "higher-level" (pythonic) API for users to interact with, ideally one that users can install on a system without a GPU or CUDA.
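
One way such an API might behave on a machine without a GPU is to degrade gracefully instead of failing at import time. The sketch below assumes a hypothetical `device_count` helper that reports zero devices when either the bindings or the NVML library itself are missing; the function name and the fall-back-to-zero behaviour are assumptions for illustration, not part of any existing package.

```python
def device_count() -> int:
    """Return the number of NVML-visible GPUs, or 0 if NVML is unavailable."""
    try:
        import pynvml  # the bindings may not be installed at all
    except ImportError:
        return 0

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # Raised, for example, when the driver library cannot be found.
        return 0

    try:
        return int(pynvml.nvmlDeviceGetCount())
    finally:
        # Release NVML resources acquired by nvmlInit().
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(f"GPUs visible through NVML: {device_count()}")
```

Returning 0 keeps calling code simple, at the cost of hiding why no devices were found; a real library might prefer to expose the underlying error.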

A few years ago, we resurrected pynvml to make it easier for RAPIDS/python users to query basic system information. This project is no longer actively maintained, because the underlying NVML bindings are now directly copied from nvidia-ml-py (which was stale back in 2019, but is now regularly updated by the official NVML team). At this point, the only difference between pynvml and nvidia-ml-py is that pynvml still includes @KenHester’s smi module.


The fate of pynvml has been in limbo for a while now, and so it probably makes sense to figure out if the long-term plan is to officially archive the project in favor of nvidia-ml-py. If the plan is to archive this project, it probably makes more sense to attack the high-level API in a new project (perhaps one that can include an smi module and nvdashboard).

walternat1ve commented 1 year ago

Is this project on ice? Given that it's not on par with nvidia-ml-py, why isn't everything that I get from nvidia-smi available here in the smi module?

pentschev commented 1 year ago

nvidia-ml-py now provides the more up-to-date bindings and is maintained by the same team that maintains the NVML library, so it's the preferred way to access NVML from Python. The PyNVML project was created to fill the gap in Python support for NVML before nvidia-ml-py existed, and PyNVML is still here to provide legacy compatibility.

Note that both PyNVML and nvidia-ml-py are wrappers for the NVML library and not nvidia-smi, and although I think they provide all the NVIDIA-specific tooling that is used by nvidia-smi, there are no guarantees.

kenhester commented 1 year ago

If it would be helpful, I can update pynvml. I was waiting for the CUDA 12 drop of nvidia-ml-py to update pynvml.

jakirkham commented 1 year ago

Might also be worth updating the nvidia-ml-py PyPI project description. I'm seeing mentions of Python 2.5, which I don't think are relevant any more. This is Python 3+ now, right?