rapidsai / dask-cuda

Utilities for Dask and CUDA interactions
https://docs.rapids.ai/api/dask-cuda/stable/
Apache License 2.0

pyNVML won't work on a Jetson, is there a workaround #400

Open JasonAtNvidia opened 4 years ago

JasonAtNvidia commented 4 years ago

There is no NVML library on aarch64 NVIDIA Jetson, and that will break many libraries relying on this one, such as cuxfilter. The geospatial and cuxfilter libraries are among the most requested for Jetson, and I'd love to make them work. Is there a way to use Numba functions to replace pyNVML in this library?

quasiben commented 4 years ago

So we use pynvml in two places:

1) Getting the number of GPUs in the machine -- this is easy to do.
2) Getting the CPU affinity for GPUs -- this will be challenging to replace.

Do Jetson devices have more than one GPU?
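For reference, case (1) can be sketched with a guard for platforms like Jetson where NVML is missing. This is only an illustration of the general approach, not dask-cuda's actual code, and the fallback value of 0 is an assumption:

```python
def get_gpu_count():
    """Return the number of GPUs NVML reports, or 0 if NVML is
    unavailable (e.g. on aarch64 Jetson boards, which ship no NVML)."""
    try:
        import pynvml
        pynvml.nvmlInit()
        try:
            return pynvml.nvmlDeviceGetCount()
        finally:
            pynvml.nvmlShutdown()
    except Exception:  # ImportError, or pynvml.NVMLError at init time
        return 0
```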

JasonAtNvidia commented 4 years ago

The probability of a Jetson with a discrete GPU is ultra low; outside of NVIDIA DRIVE units we can say they don't exist. We could easily wrap the affinity functionality in a check such as `if "tegra" in platform.uname().release`, which would indicate a Jetson device.

jakirkham commented 4 years ago

It might be possible to detect affinity through hwloc.

quasiben commented 4 years ago

Alternatively, if there is only one GPU on a Jetson, does device affinity do anything?

JasonAtNvidia commented 4 years ago

Theoretically there is no device affinity on a Jetson: the GPU and CPU share the same chunk of RAM and don't have to communicate over the PCI bus.

pentschev commented 4 years ago

Do any of the Jetson boards have multiple GPUs @JasonAtNvidia ? Note that dask-cuda targets a one-process-per-GPU model for parallelism, and if none of the boards have multiple GPUs you may not have much use for dask-cuda anyway.

If there are multiple-GPU Jetsons, is there a reliable way to query whether the system is running on one? We can certainly add some conditions to work around pyNVML; we do something similar for the DGXs in https://github.com/rapidsai/dask-cuda/blob/8d42f27201afa7bf2e5454a20d3e4fd52bcf4448/dask_cuda/tests/test_dgx.py#L30-L40, although those are only used for tests today.

JasonAtNvidia commented 4 years ago

There are Jetson boards with multiple-GPU capability; DRIVE units are the most common. They have a Xavier SoM and a Turing daughter board.

The Linux for Tegra distribution has a file, /etc/nv_tegra_release, that contains the version. You could also check for the existence of the /sys/class/tegra-firmware/ directory to verify you are running on a Jetson (that directory exists inside containers, whereas nv_tegra_release does not exist in the container).

pentschev commented 4 years ago

> There are Jetson boards with multiple GPU capability, DRIVE units are most common. They have a Xavier SoM and a Turing daughter board.

Sorry for the late reply here @JasonAtNvidia . When you say multiple-GPU capability, do you mean each process can address its own GPU with CUDA_VISIBLE_DEVICES=0, CUDA_VISIBLE_DEVICES=1, and so on? Or how do you choose which GPU the application should use?

> The linux-4-tegra distribution has a file in /etc/nv_tegra_release that contains the version. And you could check for the existence of /sys/class/tegra-firmware/ (a folder) to verify you are running on a Jetson (these exist in the container, whereas nv_tegra_release does not exist in the container)

As long as we can choose each GPU correctly, these checks should let us detect the platform and work around the current pyNVML usage. As soon as you confirm we can indeed use CUDA_VISIBLE_DEVICES for each Dask worker, I can submit a PR to address this.
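For context, the one-process-per-GPU model assigns each worker a rotated CUDA_VISIBLE_DEVICES list, so every worker has a different default device while still seeing all of them. A simplified sketch of that scheme (not dask-cuda's actual implementation; the helper name is illustrative):

```python
def cuda_visible_devices(worker_index, device_ids):
    """Build a CUDA_VISIBLE_DEVICES string for one worker, rotated so
    that each worker's first (default) device is different."""
    n = len(device_ids)
    rotated = [device_ids[(worker_index + j) % n] for j in range(n)]
    return ",".join(str(d) for d in rotated)


# Each Dask worker process would get its own ordering:
print(cuda_visible_devices(0, [0, 1]))  # 0,1
print(cuda_visible_devices(1, [0, 1]))  # 1,0
```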

JasonAtNvidia commented 4 years ago

@pentschev Yes, Jetson devices respond to the CUDA_VISIBLE_DEVICES environment variable.

I do not have a multi-GPU Jetson device to test with, but I am able to verify that CUDA_VISIBLE_DEVICES=0 succeeds and CUDA_VISIBLE_DEVICES=1 results in an error that no device is found. I will try to find a multi-GPU device to test with.

pentschev commented 4 years ago

@JasonAtNvidia I just pushed https://github.com/rapidsai/dask-cuda/pull/402 , which should work with Tegra. I don't have access to a Tegra device to test on, though, so it would be great if you could try it when you have a chance.

JasonAtNvidia commented 4 years ago

@pentschev I think your patch is good. It builds and loads on the Jetson device, and I believe these are the three functions you touched with the patch:

```python
>>> dask_cuda.utils.get_gpu_count()
1
>>> dask_cuda.utils._is_tegra()
True
>>> dask_cuda.utils.get_device_total_memory()
16582901760
```

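On a board without NVML, the total-memory query could fall back to the CUDA runtime via Numba, roughly as below. This is only a sketch of one possible fallback, not necessarily what the patch does, and the None fallback is an assumption:

```python
def get_device_total_memory():
    """Total memory of the current CUDA device in bytes, or None if
    neither a GPU nor Numba is available."""
    try:
        from numba import cuda
        # get_memory_info() returns (free, total) in bytes
        return cuda.current_context().get_memory_info()[1]
    except Exception:
        return None
```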
pentschev commented 4 years ago

@JasonAtNvidia those are the correct functions. It would be interesting to know whether you can go further and run some Dask computation as well, but as I mentioned before, you won't see much benefit from dask-cuda on a single GPU versus just using the library you're computing with (e.g., CuPy, cuDF, etc.) on its own.

github-actions[bot] commented 3 years ago

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.