dennisai closed this issue 7 years ago
You can create a cuda context for a specific device. For example you can get the device with the most Gflops like this:
var cudaContext = new CudaContext(CudaContext.GetMaxGflopsDeviceId());
Yes I understand that, but how does that allow me to choose which GPU to allocate a CudaDeviceVariable to? For example, in my code, I might have:
private CudaDeviceVariable<float> d = new CudaDeviceVariable<float>(1);
Nothing in that line, or in the source code, seems to let me refer to a CudaContext. Do I need to call CudaContext.SetCurrent() to specify the current device, so that all subsequent CudaDeviceVariable allocations are made on that device?
Yes. Even though I can't find it in the ManagedCuda source with a quick look, I know from the standard CUDA C++ library that all CUDA calls are issued to the current GPU. The only exception is a call like PeerCopyToDevice, where you have to specify both contexts yourself; this lets you copy from one GPU to another.
Be careful with threading in this case, async calls will also use the current GPU.
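To illustrate, a minimal sketch of a GPU-to-GPU copy with PeerCopyToDevice. The buffer sizes are arbitrary, and the exact PeerCopyToDevice overload varies between ManagedCuda versions, so treat the call shape as an assumption and check the API of your version:

```csharp
using ManagedCuda;

var gpu0 = new CudaContext(0);
var srcOnGpu0 = new CudaDeviceVariable<float>(1024);  // allocated on GPU 0

var gpu1 = new CudaContext(1);
var dstOnGpu1 = new CudaDeviceVariable<float>(1024);  // allocated on GPU 1

// Unlike ordinary copies, both contexts are named explicitly here,
// so the call works regardless of which context is currently bound.
srcOnGpu0.PeerCopyToDevice(gpu0, dstOnGpu1, gpu1);
```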
This would do it:
```csharp
var gpu0 = new CudaContext(0);
//gpu0 is now the current context bound to the calling host/CPU thread
CudaDeviceVariable<float> var1_onGpu0 = new CudaDeviceVariable<float>(123);

var gpu1 = new CudaContext(1);
//gpu1 is now current: gpu0 is now "floating" (unbound), hence you can't access var1_onGpu0 from host
CudaDeviceVariable<float> var2_onGpu1 = new CudaDeviceVariable<float>(123);

//set gpu0 current again:
gpu0.SetCurrent();
CudaDeviceVariable<float> var2_onGpu0 = new CudaDeviceVariable<float>(1234);
```
You can of course also create a dedicated CPU thread for each device, as each CudaContext is bound to one host thread. This avoids switching contexts in your code.
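A sketch of that per-thread pattern, assuming one worker thread per device (the GpuWorker method is hypothetical, not part of ManagedCuda):

```csharp
using System.Threading;
using ManagedCuda;

static void GpuWorker(int deviceId)
{
    // Constructing the context binds it to this thread, so it stays
    // current here without any SetCurrent() calls.
    var ctx = new CudaContext(deviceId);
    var data = new CudaDeviceVariable<float>(1024);
    // ... launch kernels, copy data, etc. ...
    data.Dispose();
    ctx.Dispose();
}

var t0 = new Thread(() => GpuWorker(0));
var t1 = new Thread(() => GpuWorker(1));
t0.Start(); t1.Start();
t0.Join(); t1.Join();
```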
@kunzmi It looks like the latest NuGet package no longer has GetMaxGflopsDeviceId() method. Is there an alternative way to achieve the same result?
Hi, not a direct one.
The problem is that this method was taken from the CUDA samples, and it doesn't work in a future-safe way: if you used an older ManagedCuda version with today's GPUs, you wouldn't get the right results.
The CUDA samples still have it; have a look at C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.2\common\inc\helper_cuda_drvapi.h. It basically just does some checks on the device properties, and you can easily implement a similar function using CudaDeviceProperties. But again, it will fail in a few years with new GPU generations...
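A rough sketch of such a replacement, assuming CudaDeviceProperties exposes the multiprocessor count and clock rate (property names may differ slightly in your ManagedCuda version). Note the simplification: a faithful port of helper_cuda_drvapi.h would also multiply by cores-per-SM, which is looked up per compute capability — that table is exactly the part that goes stale with new GPU generations:

```csharp
using ManagedCuda;

static int GetMaxGflopsDeviceId()
{
    int bestDevice = 0;
    long bestPerf = 0;
    for (int i = 0; i < CudaContext.GetDeviceCount(); i++)
    {
        var props = CudaContext.GetDeviceInfo(i);
        // Rough proxy for peak throughput: SM count * clock rate.
        long perf = (long)props.MultiProcessorCount * props.ClockRate;
        if (perf > bestPerf)
        {
            bestPerf = perf;
            bestDevice = i;
        }
    }
    return bestDevice;
}
```

You can then pass the result straight to the constructor, as in var ctx = new CudaContext(GetMaxGflopsDeviceId());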
If so, is there an example that I can look at? Specifically, I am wondering how one would choose a GPU to allocate a new CudaDeviceVariable to?