wujiaju closed this issue 5 years ago
The `func_kernel<<<1,1>>>` launch needs to take the current CUDA stream as its fourth launch parameter (the third is the dynamic shared-memory size). You also need to switch the current device to device 3 in your CUDA code.
Otherwise the kernel can launch before the copy of the tensor to GPU 3 has finished, and it will launch on the wrong GPU.
Look at `at::DeviceGuard` and `at::cuda::getCurrentCUDAStream()`.
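A minimal sketch of what that could look like, since the original cuda_kernel.cu is not shown here. The kernel body, its `float*` parameter, and the wrapper name `func_cuda` are assumptions for illustration only. Only `func_kernel`, `at::DeviceGuard`, and `at::cuda::getCurrentCUDAStream()` come from the thread.

```cuda
// cuda_kernel.cu (sketch; kernel signature and wrapper name are hypothetical)
#include <ATen/ATen.h>
#include <ATen/DeviceGuard.h>        // at::DeviceGuard
#include <ATen/cuda/CUDAContext.h>   // at::cuda::getCurrentCUDAStream

// Hypothetical kernel: increments the first element of the buffer.
__global__ void func_kernel(float* data) {
  data[0] += 1.0f;
}

// Hypothetical wrapper called from cuda.cpp; assumes `a` is a float CUDA tensor.
void func_cuda(at::Tensor a) {
  // Make the tensor's device (e.g. GPU 3) the current device for this scope.
  // The previous device is restored when the guard is destroyed.
  const at::DeviceGuard device_guard(a.device());

  // Query the stream only after the guard is in place, so that it is the
  // current stream of the tensor's device rather than of device 0.
  cudaStream_t stream = at::cuda::getCurrentCUDAStream().stream();

  // Launch parameters: grid, block, dynamic shared memory (0 bytes), stream.
  func_kernel<<<1, 1, 0, stream>>>(static_cast<float*>(a.data_ptr()));
}
```

Launching on PyTorch's current stream orders the kernel after work PyTorch has already queued on that stream, which should include the copy made by `a.cuda(3)`. The device guard ensures the launch goes to the GPU that actually holds the data.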
Hello.
I wrote my test code in three files: test.py, cuda.cpp, and cuda_kernel.cu.
When I used `a = a.cuda(0)` in test.py, I got the expected result. But when I used `a = a.cuda(3)` (I have multiple GPUs), the result tensor was tensor([0]). Why?
Thanks a lot.
OS: Ubuntu 14
PyTorch version: torch-nightly 1.0.0.dev20190219
How you installed PyTorch (conda, pip, source): I compiled and ran the code in a Docker container. The Docker image was ufoym/deepo:pytorch-py36-cu90
GPU models and configuration: 4 GeForce GTX TITANs