I'm implementing part of the computation on my own with PyCUDA. It works fine with torch.nn.DataParallel while I was testing PyCUDA on a single GPU. However, when I use two GPUs, I get two error messages:

pycuda._driver.LogicError: Caught LogicError in replica 0 on device 0.

and

pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?
I've read the FAQ in the documentation. Since I'm using the PyTorch framework for training, using multiprocessing or threading myself looks infeasible.
With torch.nn.DataParallel, I hope my kernel code can run on two or more GPUs without explicitly assigning a device or distributing the data to each GPU myself. I wonder whether PyCUDA can meet this need.
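For reference, here is a minimal sketch of the kind of setup I mean (the kernel, the Holder helper, and the ScaleLayer module are just placeholders, not my real code): the forward pass launches a PyCUDA kernel directly on the tensor's device memory, and the module is then wrapped in torch.nn.DataParallel. With one visible GPU this runs; with two GPUs the replicas run in worker threads that, as far as I can tell, have no active PyCUDA context (the context created by pycuda.autoinit is bound to the main thread), which is where the LogicError above comes from.

```python
import numpy as np
import torch
import torch.nn as nn
import pycuda.autoinit          # creates a CUDA context on GPU 0, bound to the main thread
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# Placeholder kernel -- my real kernel is more involved; this only shows
# where PyCUDA enters the forward pass.
_mod = SourceModule("""
__global__ void scale(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}
""")
_scale = _mod.get_function("scale")


class Holder(cuda.PointerHolderBase):
    """Expose a torch tensor's device memory to PyCUDA without copying."""
    def __init__(self, tensor):
        super().__init__()
        self.tensor = tensor
        self.gpudata = tensor.data_ptr()

    def get_pointer(self):
        return self.tensor.data_ptr()


class ScaleLayer(nn.Module):
    def forward(self, x):
        x = x.contiguous()
        n = x.numel()
        torch.cuda.synchronize()          # make sure pending torch work is finished
        _scale(Holder(x), np.int32(n),
               block=(256, 1, 1), grid=((n + 255) // 256, 1))
        return x


# With a single visible GPU this works; with two GPUs the forward call
# raises the LogicError quoted above.
model = nn.DataParallel(ScaleLayer().cuda())
out = model(torch.ones(8, 1024, device="cuda"))
```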
Thank you!