I'm implementing part of the computation on my own with PyCUDA. It works fine with torch.nn.DataParallel while I was testing PyCUDA on a single GPU. However, when I use two GPUs, I get two error messages:

pycuda._driver.LogicError: Caught LogicError in replica 0 on device 0.

and

pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?
I've read the FAQ in the documentation. Since I'm using the PyTorch framework for training, using multiprocessing or threading myself looks infeasible.
With torch.nn.DataParallel, I hope my kernel code can run on two or more GPUs without explicitly assigning a device or distributing the data to each GPU myself. I wonder whether PyCUDA can meet this need.
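For reference, here is a minimal sketch of the kind of setup I mean (the kernel, the Holder helper, and the ScaleLayer module are just placeholders, not my real code): the forward pass launches a PyCUDA kernel directly on the tensor's device memory, and the module is then wrapped in torch.nn.DataParallel. With one visible GPU this runs; with two GPUs the replicas run in worker threads that, as far as I can tell, have no active PyCUDA context (the context created by pycuda.autoinit is bound to the main thread), which is where the LogicError above comes from.

```python
import numpy as np
import torch
import torch.nn as nn
import pycuda.autoinit          # creates a CUDA context on GPU 0, bound to the main thread
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# Placeholder kernel -- my real kernel is more involved; this only shows
# where PyCUDA enters the forward pass.
_mod = SourceModule("""
__global__ void scale(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}
""")
_scale = _mod.get_function("scale")


class Holder(cuda.PointerHolderBase):
    """Expose a torch tensor's device memory to PyCUDA without copying."""
    def __init__(self, tensor):
        super().__init__()
        self.tensor = tensor
        self.gpudata = tensor.data_ptr()

    def get_pointer(self):
        return self.tensor.data_ptr()


class ScaleLayer(nn.Module):
    def forward(self, x):
        x = x.contiguous()
        n = x.numel()
        torch.cuda.synchronize()          # make sure pending torch work is finished
        _scale(Holder(x), np.int32(n),
               block=(256, 1, 1), grid=((n + 255) // 256, 1))
        return x


# With a single visible GPU this works; with two GPUs the forward call
# raises the LogicError quoted above.
model = nn.DataParallel(ScaleLayer().cuda())
out = model(torch.ones(8, 1024, device="cuda"))
```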
Thank you!