Hi all,

I have been using some code on a machine with four Titan X's without any issues. I recently got another machine with 1080 Ti cards and I just can't figure out what the problem is. System details:
CUDA 8.0
Latest driver 384.90
CentOS 7 (kernel 3.10.0-693)
Take for example the following piece of code:
nn = require 'nn'
cunn = require 'cunn'
n = require('stackedhourglass')

-- split along the batch dimension; flattened parameters and NCCL both disabled
pt = nn.DataParallelTable(1, false, false)
pt:add(n(), {1, 2})   -- run the network on GPUs 1 and 2
pt:cuda()

a = torch.FloatTensor(4, 3, 192, 192):cuda()
o = pt:forward(a)
o = pt:forward(a)
o = pt:forward(a)
print(o)
Most of the time it works, but occasionally, without any change to the code, it does not. This is not a GPU memory issue. Occasionally it hangs while running updateOutput on a SpatialConvolution module without printing any error.
If I double the batch size or double the spatial resolution, it will almost certainly fail.
I don't seem to have this issue if I do not use DataParallelTable. It seems like there might be some kind of race condition, or something else I don't understand. As I say, I didn't have this issue with the Titan X cards, so I am beginning to wonder if there is an issue with the driver.
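For reference, one thing I could try in order to narrow this down (just a sketch, I haven't verified it changes anything) is to force a full device synchronization after every forward pass, so that any asynchronous failure surfaces at a known point rather than as a silent hang. This reuses pt and a from the snippet above and assumes cutorch's synchronizeAll:

require 'cutorch'

for i = 1, 100 do
   local o = pt:forward(a)
   cutorch.synchronizeAll()   -- block until every GPU has finished its queued kernels
   print(('forward %d completed'):format(i))
end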
If anyone can offer me any advice it would be very welcome.
Thanks, Aaron.