I am running my network on 2 GPUs using DataParallelTable. Here is the snippet:
function loadNetworkParallel(net, nGPU)
   require 'cutorch'
   require 'cunn'
   require 'cudnn'
   if nGPU > 1 then
      -- replicate the network on GPUs 1..nGPU, splitting the batch along dim 1
      local gpus = torch.range(1, nGPU):totable()
      local net_parallel = nn.DataParallelTable(1):add(net, gpus) -- gpus is already a table, no extra braces
      return net_parallel:cuda()
   elseif nGPU == 1 then
      return net:cuda()
   end
end
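For reference, this is how I call it (assuming net is an already-built nn module and input is a batch-first tensor):

local model = loadNetworkParallel(net, 2)
local output = model:forward(input:cuda()) -- DataParallelTable scatters the batch (dim 1) across the GPUs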
GTX 980 - RAM: 4035MB
GTX 1080 - RAM: 8114MB
I can run a batch size of 15, with roughly 4000MB used on the GTX 980 and 4000MB on the GTX 1080. That leaves about 4000MB free on the GTX 1080, but as far as I can tell DataParallelTable splits the input equally across GPUs, so I cannot go above a batch size of 15. Any idea how I can make this work, or do I need to allocate the inputs to each GPU manually based on how much memory each one has left?
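To show what I mean by manual allocation, here is an untested sketch (proportionalSplit is a name I made up): query the free memory per device with cutorch.getMemoryUsage and narrow the batch in proportion to it:

function proportionalSplit(input, gpus)
   -- untested sketch: split the batch along dim 1 in proportion to each GPU's free memory
   local free, total = {}, 0
   for i, dev in ipairs(gpus) do
      local f = cutorch.getMemoryUsage(dev) -- returns freeMemory, totalMemory
      free[i] = f
      total = total + f
   end
   local batch, offset, chunks = input:size(1), 0, {}
   for i, dev in ipairs(gpus) do
      local n = (i == #gpus) and (batch - offset)           -- remainder goes to the last GPU
                or math.floor(batch * free[i] / total)
      cutorch.setDevice(dev)
      chunks[i] = input:narrow(1, offset + 1, n):cuda()     -- copy this chunk onto the current GPU
      offset = offset + n
   end
   return chunks
end

Each chunk would still need its own replica of the network on the matching GPU, which is exactly what DataParallelTable manages internally, so I would rather not reimplement that by hand if there is a simpler way.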