Closed by wldeephi 6 years ago
We trained our models on a single GPU and have not tried multi-GPU training. However, PyTorch's DataParallel should work with our model; I don't see any reason why it would not.
Try the command below. If it does not work, please check the PyTorch forums.
model = torch.nn.DataParallel(model).cuda()
Hi @sacmehta, the PyTorch version in my environment is 2.0. When I try to train on multiple GPUs with nn.DataParallel, it reports the following error:
Traceback (most recent call last):
  File "main.py", line 427, in <module>
    trainValidateSegmentation(parser.parse_args())
  File "main.py", line 357, in trainValidateSegmentation
    lossTr, overall_acc_tr, per_class_acc_tr, per_class_iu_tr, mIOU_tr = train(args, trainLoader, model, criteria, optimizer, epoch)
  File "main.py", line 107, in train
    output = model(input_var)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 59, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 64, in replicate
    return replicate(module, device_ids)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/replicate.py", line 21, in replicate
    modules = list(network.modules())
TypeError: 'list' object is not callable
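
The last frame fails at network.modules(), which suggests that `modules` on the model resolves to a plain list rather than to the nn.Module.modules() method. One likely cause, assuming the model class (or one of its submodules) assigns something like `self.modules = []` in `__init__`, is that the instance attribute shadows the inherited method. The sketch below reproduces that effect in plain Python; `BaseModule`, `ShadowedNet`, `FixedNet`, and `try_replicate` are hypothetical stand-ins for illustration, not classes or functions from this repository or from PyTorch.

```python
class BaseModule:
    """Hypothetical stand-in for torch.nn.Module, reduced to one method."""

    def modules(self):
        # The real nn.Module.modules() yields the module and its submodules;
        # this sketch only yields self to stay dependency-free.
        yield self


class ShadowedNet(BaseModule):
    def __init__(self):
        # Assumption: the failing model does something like this.
        # The instance attribute hides the inherited modules() method.
        self.modules = []


class FixedNet(BaseModule):
    def __init__(self):
        # Renaming the attribute leaves modules() callable again.
        self.layers = []


def try_replicate(network):
    """Mimic the failing line in replicate.py: list(network.modules())."""
    try:
        return list(network.modules())
    except TypeError as err:
        return str(err)


shadowed_result = try_replicate(ShadowedNet())
fixed_result = try_replicate(FixedNet())
print(shadowed_result)    # the TypeError message from the shadowed class
print(len(fixed_result))  # number of modules found once the name is free
```

If this is indeed the cause, renaming the attribute in the model definition (and, in a real PyTorch model, holding submodules in torch.nn.ModuleList so they are registered) should let DataParallel replicate the model.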