sacmehta / ESPNet

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
https://sacmehta.github.io/ESPNet/
MIT License

training with multiGPUS #10

Closed wldeephi closed 6 years ago

wldeephi commented 6 years ago

Hi @sacmehta, the PyTorch version in my environment is 2.0. When I try to train on multiple GPUs with nn.DataParallel, it reports the following error:

Traceback (most recent call last):
  File "main.py", line 427, in <module>
    trainValidateSegmentation(parser.parse_args())
  File "main.py", line 357, in trainValidateSegmentation
    lossTr, overall_acc_tr, per_class_acc_tr, per_class_iu_tr, mIOU_tr = train(args, trainLoader, model, criteria, optimizer, epoch)
  File "main.py", line 107, in train
    output = model(input_var)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 59, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 64, in replicate
    return replicate(module, device_ids)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/replicate.py", line 21, in replicate
    modules = list(network.modules())
TypeError: 'list' object is not callable

sacmehta commented 6 years ago

We trained our models on a single GPU and have not tried multi-GPU training. However, PyTorch's DataParallel should work without any issues; I don't see any reason why it would not work with our model.

Try the command below. If it does not work, please check the PyTorch forums.

model = torch.nn.DataParallel(model).cuda()
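
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis of this repository: the failing call is `network.modules()`, and the message `'list' object is not callable` suggests that `modules` resolved to a plain list on the model instance instead of the method inherited from `nn.Module`. That can happen when a model assigns `self.modules = [...]` in its constructor. The sketch below reproduces the effect with two hypothetical classes (`ShadowedNet` and `FixedNet` are illustrative names, not taken from the ESPNet code) and shows that keeping the layers in an `nn.ModuleList` under a different attribute name leaves `modules()` callable, which is what `DataParallel`'s replicate step needs.

```python
import torch
import torch.nn as nn


class ShadowedNet(nn.Module):
    """Hypothetical model that stores its layers in a plain list named `modules`."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        # Assigning a list here hides the inherited nn.Module.modules() method.
        self.modules = [self.conv]

    def forward(self, x):
        return self.modules[0](x)


class FixedNet(nn.Module):
    """Same layer, held in an nn.ModuleList under a non-clashing name."""

    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Conv2d(3, 8, kernel_size=3, padding=1)])

    def forward(self, x):
        return self.layers[0](x)


def can_enumerate_modules(model):
    """Return True if model.modules() is still callable, as replicate() expects."""
    try:
        list(model.modules())
        return True
    except TypeError:
        return False


shadowed_ok = can_enumerate_modules(ShadowedNet())  # the list shadows the method
fixed_ok = can_enumerate_modules(FixedNet())        # the method is intact

# Wrap the fixed model as in the reply above; move to GPU only if one exists.
parallel_model = nn.DataParallel(FixedNet())
if torch.cuda.is_available():
    parallel_model = parallel_model.cuda()
```

If the model in question does define such an attribute, renaming it (and updating the places that read it) would be one way to make it compatible with `DataParallel`. Whether that applies here depends on the model code, which is not shown in this thread.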