Open ChengHuang-CH opened 6 years ago
Haha, I am very excited to be here again, since I have solved some of these problems. Here I will share the solution for DataParallel and my experience with the new PyTorch 0.4.0 on Windows 10.
Firstly, I solved the DataParallel problem: AttributeError: 'DataParallel' object has no attribute 'loss'
The solution came from the topic "How to reach model attributes wrapped by nn.DataParallel?"
So I revised the code as follows:
USE_CUDA = True
Use_Dataparallel = False  # start in single-GPU mode if using CUDA
# ...{other codes}
# code to activate DataParallel mode:
if USE_CUDA:
    if torch.cuda.device_count() > 1:
        print("Let's use %d GPUs" % torch.cuda.device_count())
        Use_Dataparallel = True  # switch to multi-GPU mode
        capsule_net = nn.DataParallel(capsule_net).cuda()
# ...{other codes}
if Use_Dataparallel:
    # use 'module' to reach the attribute 'losses' wrapped by nn.DataParallel
    loss = capsule_net.module.losses(inputs, output, target, reconstructions)
else:
    # single-GPU mode
    loss = capsule_net.losses(inputs, output, target, reconstructions)
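The if/else on Use_Dataparallel can also be folded into a small helper. This is only a minimal sketch: it assumes nothing beyond the fact that nn.DataParallel keeps the wrapped network in its `module` attribute. The names `unwrap`, `TinyNet` and `FakeDataParallel` are hypothetical, and the stand-in classes are there only so the snippet runs without PyTorch or a GPU.

```python
def unwrap(model):
    """Return the underlying model, whether or not it is wrapped.

    nn.DataParallel stores the original network in its `module` attribute;
    a plain network normally has no such attribute and is returned as-is.
    """
    return getattr(model, "module", model)


class TinyNet:
    """Hypothetical stand-in for a network with a custom method."""

    def losses(self, prediction, target):
        return abs(prediction - target)


class FakeDataParallel:
    """Hypothetical stand-in that hides custom attributes like the real wrapper."""

    def __init__(self, module):
        self.module = module


single = TinyNet()
parallel = FakeDataParallel(TinyNet())

# The same call works in both modes, so no separate flag is needed.
loss_single = unwrap(single).losses(3, 1)
loss_parallel = unwrap(parallel).losses(3, 1)
print(loss_single, loss_parallel)
```

With the real classes the call would read `unwrap(capsule_net).losses(inputs, output, target, reconstructions)`. If your own network happens to have an attribute named `module`, an explicit `isinstance(model, nn.DataParallel)` check is the safer choice.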
Secondly, I tested this code on the officially released PyTorch 0.4.0 on Windows 10, and there are a few things to pay attention to:
(1) A special multiprocessing error on Windows (see the Windows FAQ)
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.
    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:
        if __name__ == '__main__':
            freeze_support()
So all code, except the four network definition classes, should be put under if __name__ == '__main__':
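The layout for item (1) can be sketched with the standard multiprocessing module alone, which is where this error message comes from. `square` and `run` are hypothetical placeholders for the data-loading workers and the training code. The definitions stay at module level, like the four network classes, and only the entry point goes under the guard.

```python
import multiprocessing as mp


def square(x):
    # Worker functions (like the network classes) stay at module level so
    # that child processes can import them.
    return x * x


def run(num_workers=0):
    data = list(range(5))
    if num_workers == 0:
        # No child processes: everything runs in the current process.
        return [square(x) for x in data]
    with mp.Pool(num_workers) as pool:
        return pool.map(square, data)


if __name__ == "__main__":
    # Only the process started by the user reaches this block. Child
    # processes that re-import this file skip it, so they do not try to
    # start new processes themselves.
    mp.freeze_support()  # only matters when frozen into a Windows executable
    print(run(num_workers=2))
```

On Windows, child processes are started by launching a fresh interpreter that imports the main file again, which is why unguarded top-level code triggers the RuntimeError above.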
(2) Error about 'torch.sparse'
target= torch.sparse.torch.eye(10).index_select(dim=0, index=target)
AttributeError: module 'torch.sparse' has no attribute 'torch'
According to a similar question, it works well after replacing torch.sparse.torch.eye(10) with torch.eye(10)
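For reference, here is a small sketch of what the corrected line computes. The label values are made up for illustration, and it assumes PyTorch 0.4.0 or later, where torch.tensor is available.

```python
import torch

# Hypothetical batch of three class labels in the range 0..9.
target = torch.tensor([3, 0, 9])

# Row i of the 10x10 identity matrix is the one-hot vector for class i,
# so selecting rows by label yields one one-hot row per sample.
one_hot = torch.eye(10).index_select(dim=0, index=target)

print(one_hot.shape)
print(one_hot[0])
```

The index must be a LongTensor, which torch.tensor produces by default for Python integers.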
(3) A UserWarning to use tensor.item() instead of .data[0]
UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
train_loss += loss.data[0]  # change loss.data[0] to loss.item() in PyTorch 0.4.0
So it is OK after being revised as follows:
train_loss += loss.item()
(These tests were run on Windows 10 + Python 3.6 + PyTorch 0.4.0)
First of all, thanks, it's definitely an easy-to-follow CapsNet tutorial for me as a beginner, but I found an error after running the code:
I solved this issue the same way as in https://github.com/gram-ai/capsule-networks/issues/13, in the Decoder class:
".data" should be removed.
Then I successfully trained on a single GPU following this tutorial, but when I tried to train the net on two GPUs following the PyTorch data parallelism tutorial:
it produced an error:
AttributeError: 'DataParallel' object has no attribute 'loss'
I'm confused. If there is a good solution, please tell me, thanks!
(I use Python 2.7.12 and PyTorch 0.3.0.post4)