zhanghang1989 / PyTorch-Encoding

A CV toolkit for my papers.
https://hangzhang.org/PyTorch-Encoding/
MIT License
2.04k stars, 450 forks

A problem about syncbn.py #300

Closed: wdddddd closed this issue 4 years ago

wdddddd commented 4 years ago

When I run train_dist.py, I get the following error:

RuntimeError: Some elements marked as dirty during the forward method were not returned as output. The inputs that are modified inplace must all be outputs of the Function.

The traceback points to:

PyTorch-Encoding/encoding/nn/syncbn.py", line 97, in forward
    self.eps, self.momentum, self.training, process_group)
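
For context, the message states a general rule for custom `torch.autograd.Function` classes: a tensor that `forward` modifies in place and marks with `ctx.mark_dirty` must also be returned from `forward`. The sketch below only illustrates that rule. `InplaceScale` is a hypothetical example class, not the code in syncbn.py, and it does not show why the behavior differs between PyTorch versions.

```python
import torch
from torch.autograd import Function


class InplaceScale(Function):
    """Hypothetical example: scales its input in place."""

    @staticmethod
    def forward(ctx, x, factor):
        x.mul_(factor)        # modify the input in place
        ctx.mark_dirty(x)     # tell autograd that x was modified
        ctx.factor = factor   # plain Python float, kept for backward
        return x              # the dirty tensor must be returned as an output

    @staticmethod
    def backward(ctx, grad_output):
        # d(factor * x)/dx = factor; the non-tensor argument gets no gradient
        return grad_output * ctx.factor, None


leaf = torch.ones(3, requires_grad=True)
x = leaf.clone()              # non-leaf copy, so the in-place update is allowed
y = InplaceScale.apply(x, 2.0)
y.sum().backward()
```

Returning a different tensor from `forward` while still marking `x` as dirty is the situation the error message describes.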

zhanghang1989 commented 4 years ago

Could you try PyTorch 1.4.0?

jingluw commented 4 years ago

I encountered the same problem. I used PyTorch 1.5.0.

/python3.6/site-packages/encoding/nn/syncbn.py", line 202, in forward
    self.activation, self.slope).view(input_shape)
RuntimeError: Some elements marked as dirty during the forward method were not returned as output. The inputs that are modified inplace must all be outputs of the Function

It may be related to DataParallel for multi-GPU training, but I haven't found a solution.

/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
File "/home/xxx/py3-torch1.5/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.

wdddddd commented 4 years ago

After switching to PyTorch 1.4.0, the problem is solved.
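
A small guard like the following could flag the mismatch before training starts. It is only a sketch based on what this thread reports (1.4.0 works, 1.5.0 fails); the helper names `parse_version` and `is_reported_working` are made up for illustration, and nothing here says how other versions behave.

```python
import re

# The only version reported as working in this thread.
REPORTED_WORKING = (1, 4, 0)


def parse_version(version):
    """Turn a string such as '1.5.0+cu101' into a tuple such as (1, 5, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    # A missing patch number is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_reported_working(version):
    """True only for the exact version this thread confirms."""
    return parse_version(version) == REPORTED_WORKING
```

In a training script this could be called as `is_reported_working(torch.__version__)`, printing a warning when it returns `False`.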

wdddddd commented 4 years ago

It works, thanks!!!