zhanghang1989 / PyTorch-Encoding

A CV toolkit for my papers.
https://hangzhang.org/PyTorch-Encoding/
MIT License
2.04k stars, 451 forks

pytorch Error in Sync batch norm #211

Open krishnakanthnakka opened 5 years ago

krishnakanthnakka commented 5 years ago

Environment: CUDA 9.2, GCC 6.0

Traceback (most recent call last):
  File "experiments/segmentation/demo.py", line 16, in <module>
    output = model.evaluate(img)
  File "/cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/models/base.py", line 78, in evaluate
    pred = self.forward(x)
  File "/cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/models/fcn.py", line 51, in forward
    _, _, c3, c4 = self.base_forward(x)
  File "/cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/models/base.py", line 67, in base_forward
    x = self.pretrained.conv1(x)
  File "/home/nakka/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nakka/.local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/nakka/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/nn/syncbn.py", line 122, in forward
    self.activation, self.slope).view(input_shape)
  File "/cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/functions/syncbn.py", line 95, in forward
    y = lib.gpu.batchnorm_forward(x, _ex, _exs, gamma, beta, ctx.eps)
RuntimeError: cudaGetLastError() == cudaSuccess ASSERT FAILED at /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu:289, please report a bug to PyTorch. (BatchNorm_Forward_CUDA at /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu:289)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f4d5b06dfe1 in /home/nakka/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f4d5b06ddfa in /home/nakka/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: BatchNorm_Forward_CUDA(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, float) + 0x2c7 (0x7f4d471d5788 in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #3: <unknown function> + 0x6fb5e (0x7f4d471afb5e in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #4: <unknown function> + 0x6a4f5 (0x7f4d471aa4f5 in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #5: <unknown function> + 0x62ce9 (0x7f4d471a2ce9 in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #6: <unknown function> + 0x63004 (0x7f4d471a3004 in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #7: <unknown function> + 0x5192c (0x7f4d4719192c in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)

frame #16: THPFunction_apply(_object*, _object*) + 0x581 (0x7f4d55c374d1 in /home/nakka/.local/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

zhanghang1989 commented 5 years ago

Could you try installing CUDA 10.1 and then reinstalling PyTorch?
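
Before and after reinstalling, it may help to confirm which CUDA version PyTorch was built with and which toolkit `nvcc` reports, since a mismatch between the two is one plausible cause of a failing custom kernel (this thread does not confirm it as the cause). The following is a minimal sketch: the helper names `parse_nvcc_release` and `describe_cuda_env` are made up for illustration and are not part of PyTorch-Encoding, and it assumes `nvcc` is on `PATH` if a toolkit is installed.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the 'release X.Y' version from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def describe_cuda_env():
    """Collect the CUDA versions seen by PyTorch and by the local toolkit."""
    info = {"torch": None, "torch_cuda": None, "cuda_available": None, "nvcc": None}
    try:
        import torch  # optional: the report still works if torch is missing

        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        pass

    nvcc = shutil.which("nvcc")  # None when no toolkit is on PATH
    if nvcc:
        result = subprocess.run(
            [nvcc, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        info["nvcc"] = parse_nvcc_release(result.stdout)
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_env().items():
        print(f"{key}: {value}")
```

If `torch_cuda` and `nvcc` disagree, rebuilding the extension after aligning the two is a reasonable first step; if they agree, the problem likely lies elsewhere.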