Thanks for this excellent project, but I am having trouble running the tests successfully.
1. I first ran the scripts in ./tests; the errors are as follows:
(1) test_numeric_batchnorm.py
ERROR: testNumericBatchNorm (__main__.NumericTestCase)
Traceback (most recent call last):
File "test_numeric_batchnorm.py", line 48, in testNumericBatchNorm
self.assertTensorClose(bn.running_mean, a.mean(dim=0))
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'
Ran 1 test in 0.192s
FAILED (errors=1)
(2) test_numeric_batchnorm_v2.py
ERROR: testNumericBatchNorm (__main__.NumericTestCasev2)
Traceback (most recent call last):
File "test_numeric_batchnorm_v2.py", line 33, in testNumericBatchNorm
batchnorm2 = BatchNorm2dReimpl(CHANNELS, momentum=1)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/batchnorm_reimpl.py", line 33, in __init__
self.weight = nn.Parameter(torch.empty(num_features))
AttributeError: module 'torch' has no attribute 'empty'
Ran 1 test in 0.001s
FAILED (errors=1)
(3) test_sync_batchnorm.py
ERROR: testSyncBatchNorm2DSyncTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 107, in testSyncBatchNorm2DSyncTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10, 16, 16), True, cuda=True)
File "test_sync_batchnorm.py", line 59, in _checkBatchNormResult
output2.sum().backward()
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size() failed.
ERROR: testSyncBatchNormNormalEval (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 77, in testSyncBatchNormNormalEval
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), False)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'
ERROR: testSyncBatchNormNormalTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 71, in testSyncBatchNormNormalTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), True)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'
ERROR: testSyncBatchNormSyncEval (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 97, in testSyncBatchNormSyncEval
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), False, cuda=True)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'
ERROR: testSyncBatchNormSyncTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 87, in testSyncBatchNormSyncTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), True, cuda=True)
File "test_sync_batchnorm.py", line 59, in _checkBatchNormResult
output2.sum().backward()
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size() failed.
Ran 5 tests in 6.546s
FAILED (errors=5)
2. I then ran my own script using net = DataParallelWithCallback(net, device_ids=[0, 1]) with two GPUs (a single GPU works fine); the error is:
Traceback (most recent call last):
File "train_scan_em13.py", line 190, in <module>
loss.backward()
File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size() failed.
Ubuntu 14.04
CUDA 8.0 & cuDNN 5.1
Python 3.6
PyTorch 0.3.1
Do you have any suggestions? Thanks.
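In case it helps with diagnosis, here is a small sketch for checking which of the attributes named in the AttributeError tracebacks above are missing from an installed module. The helper name `missing_attributes` is hypothetical and not part of this project. My assumption is that `torch.allclose` and `torch.empty` are simply not provided by PyTorch 0.3.1, which this check should confirm or rule out.

```python
import importlib


def missing_attributes(module_name, names):
    """Return the entries of `names` that the imported module does not define."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # Attribute names taken from the AttributeError messages in the tracebacks.
    try:
        print(missing_attributes("torch", ["allclose", "empty"]))
    except ImportError:
        print("torch is not installed in this environment")
```

If both names are reported as missing, the AttributeErrors would point to a version mismatch rather than a bug in the tests. The RuntimeError raised in backward() may well be a separate problem that this check does not cover.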