Traceback (most recent call last):
  File "./tools/train.py", line 178, in <module>
    main()
  File "./tools/train.py", line 174, in main
    meta=meta)
  File "/data2/DW/200619_blood_cell/blood_analyzer/projects/cbc_2d/mmdetection/mmdet/apis/train.py", line 150, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/user/.local/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 126, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 55, in train
    self.call_hook('after_train_epoch')
  File "/home/user/.local/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/data2/DW/200619_blood_cell/blood_analyzer/projects/cbc_2d/mmdetection/mmdet/core/evaluation/eval_hooks.py", line 129, in after_train_epoch
    gpu_collect=self.gpu_collect)
  File "/data2/DW/200619_blood_cell/blood_analyzer/projects/cbc_2d/mmdetection/mmdet/apis/test.py", line 96, in multi_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/distributed.py", line 606, in forward
    if self.reducer._rebuild_buckets():
RuntimeError: replicas_[0].size() == rebuilt_param_indices_.size() INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/distributed/c10d/reducer.cpp":1326, please report a bug to PyTorch. rebuilt parameter indices size is not same as original model parameters size.218 versus 654
Describe the bug
In distributed training, an error is raised at the evaluation step. Training works fine in single-GPU mode.
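The assertion compares two parameter counts (218 versus 654, and 654 is exactly 3 × 218). One way to check where those numbers might come from is to count the model's parameter tensors directly. The sketch below is only a diagnostic aid: `count_parameters` is a hypothetical helper, not part of mmdetection or PyTorch, and the idea that the mismatch relates to parameters without gradients is an assumption this report does not confirm.

```python
def count_parameters(model):
    """Return (total, trainable) counts of parameter tensors.

    Hypothetical helper: `model` is assumed to be a torch.nn.Module, or any
    object whose .parameters() yields items with a .requires_grad attribute.
    """
    params = list(model.parameters())
    # Parameters with requires_grad=False never receive gradients.
    trainable = [p for p in params if p.requires_grad]
    return len(params), len(trainable)
```

If the model is wrapped for distributed training, the wrapped network is usually reachable as `model.module`, so something like `print(count_parameters(model.module))` before the evaluation step would show whether either count matches 218 or 654.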
Reproduction
What command or script did you run?
python3 tools/train.py ....
Did you make any modifications on the code or config? Did you understand what you have modified?
The dataset, to fit my set.
What dataset did you use?
Environment
Error traceback