Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmrotate
Environment
sys.platform: linux
Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.3, V11.3.58
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.1+cu111
PyTorch compiling details: PyTorch built with:
TorchVision: 0.11.2+cu111
OpenCV: 4.7.0
MMCV: 1.6.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMRotate: 0.3.3+04da23d
Reproduces the problem - code sample
Reproduces the problem - command or script
When I run inference on the test data, the error occurs during the merge process.
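The report gives a description instead of the actual command. For reference, a typical 8-GPU run with mmrotate's stock wrapper scripts looks like the following; the config and checkpoint paths are placeholders, not the reporter's files:

    # training on 8 GPUs, matching the reported environment (config path is a placeholder)
    bash ./tools/dist_train.sh configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py 8

    # distributed inference/evaluation on the test split (checkpoint path is a placeholder)
    bash ./tools/dist_test.sh configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py work_dirs/latest.pth 8 --eval mAP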
Reproduces the problem - error message
Traceback (most recent call last):
  File "./tools/train.py", line 192, in <module>
    main()
  File "./tools/train.py", line 181, in main
    train_detector(
  File "/data1/code/SODA-mmrotate/mmrotate/apis/train.py", line 141, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/anaconda3/envs/SODA_rotate/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/anaconda3/envs/SODA_rotate/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/anaconda3/envs/SODA_rotate/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 31, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/anaconda3/envs/SODA_rotate/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 63, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/anaconda3/envs/SODA_rotate/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 249, in train_step
    loss, log_vars = self._parse_losses(losses)
  File "/home/anaconda3/envs/SODA_rotate/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 208, in _parse_losses
    assert log_var_length == len(log_vars) * dist.get_world_size(), \
AssertionError: loss log variables are different across GPUs!
rank 7 len(log_vars): 2 keys: loss_cls,loss_bbox
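For context, this assertion comes from mmdet's BaseDetector._parse_losses: under distributed training, every rank all-reduces the number of loss keys it logged, and the total must equal len(log_vars) * world_size, i.e. every GPU must produce the same set of loss terms. If one rank's batch yields fewer terms (for example, a box loss that is skipped when a batch has no ground-truth targets), the counts diverge and the assert fires. A minimal standalone sketch of that mechanism, not the library code itself:

    import torch
    import torch.distributed as dist

    def check_log_vars_consistent(log_vars, device):
        """Fail loudly if ranks logged different numbers of loss terms.

        Each rank contributes len(log_vars) to an all_reduce sum; the sum
        only equals len(log_vars) * world_size when every rank logged the
        same count, which is what mmdet's _parse_losses verifies.
        """
        if dist.is_available() and dist.is_initialized():
            log_var_length = torch.tensor(len(log_vars), device=device)
            dist.all_reduce(log_var_length)
            message = (f'rank {dist.get_rank()} len(log_vars): {len(log_vars)} '
                       f'keys: {",".join(log_vars.keys())}')
            assert log_var_length == len(log_vars) * dist.get_world_size(), \
                'loss log variables are different across GPUs!\n' + message

Here rank 7 logged only two keys (loss_cls, loss_bbox) while at least one other rank logged a different number, which is exactly the situation the check is designed to catch.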
Additional information
No response

Maybe there are empty images with no available annotations. You can try other models and see whether this error still occurs. By the way, more information, such as the model settings, would be helpful.
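A quick way to test the empty-image hypothesis is to count annotation files that contain no objects. The directory below is a placeholder for a DOTA-style layout (one .txt file per image); adjust it to the actual dataset:

    import os

    # Placeholder path; point this at the dataset's annotation directory.
    ann_dir = 'data/split/train/annfiles'

    empty = [name for name in os.listdir(ann_dir)
             if name.endswith('.txt')
             and os.path.getsize(os.path.join(ann_dir, name)) == 0]
    print(f'{len(empty)} of {len(os.listdir(ann_dir))} annotation files are empty')

If empty images turn out to be the cause, enabling filter_empty_gt=True on the training dataset (a standard option of mmdet's CustomDataset, which mmrotate datasets inherit) drops them before training.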