open-mmlab / mmrotate

OpenMMLab Rotated Object Detection Toolbox and Benchmark
https://mmrotate.readthedocs.io/en/latest/
Apache License 2.0
1.88k stars 556 forks

RuntimeError occurs during evaluation while training [Bug] #593

Open Petopp opened 2 years ago

Petopp commented 2 years ago

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmrotate

Environment

Hello, when I try to start training a model, it works until the point of evaluation. Then the error message below appears.

Does anyone have an idea what to do? This version worked in Colab; the error now occurs on a PC (A3000) running Ubuntu 22.04 in WSL2.

I have installed the following

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Version mmrotate: 0.3.3
Version MMDetection: 2.25.3
CUDA: 11.7
GCC: GCC 9.4

Reproduces the problem - code sample

```python
from mmrotate.datasets.builder import ROTATED_DATASETS
from mmrotate.datasets.dota import DOTADataset
from mmcv import Config
from mmdet.apis import set_random_seed
import os.path as osp
import mmcv
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmdet.apis import train_detector

# Klassen, Pfad_Bilder and Pfad_Modell_Speichern are defined in an
# earlier notebook cell that is not shown here.


@ROTATED_DATASETS.register_module()
class TinyDataset(DOTADataset):
    """SAR ship dataset for detection."""
    CLASSES = Klassen


# Make sure the image path ends with a slash
if Pfad_Bilder[len(Pfad_Bilder)-1] != "/":
    Pfad_Bilder = Pfad_Bilder + "/"

# Jupyter shell escape: create the output directory
!mkdir -p {Pfad_Modell_Speichern}

cfg = Config.fromfile('/mnt/c/Users/YYYY/OneDrive - XXXX AG/Entwicklung/Modelle/rotated_faster_rcnn/rotated_faster_rcnn_r50_fpn_1x_dota_le90.py')

# Modify dataset type and path
cfg.dataset_type = 'RotatedFasterRCNN'
cfg.data_root = Pfad_Bilder

cfg.data.test.type = 'TinyDataset'
cfg.data.test.data_root = Pfad_Bilder
cfg.data.test.ann_file = 'val'
cfg.data.test.img_prefix = 'images'

cfg.data.train.type = 'TinyDataset'
cfg.data.train.data_root = Pfad_Bilder
cfg.data.train.ann_file = 'train'
cfg.data.train.img_prefix = 'images'

cfg.data.val.type = 'TinyDataset'
cfg.data.val.data_root = Pfad_Bilder
cfg.data.val.ann_file = 'val'
cfg.data.val.img_prefix = 'images'

# modify num classes of the model in box head
cfg.model.roi_head.bbox_head.num_classes = 1
# We can still use the pre-trained Mask RCNN model though we do not need to
# use the mask branch
# cfg.load_from = 'oriented_rcnn_r50_fpn_1x_dota_le90-6d2b2ce0.pth'
cfg.load_from = '/mnt/c/Users/YYYY/OneDrive - XXXX AG/Entwicklung/Modelle/rotated_faster_rcnn/rotated_faster_rcnn_r50_fpn_1x_dota_le90-0393aa5c.pth'

# Set up working dir to save files and logs.
cfg.work_dir = Pfad_Modell_Speichern

cfg.optimizer.lr = 0.001
cfg.lr_config.warmup = None
cfg.runner.max_epochs = 100  # number of epochs
cfg.log_config.interval = 10

# Change the evaluation metric since we use customized dataset.
cfg.evaluation.metric = 'mAP'
# We can set the evaluation interval to reduce the evaluation times
cfg.evaluation.interval = 22
# We can set the checkpoint saving interval to reduce the storage cost
cfg.checkpoint_config.interval = 10

# Set seed thus the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)
cfg.device = 'cuda'

# We can also use tensorboard to log the training process
cfg.log_config.hooks = [
    dict(type='TextLoggerHook'),
    dict(type='TensorboardLoggerHook')]

# We can initialize the logger for training and have a look
# at the final config used for training
# print(f'Config:\n{cfg.pretty_text}')

# Build dataset
datasets = [build_dataset(cfg.data.train)]

# Build the detector
model = build_detector(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES

# Save config
with open(Pfad_Modell_Speichern+"/config.py", "w") as f:
    f.write(cfg.pretty_text)

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_detector(model, datasets, cfg, distributed=False, validate=True)

# Save config
with open(Pfad_Modell_Speichern+"/config.py", "w") as f:
    f.write(cfg.pretty_text)
```

Reproduces the problem - command or script

Please see the code sample above.

Reproduces the problem - error message

2022-11-02 14:19:36,804 - mmdet - INFO - Automatic scaling of learning rate (LR) has been disabled.
2022-11-02 14:19:36,870 - mmdet - INFO - load checkpoint from local path: /mnt/c/Users/XXX/OneDrive - YYY AG/Entwicklung/Modelle/rotated_faster_rcnn/rotated_faster_rcnn_r50_fpn_1x_dota_le90-0393aa5c.pth
2022-11-02 14:19:37,339 - mmdet - WARNING - The model and loaded state dict do not match exactly

size mismatch for roi_head.bbox_head.fc_cls.weight: copying a param with shape torch.Size([16, 1024]) from checkpoint, the shape in current model is torch.Size([2, 1024]).
size mismatch for roi_head.bbox_head.fc_cls.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([2]).
2022-11-02 14:19:37,340 - mmdet - INFO - Start running, host: pt@PC23819, work_dir: /mnt/c/Users/XXX/OneDrive - YYY AG/Entwicklung/Modelle/rotated_faster_rcnn/Test
2022-11-02 14:19:37,341 - mmdet - INFO - Hooks will be executed in the following order:
before_run: (VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) CheckpointHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


before_train_epoch: (VERY_HIGH ) StepLrUpdaterHook
(LOW ) IterTimerHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


before_train_iter: (VERY_HIGH ) StepLrUpdaterHook
(LOW ) IterTimerHook
(LOW ) EvalHook


after_train_iter: (ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(LOW ) IterTimerHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


after_train_epoch: (NORMAL ) CheckpointHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


before_val_epoch: (LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


before_val_iter: (LOW ) IterTimerHook


after_val_iter: (LOW ) IterTimerHook


after_val_epoch: (VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


after_run: (VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


2022-11-02 14:19:37,342 - mmdet - INFO - workflow: [('train', 1)], max: 100 epochs
2022-11-02 14:19:37,342 - mmdet - INFO - Checkpoints will be saved to /mnt/c/Users/XXX/OneDrive - YYY AG/Entwicklung/Modelle/rotated_faster_rcnn/Test by HardDiskBackend.
2022-11-02 14:19:41,018 - mmdet - INFO - Epoch [1][10/10] lr: 1.000e-03, eta: 0:06:02, time: 0.366, data_time: 0.214, memory: 3545, loss_rpn_cls: 0.0948, loss_rpn_bbox: 0.0068, loss_cls: 0.3064, acc: 93.9844, loss_bbox: 0.0766, loss: 0.4845, grad_norm: 5.2189
2022-11-02 14:19:45,302 - mmdet - INFO - Epoch [2][10/10] lr: 1.000e-03, eta: 0:06:27, time: 0.424, data_time: 0.212, memory: 3545, loss_rpn_cls: 0.0179, loss_rpn_bbox: 0.0059, loss_cls: 0.1306, acc: 92.2363, loss_bbox: 0.0823, loss: 0.2367, grad_norm: 1.6588
2022-11-02 14:19:49,642 - mmdet - INFO - Epoch [3][10/10] lr: 1.000e-03, eta: 0:06:34, time: 0.430, data_time: 0.214, memory: 3545, loss_rpn_cls: 0.0113, loss_rpn_bbox: 0.0059, loss_cls: 0.1112, acc: 92.0020, loss_bbox: 0.0820, loss: 0.2104, grad_norm: 1.2720
2022-11-02 14:19:57,455 - mmdet - INFO - Epoch [4][10/10] lr: 1.000e-03, eta: 0:07:59, time: 0.777, data_time: 0.215, memory: 3545, loss_rpn_cls: 0.0056, loss_rpn_bbox: 0.0051, loss_cls: 0.1020, acc: 92.9688, loss_bbox: 0.0831, loss: 0.1958, grad_norm: 1.1252
2022-11-02 14:20:01,124 - mmdet - INFO - Epoch [5][10/10] lr: 1.000e-03, eta: 0:07:28, time: 0.363, data_time: 0.214, memory: 3545, loss_rpn_cls: 0.0045, loss_rpn_bbox: 0.0053, loss_cls: 0.0879, acc: 95.9082, loss_bbox: 0.0725, loss: 0.1702, grad_norm: 0.9871
2022-11-02 14:20:04,708 - mmdet - INFO - Epoch [6][10/10] lr: 1.000e-03, eta: 0:07:05, time: 0.354, data_time: 0.214, memory: 3545, loss_rpn_cls: 0.0034, loss_rpn_bbox: 0.0047, loss_cls: 0.0846, acc: 96.2500, loss_bbox: 0.0743, loss: 0.1671, grad_norm: 1.0791
2022-11-02 14:20:08,305 - mmdet - INFO - Epoch [7][10/10] lr: 1.000e-03, eta: 0:06:47, time: 0.356, data_time: 0.215, memory: 3545, loss_rpn_cls: 0.0043, loss_rpn_bbox: 0.0049, loss_cls: 0.0804, acc: 96.6309, loss_bbox: 0.0694, loss: 0.1590, grad_norm: 1.0585
2022-11-02 14:20:11,919 - mmdet - INFO - Epoch [8][10/10] lr: 1.000e-03, eta: 0:06:34, time: 0.356, data_time: 0.213, memory: 3545, loss_rpn_cls: 0.0014, loss_rpn_bbox: 0.0043, loss_cls: 0.0787, acc: 96.6895, loss_bbox: 0.0712, loss: 0.1556, grad_norm: 1.0230
2022-11-02 14:20:15,491 - mmdet - INFO - Epoch [9][10/10] lr: 1.000e-04, eta: 0:06:22, time: 0.353, data_time: 0.212, memory: 3545, loss_rpn_cls: 0.0033, loss_rpn_bbox: 0.0050, loss_cls: 0.0745, acc: 97.0898, loss_bbox: 0.0694, loss: 0.1521, grad_norm: 0.9848
2022-11-02 14:20:19,087 - mmdet - INFO - Epoch [10][10/10] lr: 1.000e-04, eta: 0:06:12, time: 0.355, data_time: 0.211, memory: 3545, loss_rpn_cls: 0.0029, loss_rpn_bbox: 0.0045, loss_cls: 0.0734, acc: 97.1484, loss_bbox: 0.0664, loss: 0.1471, grad_norm: 0.9927
2022-11-02 14:20:19,123 - mmdet - INFO - Saving checkpoint at 10 epochs
2022-11-02 14:20:24,088 - mmdet - INFO - Epoch [11][10/10] lr: 1.000e-04, eta: 0:06:04, time: 0.368, data_time: 0.214, memory: 3545, loss_rpn_cls: 0.0027, loss_rpn_bbox: 0.0046, loss_cls: 0.0757, acc: 96.9434, loss_bbox: 0.0786, loss: 0.1616, grad_norm: 1.0617
2022-11-02 14:20:27,712 - mmdet - INFO - Epoch [12][10/10] lr: 1.000e-05, eta: 0:05:56, time: 0.358, data_time: 0.214, memory: 3545, loss_rpn_cls: 0.0033, loss_rpn_bbox: 0.0046, loss_cls: 0.0732, acc: 97.1191, loss_bbox: 0.0683, loss: 0.1495, grad_norm: 1.0766
2022-11-02 14:20:31,310 - mmdet - INFO - Epoch [13][10/10] lr: 1.000e-05, eta: 0:05:49, time: 0.356, data_time: 0.214, memory: 3545, loss_rpn_cls: 0.0038, loss_rpn_bbox: 0.0040, loss_cls: 0.0757, acc: 96.7676, loss_bbox: 0.0692, loss: 0.1527, grad_norm: 1.0383
2022-11-02 14:20:34,883 - mmdet - INFO - Epoch [14][10/10] lr: 1.000e-05, eta: 0:05:42, time: 0.353, data_time: 0.212, memory: 3545, loss_rpn_cls: 0.0018, loss_rpn_bbox: 0.0045, loss_cls: 0.0728, acc: 97.1582, loss_bbox: 0.0661, loss: 0.1451, grad_norm: 0.9824
2022-11-02 14:20:38,485 - mmdet - INFO - Epoch [15][10/10] lr: 1.000e-05, eta: 0:05:35, time: 0.356, data_time: 0.215, memory: 3545, loss_rpn_cls: 0.0033, loss_rpn_bbox: 0.0050, loss_cls: 0.0754, acc: 97.0410, loss_bbox: 0.0766, loss: 0.1603, grad_norm: 1.1741
2022-11-02 14:20:42,111 - mmdet - INFO - Epoch [16][10/10] lr: 1.000e-05, eta: 0:05:29, time: 0.358, data_time: 0.214, memory: 3545, loss_rpn_cls: 0.0040, loss_rpn_bbox: 0.0048, loss_cls: 0.0715, acc: 97.0312, loss_bbox: 0.0698, loss: 0.1500, grad_norm: 1.0517
2022-11-02 14:20:45,705 - mmdet - INFO - Epoch [17][10/10] lr: 1.000e-05, eta: 0:05:24, time: 0.355, data_time: 0.213, memory: 3545, loss_rpn_cls: 0.0034, loss_rpn_bbox: 0.0046, loss_cls: 0.0730, acc: 96.9629, loss_bbox: 0.0727, loss: 0.1538, grad_norm: 1.0351
2022-11-02 14:20:49,307 - mmdet - INFO - Epoch [18][10/10] lr: 1.000e-05, eta: 0:05:18, time: 0.356, data_time: 0.215, memory: 3545, loss_rpn_cls: 0.0020, loss_rpn_bbox: 0.0045, loss_cls: 0.0747, acc: 97.0215, loss_bbox: 0.0669, loss: 0.1482, grad_norm: 1.0223
2022-11-02 14:20:52,972 - mmdet - INFO - Epoch [19][10/10] lr: 1.000e-05, eta: 0:05:13, time: 0.362, data_time: 0.215, memory: 3545, loss_rpn_cls: 0.0036, loss_rpn_bbox: 0.0051, loss_cls: 0.0756, acc: 96.9043, loss_bbox: 0.0756, loss: 0.1598, grad_norm: 1.0819
2022-11-02 14:20:56,675 - mmdet - INFO - Epoch [20][10/10] lr: 1.000e-05, eta: 0:05:08, time: 0.366, data_time: 0.215, memory: 3545, loss_rpn_cls: 0.0032, loss_rpn_bbox: 0.0048, loss_cls: 0.0761, acc: 96.9629, loss_bbox: 0.0687, loss: 0.1528, grad_norm: 1.0895
2022-11-02 14:20:56,708 - mmdet - INFO - Saving checkpoint at 20 epochs
2022-11-02 14:21:01,789 - mmdet - INFO - Epoch [21][10/10] lr: 1.000e-05, eta: 0:05:04, time: 0.375, data_time: 0.213, memory: 3545, loss_rpn_cls: 0.0011, loss_rpn_bbox: 0.0041, loss_cls: 0.0757, acc: 96.7383, loss_bbox: 0.0721, loss: 0.1530, grad_norm: 1.0362
2022-11-02 14:21:05,466 - mmdet - INFO - Epoch [22][10/10] lr: 1.000e-05, eta: 0:04:59, time: 0.363, data_time: 0.215, memory: 3545, loss_rpn_cls: 0.0021, loss_rpn_bbox: 0.0043, loss_cls: 0.0736, acc: 96.9824, loss_bbox: 0.0684, loss: 0.1484, grad_norm: 1.0480
[ ] 0/22, elapsed: 0s, ETA:

RuntimeError Traceback (most recent call last)
Cell In [9], line 78
     76 # Create work_dir
     77 mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
---> 78 train_detector(model, datasets, cfg, distributed=False, validate=True)
     80 # Config speichern
     81 with open (Pfad_Modell_Speichern+"/config.py","w") as f:

File ~/.local/lib/python3.8/site-packages/mmdet/apis/train.py:244, in train_detector(model, dataset, cfg, distributed, validate, timestamp, meta)
    242 elif cfg.load_from:
    243     runner.load_checkpoint(cfg.load_from)
--> 244 runner.run(data_loaders, cfg.workflow)

File ~/.local/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py:136, in EpochBasedRunner.run(self, data_loaders, workflow, max_epochs, **kwargs)
    134         if mode == 'train' and self.epoch >= self._max_epochs:
    135             break
--> 136         epoch_runner(data_loaders[i], **kwargs)
    138 time.sleep(1)  # wait for some hooks like loggers to finish
    139 self.call_hook('after_run')

File ~/.local/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py:58, in EpochBasedRunner.train(self, data_loader, **kwargs)
     55     del self.data_batch
     56     self._iter += 1
---> 58 self.call_hook('after_train_epoch')
     59 self._epoch += 1

File ~/.local/lib/python3.8/site-packages/mmcv/runner/base_runner.py:317, in BaseRunner.call_hook(self, fn_name)
    310 """Call all hooks.
    311
    312 Args:
    313     fn_name (str): The function name in each hook to be called, such as
    314         "before_train_epoch".
    315 """
    316 for hook in self._hooks:
--> 317     getattr(hook, fn_name)(self)

File ~/.local/lib/python3.8/site-packages/mmcv/runner/hooks/evaluation.py:271, in EvalHook.after_train_epoch(self, runner)
    269 """Called after every training epoch to evaluate the results."""
    270 if self.by_epoch and self._should_evaluate(runner):
--> 271     self._do_evaluate(runner)

File ~/.local/lib/python3.8/site-packages/mmdet/core/evaluation/eval_hooks.py:60, in EvalHook._do_evaluate(self, runner)
     56 from mmdet.apis import single_gpu_test
     58 # Changed results to self.results so that MMDetWandbHook can access
     59 # the evaluation results and log them to wandb.
---> 60 results = single_gpu_test(runner.model, self.dataloader, show=False)
     61 self.latest_results = results
     62 runner.log_buffer.output['eval_iter_num'] = len(self.dataloader)

File ~/.local/lib/python3.8/site-packages/mmdet/apis/test.py:29, in single_gpu_test(model, data_loader, show, out_dir, show_score_thr)
     27 for i, data in enumerate(data_loader):
     28     with torch.no_grad():
---> 29         result = model(return_loss=False, rescale=True, **data)
     31     batch_size = len(result)
     32     if show or out_dir:

File ~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
   1186 # If we don't have any hooks, we want to skip the rest of the logic in
   1187 # this function, and just call forward.
   1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190     return forward_call(*input, **kwargs)
   1191 # Do not call functions when jit is used
   1192 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.local/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py:51, in MMDataParallel.forward(self, *inputs, **kwargs)
     49     return self.module(*inputs[0], **kwargs[0])
     50 else:
---> 51     return super().forward(*inputs, **kwargs)

File ~/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py:169, in DataParallel.forward(self, *inputs, **kwargs)
    166     kwargs = ({},)
    168 if len(self.device_ids) == 1:
--> 169     return self.module(*inputs[0], **kwargs[0])
    170 replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
    171 outputs = self.parallel_apply(replicas, inputs, kwargs)

File ~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
   1186 # If we don't have any hooks, we want to skip the rest of the logic in
   1187 # this function, and just call forward.
   1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190     return forward_call(*input, **kwargs)
   1191 # Do not call functions when jit is used
   1192 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.local/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py:116, in auto_fp16.<locals>.auto_fp16_wrapper.<locals>.new_func(*args, **kwargs)
    113     raise TypeError('@auto_fp16 can only be used to decorate the '
    114                     f'method of those classes {supported_types}')
    115 if not (hasattr(args[0], 'fp16_enabled') and args[0].fp16_enabled):
--> 116     return old_func(*args, **kwargs)
    118 # get the arg spec of the decorated method
    119 args_info = getfullargspec(old_func)

File ~/.local/lib/python3.8/site-packages/mmdet/models/detectors/base.py:174, in BaseDetector.forward(self, img, img_metas, return_loss, **kwargs)
    172     return self.forward_train(img, img_metas, **kwargs)
    173 else:
--> 174     return self.forward_test(img, img_metas, **kwargs)

File ~/.local/lib/python3.8/site-packages/mmdet/models/detectors/base.py:147, in BaseDetector.forward_test(self, imgs, img_metas, **kwargs)
    145     if 'proposals' in kwargs:
    146         kwargs['proposals'] = kwargs['proposals'][0]
--> 147     return self.simple_test(imgs[0], img_metas[0], **kwargs)
    148 else:
    149     assert imgs[0].size(0) == 1, 'aug test does not support ' \
    150                                  'inference with batch size ' \
    151                                  f'{imgs[0].size(0)}'

File ~/mmrotate/mmrotate/models/detectors/two_stage.py:183, in RotatedTwoStageDetector.simple_test(self, img, img_metas, proposals, rescale)
    180 else:
    181     proposal_list = proposals
--> 183 return self.roi_head.simple_test(
    184     x, proposal_list, img_metas, rescale=rescale)

File ~/mmrotate/mmrotate/models/roi_heads/rotate_standard_roi_head.py:252, in RotatedStandardRoIHead.simple_test(self, x, proposal_list, img_metas, rescale)
    236 """Test without augmentation.
    237
    238 Args:
   (...)
    248     dict[str, Tensor]: a dictionary of bbox_results.
    249 """
    250 assert self.with_bbox, 'Bbox head must be implemented.'
--> 252 det_bboxes, det_labels = self.simple_test_bboxes(
    253     x, img_metas, proposal_list, self.test_cfg, rescale=rescale)
    255 bbox_results = [
    256     rbbox2result(det_bboxes[i], det_labels[i],
    257                  self.bbox_head.num_classes)
    258     for i in range(len(det_bboxes))
    259 ]
    261 return bbox_results

File ~/mmrotate/mmrotate/models/roi_heads/rotate_standard_roi_head.py:342, in RotatedStandardRoIHead.simple_test_bboxes(self, x, img_metas, proposals, rcnn_test_cfg, rescale)
    338     det_label = rois[i].new_zeros(
    339         (0, self.bbox_head.fc_cls.out_features))
    341 else:
--> 342     det_bbox, det_label = self.bbox_head.get_bboxes(
    343         rois[i],
    344         cls_score[i],
    345         bbox_pred[i],
    346         img_shapes[i],
    347         scale_factors[i],
    348         rescale=rescale,
    349         cfg=rcnn_test_cfg)
    350 det_bboxes.append(det_bbox)
    351 det_labels.append(det_label)

File ~/.local/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py:205, in force_fp32.<locals>.force_fp32_wrapper.<locals>.new_func(*args, **kwargs)
    202     raise TypeError('@force_fp32 can only be used to decorate the '
    203                     'method of nn.Module')
    204 if not (hasattr(args[0], 'fp16_enabled') and args[0].fp16_enabled):
--> 205     return old_func(*args, **kwargs)
    206 # get the arg spec of the decorated method
    207 args_info = getfullargspec(old_func)

File ~/mmrotate/mmrotate/models/roi_heads/bbox_heads/rotated_bbox_head.py:418, in RotatedBBoxHead.get_bboxes(self, rois, cls_score, bbox_pred, img_shape, scale_factor, rescale, cfg)
    416     return bboxes, scores
    417 else:
--> 418     det_bboxes, det_labels = multiclass_nms_rotated(
    419         bboxes, scores, cfg.score_thr, cfg.nms, cfg.max_per_img)
    420     return det_bboxes, det_labels

File ~/mmrotate/mmrotate/core/post_processing/bbox_nms_rotated.py:58, in multiclass_nms_rotated(multi_bboxes, multi_scores, score_thr, nms, max_num, score_factors, return_inds)
     55     scores = scores * score_factors
     57 inds = valid_mask.nonzero(as_tuple=False).squeeze(1)
---> 58 bboxes, scores, labels = bboxes[inds], scores[inds], labels[inds]
     60 if bboxes.numel() == 0:
     61     dets = torch.cat([bboxes, scores[:, None]], -1)

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Additional information

No response

zytx121 commented 2 years ago

This seems to be caused by variables in the NMS (non-maximum suppression) function being on different devices. Please ensure that the variables are on the same device. NMS is only triggered in the test phase, so no error is reported during the training phase.
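This also fits the log above, where training runs for 22 epochs before the error appears: the posted config sets cfg.evaluation.interval = 22, so evaluation (and with it NMS) first runs after epoch 22. A minimal sketch of that schedule, assuming evaluation is triggered whenever the number of finished epochs is a multiple of the interval. The function name is hypothetical and this is not mmcv's own code.

```python
def evaluation_epochs(max_epochs, interval):
    """Return the 1-based epochs after which evaluation would run.

    Assumption: evaluation is triggered when the number of finished
    epochs is a multiple of `interval`; a non-positive interval
    disables evaluation.
    """
    if interval <= 0:
        return []
    return [epoch for epoch in range(1, max_epochs + 1) if epoch % interval == 0]


# Values taken from the config in this issue: 100 epochs, interval 22.
print(evaluation_epochs(100, 22))  # [22, 44, 66, 88]
```

Setting the interval to 1 while debugging would surface a test-phase error like this one after the first epoch instead of the 22nd.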

Petopp commented 2 years ago

Hello, what do you mean by NMS? How can I influence that? Was it not installed during setup?

The training runs on Ubuntu 20.04 in WSL2 (Windows 10) and is controlled via a Jupyter Lab. From there, everything actually runs on one machine.

Best regards

Peter

zytx121 commented 1 year ago

According to https://github.com/open-mmlab/mmrotate/issues/511#issuecomment-1258864768, maybe you need to check whether the installed mmcv matches your CUDA version. You can get the install command here.
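As a rough illustration of how such an install command is put together, here is a small sketch that builds the pip find-links URL for prebuilt mmcv-full wheels. It assumes the index follows the pattern `https://download.openmmlab.com/mmcv/dist/cu<XY>/torch<A.B>.0/index.html` from the mmcv 1.x installation guide. The helper name is hypothetical, the version numbers are only example inputs, and whether a wheel exists for a given combination still has to be checked on the index itself.

```python
def mmcv_find_links_url(cuda_version, torch_version):
    """Build the pip find-links URL for prebuilt mmcv-full wheels.

    Assumption: the index is organised as cu<major><minor>/torch<major>.<minor>.0,
    i.e. the torch patch version is always written as 0 in the URL.
    """
    cuda_major, cuda_minor = cuda_version.split(".")[:2]
    # Drop a local build tag such as "+cu116" before splitting.
    torch_major, torch_minor = torch_version.split("+")[0].split(".")[:2]
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"cu{cuda_major}{cuda_minor}/torch{torch_major}.{torch_minor}.0/index.html"
    )


# Example inputs only; substitute the versions installed on your machine.
url = mmcv_find_links_url("11.6", "1.12.1")
print(f"pip install mmcv-full -f {url}")
```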

Petopp commented 1 year ago

Hello, unfortunately I only now got around to testing this. I have reinstalled the mmcv version that matches my CUDA and Torch versions. The error message is unfortunately still unchanged. I will now try to get this running on Windows and see whether the same message appears.

akamal816 commented 1 year ago

According to #511 (comment), maybe you need to check whether the installed mmcv matches your CUDA version. You can get the install command here.

Hello,

I'm having the same issue. Installing mmcv with the method above (with the OS and CUDA parameters adjusted) got MMDetection running on my system by resolving the No module named 'mmcv._ext' error I had. Unfortunately, it does not help with MMRotate.

I'm trying to perform step 2a of the MMrotate install guide: python demo/image_demo.py demo/demo.jpg oriented_rcnn_r50_fpn_1x_dota_le90.py oriented_rcnn_r50_fpn_1x_dota_le90-6d2b2ce0.pth --out-file result.jpg

After running it I receive the following runtime error: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Any other suggestions?

My environment is as follows:
Windows 10
CPU: AMD 5800H
GPU: RTX 3080 (Mobile)
Python 3.9.13
CUDA 11.6
Torch 1.13
mmcv 1.7.0
mmengine 0.4.0
mmdet 2.27.0 (MMDetection is working successfully per the validation method in the install guide)

akamal816 commented 1 year ago

Hello, I've resolved this issue by following @2505928188's comment from #511:

I solved this problem by lowering the CUDA and mmcv versions. My versions are: cuda: 11.6.0, pytorch: 1.12.1, mmcv-full: 1.6.2
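To compare a local environment against that combination, something like the following sketch could be used. It relies only on the standard library's `importlib.metadata`; the helper names and the `REPORTED_WORKING` table are hypothetical, and the CUDA toolkit version is not covered because it is not a Python package.

```python
from importlib import metadata

# Combination reported as working in this thread (used with CUDA 11.6.0).
REPORTED_WORKING = {"torch": "1.12.1", "mmcv-full": "1.6.2"}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_with_reported(expected=REPORTED_WORKING):
    """Map each package to (installed, expected, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Ignore local build tags such as "+cu116" when comparing.
        base = found.split("+")[0] if found else None
        report[package] = (found, wanted, base == wanted)
    return report


for package, (found, wanted, ok) in compare_with_reported().items():
    print(f"{package}: installed={found}, reported working={wanted}, match={ok}")
```

A mismatch does not prove the installed versions are broken; it only shows where the environment differs from the one reported to work here.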

rishabh10gpt commented 1 year ago

Hello, I worked around this issue by modifying the function multiclass_nms_rotated (in mmrotate/core/post_processing/bbox_nms_rotated.py, per the traceback above), replacing: labels = torch.arange(num_classes, dtype=torch.long) with: labels = torch.arange(num_classes, dtype=torch.long).to(scores.device)
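The traceback fails at `labels[inds]`: if `labels` is created without a device it lives on the CPU, while `inds` is derived from the CUDA `scores`. The sketch below illustrates the same idea on toy data. It is not mmrotate's code; `make_labels`, the shapes, and the 0.5 threshold are made up for illustration, and it falls back to the CPU when no GPU is available, in which case it only demonstrates the pattern, not the error.

```python
import torch


def make_labels(scores, num_classes):
    """Build a per-score label grid on the same device as `scores`."""
    # Passing device= creates the tensor directly where `scores` lives,
    # which has the same effect as calling .to(scores.device) afterwards.
    labels = torch.arange(num_classes, dtype=torch.long, device=scores.device)
    return labels.view(1, -1).expand_as(scores)


device = "cuda" if torch.cuda.is_available() else "cpu"
scores = torch.rand(4, 3, device=device)  # 4 boxes, 3 classes (toy values)
labels = make_labels(scores, num_classes=3).reshape(-1)

# Keep only entries above a toy threshold, as an NMS pre-filter would.
valid_mask = scores.reshape(-1) > 0.5
inds = valid_mask.nonzero(as_tuple=False).squeeze(1)

# The index tensor and the indexed tensor now share a device.
kept_labels = labels[inds]
print(labels.device, inds.device, kept_labels.shape)
```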

yangxue0827 commented 1 year ago

@zytx121

x-yy0 commented 1 year ago

Hello, I worked around this issue by modifying the function multiclass_nms_rotated (in mmrotate/core/post_processing/bbox_nms_rotated.py, per the traceback above), replacing: labels = torch.arange(num_classes, dtype=torch.long) with: labels = torch.arange(num_classes, dtype=torch.long).to(scores.device)

It works for me, thank you!