open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Please help me, I can't train the model on a custom dataset. #7488

Open · IvanMutus opened this issue 2 years ago

IvanMutus commented 2 years ago

Hello, I've tried changing almost all of the configuration parameters, but nothing helps: the training always stops at 200 iterations, and I've spent over 100 hours trying to figure out why. I have 1400 photos for training and 450 for validation. I adapted the structure of my custom dataset to match the KITTI tiny dataset, as shown in the mmdetection demo, so that I could train the model exactly the way the demo does. The model is SSD300.

/tmp/ipykernel_3063/3895221999.py:58: DeprecationWarning: np.long is a deprecated alias for np.compat.long. To silence this warning, use np.compat.long by itself. In the likely event your code does not need to work on Python 2 you can use the builtin int for which np.compat.long is itself an alias. Doing this will not modify any behaviour and is safe. When replacing np.long, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  labels=np.array(gt_labels, dtype=np.long),
/tmp/ipykernel_3063/3895221999.py:61: DeprecationWarning: np.long is a deprecated alias for np.compat.long. To silence this warning, use np.compat.long by itself. In the likely event your code does not need to work on Python 2 you can use the builtin int for which np.compat.long is itself an alias. Doing this will not modify any behaviour and is safe. When replacing np.long, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  labels_ignore=np.array(gt_labels_ignore, dtype=np.long))
/home/ivan/Рабочий стол/mmdetection/mmdet/datasets/custom.py:179: UserWarning: CustomDataset does not support filtering empty gt images.
  warnings.warn(
2022-03-21 21:46:05,961 - mmdet - INFO - load checkpoint from local path: checkpoints/test.pth
2022-03-21 21:46:07,965 - mmdet - WARNING - The model and loaded state dict do not match exactly
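Both DeprecationWarnings point at the dtype=np.long arguments in the custom dataset class quoted further down in this issue; per the warning text, the drop-in replacement on NumPy >= 1.20 is the builtin int or an explicit width such as np.int64. A sketch of just the two affected lines:

labels = np.array(gt_labels, dtype=np.int64)                # was dtype=np.long
labels_ignore = np.array(gt_labels_ignore, dtype=np.int64)  # was dtype=np.long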

size mismatch for bbox_head.cls_convs.0.0.weight: copying a param with shape torch.Size([324, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 512, 3, 3]).
size mismatch for bbox_head.cls_convs.0.0.bias: copying a param with shape torch.Size([324]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for bbox_head.cls_convs.1.0.weight: copying a param with shape torch.Size([486, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 1024, 3, 3]).
size mismatch for bbox_head.cls_convs.1.0.bias: copying a param with shape torch.Size([486]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for bbox_head.cls_convs.2.0.weight: copying a param with shape torch.Size([486, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 512, 3, 3]).
size mismatch for bbox_head.cls_convs.2.0.bias: copying a param with shape torch.Size([486]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for bbox_head.cls_convs.3.0.weight: copying a param with shape torch.Size([486, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 256, 3, 3]).
size mismatch for bbox_head.cls_convs.3.0.bias: copying a param with shape torch.Size([486]) from checkpoint, the shape in current model is torch.Size([96]).
size mismatch for bbox_head.cls_convs.4.0.weight: copying a param with shape torch.Size([324, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 3, 3]).
size mismatch for bbox_head.cls_convs.4.0.bias: copying a param with shape torch.Size([324]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for bbox_head.cls_convs.5.0.weight: copying a param with shape torch.Size([324, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 3, 3]).
size mismatch for bbox_head.cls_convs.5.0.bias: copying a param with shape torch.Size([324]) from checkpoint, the shape in current model is torch.Size([64]).
2022-03-21 21:46:07,966 - mmdet - INFO - Start running, host: ivan@ivan-GL73-8RC, work_dir: /home/ivan/Рабочий стол/mmdetection/tutorial_exps
2022-03-21 21:46:07,966 - mmdet - INFO - Hooks will be executed in the following order:
before_run: (VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) CheckpointHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook


before_train_epoch: (VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) NumClassCheckHook
(LOW ) IterTimerHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook


before_train_iter: (VERY_HIGH ) StepLrUpdaterHook
(LOW ) IterTimerHook
(LOW ) EvalHook


after_train_iter: (ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(LOW ) IterTimerHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) CheckInvalidLossHook


after_train_epoch: (NORMAL ) CheckpointHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook


before_val_epoch: (NORMAL ) NumClassCheckHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook


before_val_iter: (LOW ) IterTimerHook


after_val_iter: (LOW ) IterTimerHook


after_val_epoch: (VERY_LOW ) TextLoggerHook


after_run: (VERY_LOW ) TextLoggerHook


2022-03-21 21:46:07,967 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
2022-03-21 21:46:07,967 - mmdet - INFO - Checkpoints will be saved to /home/ivan/Рабочий стол/mmdetection/tutorial_exps by HardDiskBackend.
/home/ivan/anaconda3/lib/python3.9/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
2022-03-21 21:46:36,019 - mmdet - INFO - Epoch [1][10/881] lr: 2.500e-03, eta: 15:59:09, time: 2.723, data_time: 0.817, memory: 1782, loss_cls: 9.9198, loss_bbox: 5.8576, loss: 15.7773
2022-03-21 21:46:53,797 - mmdet - INFO - Epoch [1][20/881] lr: 2.500e-03, eta: 13:15:12, time: 1.794, data_time: 1.053, memory: 1782, loss_cls: 6.9702, loss_bbox: 7.9114, loss: 14.8815
2022-03-21 21:47:13,140 - mmdet - INFO - Epoch [1][30/881] lr: 2.500e-03, eta: 12:37:01, time: 1.936, data_time: 1.203, memory: 1782, loss_cls: 6.8268, loss_bbox: 7.8771, loss: 14.7039
2022-03-21 21:47:43,220 - mmdet - INFO - Epoch [1][40/881] lr: 2.500e-03, eta: 13:43:44, time: 2.914, data_time: 2.131, memory: 1782, loss_cls: 5.4283, loss_bbox: 7.3805, loss: 12.8088
2022-03-21 21:48:02,945 - mmdet - INFO - Epoch [1][50/881] lr: 2.500e-03, eta: 13:24:00, time: 2.067, data_time: 1.381, memory: 1782, loss_cls: 5.2074, loss_bbox: 7.2640, loss: 12.4714
2022-03-21 21:48:19,138 - mmdet - INFO - Epoch [1][60/881] lr: 2.500e-03, eta: 12:44:30, time: 1.619, data_time: 0.859, memory: 1782, loss_cls: 5.2744, loss_bbox: 7.4937, loss: 12.7681
2022-03-21 21:48:36,040 - mmdet - INFO - Epoch [1][70/881] lr: 2.500e-03, eta: 12:19:47, time: 1.690, data_time: 1.008, memory: 1782, loss_cls: 5.6761, loss_bbox: 7.8002, loss: 13.4763
2022-03-21 21:48:55,586 - mmdet - INFO - Epoch [1][80/881] lr: 2.500e-03, eta: 12:12:47, time: 1.955, data_time: 1.255, memory: 1782, loss_cls: 5.1228, loss_bbox: 8.7396, loss: 13.8623
2022-03-21 21:49:16,196 - mmdet - INFO - Epoch [1][90/881] lr: 2.500e-03, eta: 12:11:24, time: 2.060, data_time: 1.383, memory: 1782, loss_cls: 5.6127, loss_bbox: 6.9650, loss: 12.5777

2022-03-21 21:49:29,004 - mmdet - INFO - Epoch [1][100/881] lr: 2.500e-03, eta: 11:42:51, time: 1.281, data_time: 0.631, memory: 1782, loss_cls: 5.2070, loss_bbox: 6.3978, loss: 11.6048
2022-03-21 21:49:45,604 - mmdet - INFO - Epoch [1][110/881] lr: 2.500e-03, eta: 11:31:33, time: 1.660, data_time: 0.957, memory: 1782, loss_cls: 5.5467, loss_bbox: 6.4412, loss: 11.9879
2022-03-21 21:49:58,321 - mmdet - INFO - Epoch [1][120/881] lr: 2.500e-03, eta: 11:10:45, time: 1.272, data_time: 0.534, memory: 1782, loss_cls: 5.1328, loss_bbox: 5.7493, loss: 10.8820
2022-03-21 21:50:16,054 - mmdet - INFO - Epoch [1][130/881] lr: 2.500e-03, eta: 11:06:39, time: 1.774, data_time: 1.048, memory: 1782, loss_cls: 4.5745, loss_bbox: 5.6167, loss: 10.1911
2022-03-21 21:50:31,525 - mmdet - INFO - Epoch [1][140/881] lr: 2.500e-03, eta: 10:57:24, time: 1.547, data_time: 0.810, memory: 1782, loss_cls: 4.9017, loss_bbox: 6.4228, loss: 11.3245
2022-03-21 21:50:45,390 - mmdet - INFO - Epoch [1][150/881] lr: 2.500e-03, eta: 10:45:37, time: 1.386, data_time: 0.842, memory: 1782, loss_cls: 4.7573, loss_bbox: 5.2906, loss: 10.0479
2022-03-21 21:51:03,105 - mmdet - INFO - Epoch [1][160/881] lr: 2.500e-03, eta: 10:43:42, time: 1.771, data_time: 1.122, memory: 1782, loss_cls: 4.5290, loss_bbox: 5.5199, loss: 10.0489
2022-03-21 21:51:23,782 - mmdet - INFO - Epoch [1][170/881] lr: 2.500e-03, eta: 10:47:59, time: 2.064, data_time: 1.287, memory: 1782, loss_cls: 4.8839, loss_bbox: 5.0466, loss: 9.9305
2022-03-21 21:51:40,147 - mmdet - INFO - Epoch [1][180/881] lr: 2.500e-03, eta: 10:43:26, time: 1.636, data_time: 1.056, memory: 1782, loss_cls: 4.4631, loss_bbox: 5.1512, loss: 9.6143
2022-03-21 21:51:57,418 - mmdet - INFO - Epoch [1][190/881] lr: 2.500e-03, eta: 10:41:08, time: 1.733, data_time: 1.045, memory: 1782, loss_cls: 4.4632, loss_bbox: 5.4469, loss: 9.9100
2022-03-21 21:52:14,014 - mmdet - INFO - Epoch [1][200/881] lr: 2.500e-03, eta: 10:37:46, time: 1.661, data_time: 0.895, memory: 1782, loss_cls: nan, loss_bbox: nan, loss: nan
2022-03-21 21:52:14,738 - mmdet - INFO - loss become infinite or NaN!

AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_3063/79739429.py in <module>
     23 # Create work_dir
     24 mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
---> 25 train_detector(model, datasets, cfg, distributed=False, validate=True)
     26

~/Рабочий стол/mmdetection/mmdet/apis/train.py in train_detector(model, dataset, cfg, distributed, validate, timestamp, meta)
    206     elif cfg.load_from:
    207         runner.load_checkpoint(cfg.load_from)
--> 208     runner.run(data_loaders, cfg.workflow)

~/anaconda3/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py in run(self, data_loaders, workflow, max_epochs, **kwargs)
    125                 if mode == 'train' and self.epoch >= self._max_epochs:
    126                     break
--> 127                 epoch_runner(data_loaders[i], **kwargs)
    128
    129             time.sleep(1)  # wait for some hooks like loggers to finish

~/anaconda3/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py in train(self, data_loader, **kwargs)
     49             self.call_hook('before_train_iter')
     50             self.run_iter(data_batch, train_mode=True, **kwargs)
---> 51             self.call_hook('after_train_iter')
     52             self._iter += 1
     53

~/anaconda3/lib/python3.9/site-packages/mmcv/runner/base_runner.py in call_hook(self, fn_name)
    307         """
    308         for hook in self._hooks:
--> 309             getattr(hook, fn_name)(self)
    310
    311     def get_hook_info(self):

~/Рабочий стол/mmdetection/mmdet/core/hook/checkloss_hook.py in after_train_iter(self, runner)
     21     def after_train_iter(self, runner):
     22         if self.every_n_iters(runner, self.interval):
---> 23             assert torch.isfinite(runner.outputs['loss']), \
     24                 runner.logger.info('loss become infinite or NaN!')
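The assert above is CheckInvalidLossHook, configured with interval=50 in custom_hooks (see the config below), so the check only fires every 50th iteration: the run aborts at iteration 200 even though the loss may have first gone non-finite a few iterations earlier. One generic way to find the operation that produces the NaN, assuming training is launched from the same notebook cell shown in the traceback, is PyTorch's anomaly detection:

import torch

# Make autograd raise at the exact op that produces a NaN/Inf gradient.
# This slows training down considerably, so enable it only for debugging.
torch.autograd.set_detect_anomaly(True)

train_detector(model, datasets, cfg, distributed=False, validate=True)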

Config:

input_size = 300
model = dict(
    type='SingleStageDetector',
    backbone=dict(
        type='SSDVGG',
        depth=16,
        with_last_pool=False,
        ceil_mode=True,
        out_indices=(3, 4),
        out_feature_indices=(22, 34),
        init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://vgg16_caffe')),
    neck=dict(
        type='SSDNeck',
        in_channels=(512, 1024),
        out_channels=(512, 1024, 512, 256, 256, 256),
        level_strides=(2, 2, 1, 1),
        level_paddings=(1, 1, 0, 0),
        l2_norm_scale=20),
    bbox_head=dict(
        type='SSDHead',
        in_channels=(512, 1024, 512, 256, 256, 256),
        num_classes=15,
        anchor_generator=dict(
            type='SSDAnchorGenerator',
            scale_major=False,
            input_size=300,
            basesize_ratio_range=(0.15, 0.9),
            strides=[8, 16, 32, 64, 100, 300],
            ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[0.1, 0.1, 0.2, 0.2])),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.0,
            ignore_iof_thr=-1,
            gt_max_assign_all=False),
        smoothl1_beta=1.0,
        allowed_border=-1,
        pos_weight=-1,
        neg_pos_ratio=3,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        nms=dict(type='nms', iou_threshold=0.45),
        min_bbox_size=0,
        score_thr=0.02,
        max_per_img=200))
cudnn_benchmark = True
dataset_type = 'DOTA'
data_root = 'AdaptedDataset/'
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Expand',
        mean=[123.675, 116.28, 103.53],
        to_rgb=True,
        ratio_range=(1, 4)),
    dict(
        type='MinIoURandomCrop',
        min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
        min_crop_size=0.3),
    dict(type='Resize', img_scale=(300, 300), keep_ratio=False),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='PhotoMetricDistortion',
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[1, 1, 1],
        to_rgb=True),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(300, 300),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[1, 1, 1],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=3,
    train=dict(
        type='RepeatDataset',
        times=5,
        dataset=dict(
            type='DOTA',
            ann_file='train.txt',
            img_prefix='training/image_2',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True),
                dict(
                    type='Expand',
                    mean=[123.675, 116.28, 103.53],
                    to_rgb=True,
                    ratio_range=(1, 4)),
                dict(
                    type='MinIoURandomCrop',
                    min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
                    min_crop_size=0.3),
                dict(type='Resize', img_scale=(300, 300), keep_ratio=False),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='PhotoMetricDistortion',
                    brightness_delta=32,
                    contrast_range=(0.5, 1.5),
                    saturation_range=(0.5, 1.5),
                    hue_delta=18),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[1, 1, 1],
                    to_rgb=True),
                dict(type='DefaultFormatBundle'),
                dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
            ],
            data_root='AdaptedDataset/')),
    val=dict(
        type='DOTA',
        ann_file='val.txt',
        img_prefix='training/image_2',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(300, 300),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=False),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[1, 1, 1],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        data_root='AdaptedDataset/'),
    test=dict(
        type='DOTA',
        ann_file='train.txt',
        img_prefix='training/image_2',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(300, 300),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=False),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[1, 1, 1],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        data_root='AdaptedDataset/'))
evaluation = dict(interval=1, metric='mAP')
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(
    policy='step',
    warmup=None,
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[16, 22])
runner = dict(type='EpochBasedRunner', max_epochs=24)
checkpoint_config = dict(interval=1)
log_config = dict(interval=10, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [
    dict(type='NumClassCheckHook'),
    dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW')
]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'checkpoints/test.pth'
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
work_dir = 'tutorial_exps'
seed = 0
gpu_ids = range(0, 1)
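A side note on the size mismatch warnings near the top of the log: in SSDHead, each per-level classification conv outputs num_anchors * (num_classes + 1) channels, so a checkpoint trained with a different number of classes cannot match this 15-class config. The numbers in the warnings line up exactly with an 80-class (COCO) checkpoint, assuming the standard SSD300 anchor counts of 4, 6, 6, 6, 4, 4 per level:

# cls conv out-channels per SSD feature level: anchors * (num_classes + 1)
for anchors in (4, 6, 6, 6, 4, 4):
    print(anchors * (80 + 1), anchors * (15 + 1))
# prints: 324 64 / 486 96 / 486 96 / 486 96 / 324 64 / 324 64

Those layers are freshly initialized instead of loaded, which is normal when fine-tuning on a new class set, so the warnings by themselves do not explain the NaN.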

The custom dataset class used to load the annotations:

import copy
import os.path as osp

import mmcv
import numpy as np

from mmdet.datasets.builder import DATASETS
from mmdet.datasets.custom import CustomDataset


@DATASETS.register_module()
class DOTA(CustomDataset):

    CLASSES = ('ship', 'small-vehicle', 'large-vehicle', 'plane', 'harbor',
               'storage-tank', 'tennis-court', 'bridge', 'swimming-pool',
               'helicopter', 'basketball-court', 'baseball-diamond',
               'roundabout', 'soccer-ball-field', 'ground-track-field')

    def load_annotations(self, ann_file):
        cat2label = {k: i for i, k in enumerate(self.CLASSES)}
        # load image list from file
        image_list = mmcv.list_from_file(self.ann_file)

        data_infos = []
        # convert annotations to middle format
        for image_id in image_list:
            filename = f'{self.img_prefix}/{image_id}.png'
            image = mmcv.imread(filename)
            height, width = image.shape[:2]

            data_info = dict(filename=f'{image_id}.png', width=width, height=height)

            # load annotations
            label_prefix = self.img_prefix.replace('image_2', 'label_2')
            lines = mmcv.list_from_file(osp.join(label_prefix, f'{image_id}.txt'))

            # DOTA line format: x1 y1 x2 y2 x3 y3 x4 y4 class-name difficulty
            content = [line.strip().split(' ') for line in lines]
            bbox_names = [x[8] for x in content]
            # Take the 1st and 3rd corners of the quad as an axis-aligned box;
            # this assumes the corners are ordered so that x1 < x3 and y1 < y3.
            bboxes = [[float(info) for info in (x[0:2] + x[4:6])] for x in content]

            gt_bboxes = []
            gt_labels = []
            gt_bboxes_ignore = []
            gt_labels_ignore = []

            # keep boxes whose class is in CLASSES, ignore the rest
            for bbox_name, bbox in zip(bbox_names, bboxes):
                if bbox_name in cat2label:
                    gt_labels.append(cat2label[bbox_name])
                    gt_bboxes.append(bbox)
                else:
                    gt_labels_ignore.append(-1)
                    gt_bboxes_ignore.append(bbox)

            data_anno = dict(
                bboxes=np.array(gt_bboxes, dtype=np.float32).reshape(-1, 4),
                # np.int64 instead of the deprecated np.long
                # (see the DeprecationWarnings in the log above)
                labels=np.array(gt_labels, dtype=np.int64),
                bboxes_ignore=np.array(gt_bboxes_ignore,
                                       dtype=np.float32).reshape(-1, 4),
                labels_ignore=np.array(gt_labels_ignore, dtype=np.int64))

            data_info.update(ann=data_anno)
            data_infos.append(data_info)

        return data_infos
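Given the corner-ordering assumption flagged in the comment above, it is worth verifying that every parsed box has positive width and height, since degenerate boxes are a common cause of NaN losses. A minimal sanity check, assuming the paths from the config:

dataset = DOTA(
    ann_file='train.txt',
    img_prefix='training/image_2',
    data_root='AdaptedDataset/',
    pipeline=[])
for info in dataset.data_infos:
    b = info['ann']['bboxes']
    # a valid box must satisfy x1 < x2 and y1 < y2
    bad = (b[:, 2] <= b[:, 0]) | (b[:, 3] <= b[:, 1])
    if bad.any():
        print(info['filename'], b[bad])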
Czm369 commented 2 years ago

You can check whether the ground truth is still normal after the data augmentation, and whether there is a division by zero during training.

IvanMutus commented 2 years ago

> You can check whether the ground truth is still normal after the data augmentation, and whether there is a division by zero during training.

How can I do it?
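For reference, one way to run that check in mmdet 2.x is to build the training dataset from the config and scan the ground truth after the full augmentation pipeline. A sketch, assuming cfg is the Config object used in the notebook:

from mmdet.datasets import build_dataset

ds = build_dataset(cfg.data.train)
for i in range(len(ds)):
    item = ds[i]  # indexing runs the full augmentation pipeline
    # DefaultFormatBundle wraps tensors in DataContainers, hence .data
    boxes = item['gt_bboxes'].data.numpy()
    ws = boxes[:, 2] - boxes[:, 0]
    hs = boxes[:, 3] - boxes[:, 1]
    if len(boxes) == 0 or (ws <= 0).any() or (hs <= 0).any():
        print('suspicious sample:', i, boxes)

mmdetection also ships tools/misc/browse_dataset.py, which draws the augmented images and boxes so they can be inspected visually.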