open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

DetectoRS negative padding on cv2.copyMakeBorder with COCO subset #5944

Closed. victoic closed this issue 2 years ago.

victoic commented 2 years ago

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug

While using DetectoRS with a subset of COCO, I get an error in cv2.copyMakeBorder().

Reproduction

  1. What command or script did you run?

Through Google Colab, using the MMDet tutorial as a base:

train_detector(model, datasets, cfg, distributed=False, validate=True)
  2. Did you make any modifications to the code or config? Do you understand what you have modified?

Changes to the CLASSES value in the CocoDataset config to reflect the modified annotations. While not relevant to the code, I copied the Pad class from the pipelines into the Colab notebook for faster debugging; this is visible in the traceback.

  3. What dataset did you use?

A smaller subset from COCO using the same images with different class annotations

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.7.11 (default, Jul  3 2021, 18:01:19) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla K80
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.10.0+cu111
OpenCV: 4.1.2
MMCV: 1.3.12
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.0
MMDetection: 2.15.1+b15a2b3
  2. You may add any additional information that may be helpful for locating the problem, such as:
    • How you installed PyTorch [e.g., pip, conda, source]: installed through pip
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.): none

Error traceback

If applicable, paste the error traceback here.

loading annotations into memory...
Done (t=0.08s)
creating index...
index created!
{'type': 'CocoSubsetDataset', 'ann_file': 'annotations/COCO-subset.json', 'img_prefix': 'images/train', 'pipeline': [{'type': 'LoadImageFromFile'}, {'type': 'LoadAnnotations', 'with_bbox': True, 'with_mask': True, 'with_seg': True}, {'type': 'Resize', 'img_scale': (1333, 800), 'keep_ratio': True}, {'type': 'RandomFlip', 'flip_ratio': 0.5}, {'type': 'Normalize', 'mean': [123.675, 116.28, 103.53], 'std': [58.395, 57.12, 57.375], 'to_rgb': True}, {'type': 'Pad', 'size_divisor': 32}, {'type': 'SegRescale', 'scale_factor': 0.125}, {'type': 'DefaultFormatBundle'}, {'type': 'Collect', 'keys': ['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']}], 'seg_prefix': 'stuffthingmaps/train', 'classes': ('pole', 'cuboid', 'flat', 'disk', 'cylinder', 'sphere', 'wedge'), 'data_root': 'data/coco'}

/content/mmdetection/mmdet/core/anchor/builder.py:17: UserWarning: ``build_anchor_generator`` would be deprecated soon, please use ``build_prior_generator`` 
  '``build_anchor_generator`` would be deprecated soon, please use '
2021-08-24 17:32:08,508 - mmdet - INFO - Start running, host: root@55770538c7a9, work_dir: /content/gdrive/MyDrive/Doutorado/COCO Subset/DetectoRS-master/checkpoints
2021-08-24 17:32:08,509 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook                  
(NORMAL      ) CheckpointHook                     
(NORMAL      ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook                  
(NORMAL      ) EvalHook                           
(NORMAL      ) NumClassCheckHook                  
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_iter:
(VERY_HIGH   ) StepLrUpdaterHook                  
(NORMAL      ) EvalHook                           
(LOW         ) IterTimerHook                      
 -------------------- 
after_train_iter:
(ABOVE_NORMAL) OptimizerHook                      
(NORMAL      ) CheckpointHook                     
(NORMAL      ) EvalHook                           
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) CheckpointHook                     
(NORMAL      ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_epoch:
(NORMAL      ) NumClassCheckHook                  
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_epoch:
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
2021-08-24 17:32:08,515 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs

loading annotations into memory...
Done (t=0.03s)
creating index...
index created!

---------------------------------------------------------------------------

error                                     Traceback (most recent call last)

<ipython-input-145-3610a9217fdc> in <module>()
     15 # Create work_dir
     16 mmcv.mkdir_or_exist(os.path.abspath(cfg.work_dir))
---> 17 train_detector(model, datasets, cfg, distributed=False, validate=True)

6 frames

/usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
    423             # have message field
    424             raise self.exc_type(message=msg)
--> 425         raise self.exc_type(msg)
    426 
    427 

error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/mmdetection/mmdet/datasets/custom.py", line 195, in __getitem__
    if data is None:
  File "<ipython-input-139-5d7fbfb0f378>", line 41, in prepare_train_img
    return self.pipeline(results)
  File "/content/mmdetection/mmdet/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "<ipython-input-143-b9c9ae8da0e6>", line 97, in __call__
    self._pad_masks(results)
  File "<ipython-input-143-b9c9ae8da0e6>", line 80, in _pad_masks
    results[key] = results[key].pad(pad_shape, pad_val=self.pad_val)
  File "/content/mmdetection/mmdet/core/mask/structures.py", line 310, in pad
    for mask in self.masks
  File "/content/mmdetection/mmdet/core/mask/structures.py", line 310, in <listcomp>
    for mask in self.masks
  File "/usr/local/lib/python3.7/dist-packages/mmcv/image/geometric.py", line 517, in impad
    value=pad_val)
cv2.error: OpenCV(4.1.2) /io/opencv/modules/core/src/copy.cpp:1170: error: (-215:Assertion failed) top >= 0 && bottom >= 0 && left >= 0 && right >= 0 && _src.dims() <= 2 in function 'copyMakeBorder'

As I've seen in previous cv2.copyMakeBorder errors posted here, this is likely due to a data/annotations issue. However, I can't locate it, since I'm using unmodified images from COCO and the COCO annotations with only the category_id changed. The padding is resulting in negative values, which breaks the assert, but I can't find where this originates.
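For reference, a minimal sketch of the failure mode with hypothetical shapes (my illustration, not mmcv's exact code): mmcv.impad derives the border sizes from the target pad_shape minus the array's own shape, so a mask taller or wider than pad_shape yields a negative border and trips the assertion:

import cv2
import numpy as np

mask = np.zeros((104, 100), dtype=np.uint8)  # hypothetical mask, 104 x 100
target_h, target_w = 96, 128                 # hypothetical pad_shape
bottom = target_h - mask.shape[0]            # 96 - 104 = -8, negative
right = target_w - mask.shape[1]             # 128 - 100 = 28
# raises cv2.error: (-215:Assertion failed) top >= 0 && bottom >= 0 ...
cv2.copyMakeBorder(mask, 0, bottom, 0, right, cv2.BORDER_CONSTANT, value=0)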

hhaAndroid commented 2 years ago

@victoic Please post your configuration.

victoic commented 2 years ago

As printed by pretty_text

Config:
dataset_type = 'CocoSubsetDataset'
data_root = 'data/coco'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='SegRescale', scale_factor=0.125),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=0,
    train=dict(
        type='CocoSubsetDataset',
        ann_file='annotations/COCO-subset.json',
        img_prefix='images/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='LoadAnnotations',
                with_bbox=True,
                with_mask=True,
                with_seg=True),
            dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='SegRescale', scale_factor=0.125),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=[
                    'img', 'gt_bboxes', 'gt_labels', 'gt_masks',
                    'gt_semantic_seg'
                ])
        ],
        seg_prefix='stuffthingmaps/train',
        classes=('1', '2', '3', '4', '5', '6',
                 '7'),
        data_root='data/coco'),
    val=dict(
        type='CocoSubsetDataset',
        ann_file='annotations/COCO-subset.json',
        img_prefix='images/eval',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip', flip_ratio=0.5),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=('1', '2', '3', '4', '5', '6',
                 '7'),
        data_root='data/coco'),
    test=dict(
        type='CocoSubsetDataset',
        ann_file='annotations/COCO-subset.json',
        img_prefix='images/eval',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip', flip_ratio=0.5),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=('1', '2', '3', '4', '5', '6',
                 '7'),
        data_root='data/coco'))
evaluation = dict(metric=['bbox', 'segm'])
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
model = dict(
    type='HybridTaskCascade',
    backbone=dict(
        type='DetectoRS_ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
        conv_cfg=dict(type='ConvAWS'),
        sac=dict(type='SAC', use_deform=True),
        stage_with_sac=(False, True, True, True),
        output_img=True),
    neck=dict(
        type='RFP',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5,
        rfp_steps=2,
        aspp_out_channels=64,
        aspp_dilations=(1, 3, 6, 1),
        rfp_backbone=dict(
            rfp_inplanes=256,
            type='DetectoRS_ResNet',
            depth=50,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            conv_cfg=dict(type='ConvAWS'),
            sac=dict(type='SAC', use_deform=True),
            stage_with_sac=(False, True, True, True),
            pretrained='torchvision://resnet50',
            style='pytorch')),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='HybridTaskCascadeRoIHead',
        interleaved=True,
        mask_info_flow=True,
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(type='Shared2FCBBoxHead', num_classes=7),
            dict(type='Shared2FCBBoxHead', num_classes=7),
            dict(type='Shared2FCBBoxHead', num_classes=7)
        ],
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=[
            dict(
                type='HTCMaskHead',
                with_conv_res=False,
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=7,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)),
            dict(
                type='HTCMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=7,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)),
            dict(
                type='HTCMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=7,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))
        ],
        semantic_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[8]),
        semantic_head=dict(
            type='FusedSemanticHead',
            num_ins=5,
            fusion_level=1,
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=183,
            ignore_label=255,
            loss_weight=0.2)),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.001,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100,
            mask_thr_binary=0.5)))
classes = ('1', '2', '3', '4', '5', '6', '7')
work_dir = '/content/gdrive/MyDrive/COCO Subset/DetectoRS-master/checkpoints'
seed = 0
gpu_ids = range(0, 1)
victoic commented 2 years ago

I don't really know if it helps, but I've printed: a) the image from results['filename']; b) the segmentation image from stuffthingmaps, using results['seg_prefix']+'/'+results['ann_info']['seg_map']; c) the mask passed as a parameter to mmcv.impad().

I printed both a) and b) from inside the _pad_masks method of the Pad pipeline class, while c) is printed inside the _pad function of the BitmapMasks class. Hope this is helpful. (This is for file 000000279522.jpg, using seed 0)

(image attachment)
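A hedged debugging sketch for narrowing this down further, inserted inside Pad._pad_masks (the field names follow the standard mmdet pipeline; adjust as needed):

# log the target shape against each mask container's shape; a mask taller
# or wider than pad_shape is what produces the negative border
pad_shape = results['pad_shape'][:2]
for key in results.get('mask_fields', []):
    masks = results[key]  # BitmapMasks/PolygonMasks container
    print(key, 'pad_shape:', pad_shape, 'masks:', (masks.height, masks.width))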

victoic commented 2 years ago

Another update: I believe I've narrowed the problem down to the annotations file, but I still don't know exactly what is wrong. Using the same data structure with the annotation files from the original COCO dataset results in no error; when the file is changed to the subset annotation (created through VIA export), the error occurs.

I still have no clue what could be causing it, since the mask/bbox values are the same as in the original COCO annotation.

AronLin commented 2 years ago

Another update: I believe I've narrowed the problem down to the annotations file, but I still don't know exactly what is wrong. Using the same data structure with the annotation files from the original COCO dataset results in no error; when the file is changed to the subset annotation (created through VIA export), the error occurs.

I still have no clue what could be causing it, since the mask/bbox values are the same as in the original COCO annotation.

I suggest you convert the format exported by VIA to COCO format. It is difficult for me to judge where the problem is based on the current information.

victoic commented 2 years ago

Okay, I solved the problem. I don't know exactly what it was, but let me explain.

I suggest you convert the format exported by VIA to COCO format. It is difficult for me to judge where the problem is based on the current information.

The data was always in the COCO format; the VIA tool has an option to export annotations in COCO format, which is what I used to create the subset. But something was indeed wrong with the file: I manually generated the subset using a script to filter the original COCO annotations file, and the problem was solved.

I attach two files to this response, coco_subset_train.json.txt and COCO-subset-train.json.txt; the latter is the annotation file generated by VIA, the former the one created by filtering, in case anyone is curious to find out what is different about them (I couldn't find it). They are .txt because .json files are not allowed here.
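For anyone who wants to reproduce the filtering approach, a minimal sketch of the kind of script I mean (the file names and the set of kept category ids are hypothetical placeholders):

import json

KEEP_IDS = {1, 2, 3, 4, 5, 6, 7}  # hypothetical category ids to keep

with open('instances_train2017.json') as f:  # original COCO annotations
    coco = json.load(f)

coco['categories'] = [c for c in coco['categories'] if c['id'] in KEEP_IDS]
coco['annotations'] = [a for a in coco['annotations']
                       if a['category_id'] in KEEP_IDS]
kept_imgs = {a['image_id'] for a in coco['annotations']}
coco['images'] = [i for i in coco['images'] if i['id'] in kept_imgs]

with open('coco_subset_train.json', 'w') as f:
    json.dump(coco, f)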

However, I've noticed two new issues; I'm wondering if I should open new issues or can use this one. They are:

  1. Train losses remain nan when using DetectoRS; this happens even when using the original COCO annotations. I couldn't finish one epoch with the full COCO dataset due to the Colab timeout, but here is the logger output for the last iteration before the timeout:

2021-09-01 09:06:21,235 - mmdet - INFO - Epoch [1][30400/58633] lr: 2.000e-02, eta: 17 days, 3:12:27, time: 2.195, data_time: 0.116, memory: 11767, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 25.9840, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 25.9840, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 25.9840, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan

  2. After loading weights from https://github.com/open-mmlab/mmdetection/tree/master/configs/detectors, I get RuntimeError: CUDA error: device-side assert triggered. Without this, train_detector runs (with only issue 1). I load weights into some layers using:
# Load the checkpoint and filter the state_dict, dropping all head
# weights since my class count differs from the pretrained model
import torch

loaded_dict = torch.load('/content/gdrive/MyDrive/Doutorado/COCO Subset/DetectoRS-master/model/detectors_htc_r50_1x_coco-329b1453.pth')
del_keys = []
for k in loaded_dict['state_dict'].keys():
    if 'head' in k:
        del_keys.append(k)
for k in del_keys:
    loaded_dict['state_dict'].pop(k)
model.load_state_dict(loaded_dict['state_dict'], strict=False)

Also, when an epoch finishes and validation runs, I get mmdet - ERROR - The testing results of the whole dataset is empty. I'd like to point out again that I'm using this annotation, which is simply a subset of the COCO dataset. Why would this happen?

AronLin commented 2 years ago
  1. I visualized both JSON files, but it seems that the labels in coco_subset_train.json are wrong. Instead, I successfully loaded the annotations with COCO-subset-train.json and the problem you mentioned did not appear.
  2. For the losses, you can add grad_clip in the config file.

    optimizer_config = dict(
        _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
  3. For loading the weights, you can set its map_location.
victoic commented 2 years ago

First, I would like to thank you for your time and help. Since I'm new to mmdetection I may be misunderstanding things, and your guidance has cleared up a lot for me.

1. I visualized both JSON files, but it seems that the labels in `coco_subset_train.json` are wrong. Instead, I successfully loaded the annotations with `COCO-subset-train.json` and the problem you mentioned did not appear.
  1. I'm sorry, this is my mistake. I failed to mention that coco_subset_train.json does not use all of the classes. It uses the following: CLASSES = ['backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'toilet', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'toaster', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

However, the annotations present in both files are identical, as I could check by executing the following code:

>>> import json
>>> f1 = "coco_subset_train.json"
>>> f2 = "COCO-subset-train.json"
>>> file1 = open(f1,'r')
>>> file2 = open(f2,'r')
>>> j1 = json.load(file1)
>>> j2 = json.load(file2)
>>> anns1 = j1['annotations']
>>> anns2 = j2['annotations']
>>> print(len(anns1), len(anns2))
6597 6662
>>> anns1_byId = {ann['id']: ann for ann in anns1}
>>> anns2_byId = {ann['id']: ann for ann in anns2}
>>> equals_segmentation = 0
>>> equals_bbox = 0
>>> equals_category_id = 0
>>> for k in anns1_byId.keys():
...    a1 = anns1_byId[k]
...    a2 = anns2_byId[k]
...    if a1['segmentation'] == a2['segmentation']:
...       equals_segmentation+=1
...    if a1['bbox'] == a2['bbox']:
...       equals_bbox+=1
...    if a1['category_id'] == a2['category_id']:
...       equals_category_id+=1
...
>>> print(equals_segmentation)
6597
>>> print(equals_bbox)
6597
>>> print(equals_category_id)
6597
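As a follow-up, the counts differ (6597 vs 6662), so the VIA file contains annotation ids that are absent from the filtered one; a quick sketch to inspect them (assuming annotation ids are unique):

>>> extra_ids = set(anns2_byId) - set(anns1_byId)
>>> len(extra_ids)   # should be 65 given the counts above
>>> extras = [anns2_byId[k] for k in extra_ids]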
2. For the losses, you can add grad_clip in the config file.
   ```
    optimizer_config = dict(
        _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
   ```
  1. After adding this to the config file, I get the following error:
/content/mmdetection/mmdet/apis/train.py in train_detector(model, dataset, cfg, distributed, validate, timestamp, meta)
    126     runner.register_training_hooks(cfg.lr_config, optimizer_config,
    127                                    cfg.checkpoint_config, cfg.log_config,
--> 128                                    cfg.get('momentum_config', None))
    129     if distributed:
    130         if isinstance(runner, EpochBasedRunner):

/usr/local/lib/python3.7/dist-packages/mmcv/runner/base_runner.py in register_training_hooks(self, lr_config, optimizer_config, checkpoint_config, log_config, momentum_config, timer_config, custom_hooks_config)
    536         will be triggered after default hooks.
    537         """
--> 538         self.register_lr_hook(lr_config)
    539         self.register_momentum_hook(momentum_config)
    540         self.register_optimizer_hook(optimizer_config)

/usr/local/lib/python3.7/dist-packages/mmcv/runner/base_runner.py in register_lr_hook(self, lr_config)
    418         else:
    419             hook = lr_config
--> 420         self.register_hook(hook, priority='VERY_HIGH')
    421 
    422     def register_momentum_hook(self, momentum_config):

/usr/local/lib/python3.7/dist-packages/mmcv/runner/base_runner.py in register_hook(self, hook, priority)
    266                 Lower value means higher priority.
    267         """
--> 268         assert isinstance(hook, Hook)
    269         if hasattr(hook, 'priority'):
    270             raise ValueError('"priority" is a reserved attribute for hooks')

AssertionError: 
3. For loading the weights, you can set its `map_location`.

I'm aware of the map_location parameter; however, the error does not occur while loading the weights but when training starts. I'll try it as soon as I can and will report back.
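For reference, a minimal sketch of how I understand the suggestion (loading the checkpoint onto the CPU first; the path is the same checkpoint as above):

import torch

loaded_dict = torch.load(
    '/content/gdrive/MyDrive/Doutorado/COCO Subset/DetectoRS-master/model/detectors_htc_r50_1x_coco-329b1453.pth',
    map_location='cpu')  # keep the loaded state_dict on the CPU
model.load_state_dict(loaded_dict['state_dict'], strict=False)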

But most importantly, do you have any insight into why the test run reports mmdet - ERROR - The testing results of the whole dataset is empty?

AronLin commented 2 years ago

But most importantly, do you have any insight into why the test run reports mmdet - ERROR - The testing results of the whole dataset is empty?

This error occurs at L438 in mmdet/datasets/coco.py; you can check what happened there.

After adding this to the config file I get the following error:

This error shouldn't happen.

I noticed that you used your own dataset class; why not use CocoDataset directly?

victoic commented 2 years ago

This error occurs at L438 in mmdet/datasets/coco.py, you can check what happened.

I've printed some of the variables throughout evaluate() to check what the cause could be, and it seems results enters the evaluate() method empty. I also noticed that my model is indeed returning an empty list at L28 of mmdet/apis/test.py.

Json Prefix:  /content/gdrive/MyDrive/Doutorado/COCO Subset/DetectoRS-master/checkpoints/subset

Metrics:  ['bbox', 'segm']

Result Files:  {'bbox': '/content/gdrive/MyDrive/Doutorado/COCO Subset/DetectoRS-master/checkpoints/subset.bbox.json', 'proposal': '/content/gdrive/MyDrive/Doutorado/COCO Subset/DetectoRS-master/checkpoints/subset.bbox.json', 'segm': '/content/gdrive/MyDrive/Doutorado/COCO Subset/DetectoRS-master/checkpoints/subset.segm.json'}

Eval Results:  OrderedDict()

Results:  [([array([], shape=(0, 5), dtype=float32), array([], shape=(0, 5), dtype=float32), ..., array([], shape=(0, 5), dtype=float32)], [[], [], ..., []])]  # one empty (0, 5) bbox array and one empty mask list per class; I'm truncating this because it's too long but it just repeats

But the input Data seems to be ok:

Data:  {'img_metas': [DataContainer([[{'filename': 'data/coco/images/eval/000000073922.jpg', 'ori_filename': '000000073922.jpg', 'ori_shape': (491, 640, 3), 'img_shape': (800, 1043, 3), 'pad_shape': (800, 1056, 3), 'scale_factor': array([1.6296875, 1.6293279, 1.6296875, 1.6293279], dtype=float32), 'flip': False, 'flip_direction': None, 'img_norm_cfg': {'mean': array([123.675, 116.28 , 103.53 ], dtype=float32), 'std': array([58.395, 57.12 , 57.375], dtype=float32), 'to_rgb': True}, 'batch_input_shape': (800, 1056)}]])], 'img': [tensor([[[[-0.3198, -0.2513, -0.1657,  ...,  0.0000,  0.0000,  0.0000],
          [-0.3198, -0.2684, -0.1999,  ...,  0.0000,  0.0000,  0.0000],
          [-0.3027, -0.2856, -0.2342,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.9980, -2.0152, -2.0152,  ...,  0.0000,  0.0000,  0.0000],
          [-1.9980, -1.9980, -2.0152,  ...,  0.0000,  0.0000,  0.0000],
          [-1.9809, -1.9980, -2.0152,  ...,  0.0000,  0.0000,  0.0000]],

         [[-0.2150, -0.1450, -0.0399,  ...,  0.0000,  0.0000,  0.0000],
          [-0.2150, -0.1450, -0.0749,  ...,  0.0000,  0.0000,  0.0000],
          [-0.1975, -0.1625, -0.1099,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.5455, -1.5630, -1.5630,  ...,  0.0000,  0.0000,  0.0000],
          [-1.5280, -1.5455, -1.5630,  ...,  0.0000,  0.0000,  0.0000],
          [-1.5280, -1.5455, -1.5630,  ...,  0.0000,  0.0000,  0.0000]],

         [[-0.5147, -0.4798, -0.4101,  ...,  0.0000,  0.0000,  0.0000],
          [-0.5147, -0.4798, -0.4450,  ...,  0.0000,  0.0000,  0.0000],
          [-0.4973, -0.4973, -0.4798,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.4036, -1.4036, -1.4210,  ...,  0.0000,  0.0000,  0.0000],
          [-1.4036, -1.4036, -1.4210,  ...,  0.0000,  0.0000,  0.0000],
          [-1.3861, -1.4036, -1.4210,  ...,  0.0000,  0.0000,  0.0000]]]])]} 
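For completeness, a quick sanity check (a sketch, using mmdet's standard inference API; the image path comes from the metas above) that the model itself returns no detections:

from mmdet.apis import inference_detector

# mask-enabled models return a (bbox_results, segm_results) pair
result = inference_detector(model, 'data/coco/images/eval/000000073922.jpg')
bbox_result, segm_result = result
print(sum(len(b) for b in bbox_result), 'raw detections')  # expect 0 given the empty results above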

This error shouldn't happen.

I noticed that you used your own dataset class; why not use CocoDataset directly?

I changed the dataset class to CocoDataset and fixed the problem with optimizer_config; I believe the issue was that I was adding _delete_ = True at runtime instead of directly in the config file. For reference, the placement I understand to be correct, using the snippet suggested above in the config file itself (sketch):
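optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))

However, after the changes the loss remains nan: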

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
/content/mmdetection/mmdet/core/anchor/anchor_generator.py:324: UserWarning: ``grid_anchors`` would be deprecated soon. Please use ``grid_priors`` 
  warnings.warn('``grid_anchors`` would be deprecated soon. '
/content/mmdetection/mmdet/core/anchor/anchor_generator.py:361: UserWarning: ``single_level_grid_anchors`` would be deprecated soon. Please use ``single_level_grid_priors`` 
  '``single_level_grid_anchors`` would be deprecated soon. '
2021-09-05 15:28:18,707 - mmdet - INFO - Epoch [1][50/501]  lr: 1.978e-03, eta: 3:26:02, time: 2.074, data_time: 0.148, memory: 10091, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 0.7206, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 0.7206, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 0.7206, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2021-09-05 15:30:05,031 - mmdet - INFO - Epoch [1][100/501] lr: 3.976e-03, eta: 3:26:55, time: 2.126, data_time: 0.124, memory: 10263, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 1.0023, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 1.0023, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 1.0023, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2021-09-05 15:31:54,955 - mmdet - INFO - Epoch [1][150/501] lr: 5.974e-03, eta: 3:28:22, time: 2.199, data_time: 0.106, memory: 10263, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 0.1538, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 0.1538, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 0.1538, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2021-09-05 15:33:41,815 - mmdet - INFO - Epoch [1][200/501] lr: 7.972e-03, eta: 3:26:42, time: 2.137, data_time: 0.100, memory: 10263, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 0.0952, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 0.0952, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 0.0952, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2021-09-05 15:35:26,944 - mmdet - INFO - Epoch [1][250/501] lr: 9.970e-03, eta: 3:24:19, time: 2.103, data_time: 0.101, memory: 10263, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 1.3275, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 1.3275, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 1.3275, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2021-09-05 15:37:13,353 - mmdet - INFO - Epoch [1][300/501] lr: 1.197e-02, eta: 3:22:33, time: 2.128, data_time: 0.100, memory: 10263, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 0.8386, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 0.8386, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 0.8386, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2021-09-05 15:39:00,732 - mmdet - INFO - Epoch [1][350/501] lr: 1.397e-02, eta: 3:21:03, time: 2.148, data_time: 0.101, memory: 10263, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 0.1250, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 0.1250, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 0.1250, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2021-09-05 15:40:49,694 - mmdet - INFO - Epoch [1][400/501] lr: 1.596e-02, eta: 3:19:50, time: 2.179, data_time: 0.103, memory: 10263, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 0.3214, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 0.3214, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 0.3214, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2021-09-05 15:42:35,815 - mmdet - INFO - Epoch [1][450/501] lr: 1.796e-02, eta: 3:17:55, time: 2.122, data_time: 0.098, memory: 10263, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 0.5714, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 0.5714, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 0.5714, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2021-09-05 15:44:23,355 - mmdet - INFO - Epoch [1][500/501] lr: 1.996e-02, eta: 3:16:17, time: 2.151, data_time: 0.104, memory: 10263, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 0.4469, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 0.4469, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 0.4469, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2021-09-05 15:44:25,330 - mmdet - INFO - Saving checkpoint at 1 epochs

I'm truly at a loss as to what could be happening here. My datasets seem to be OK, as you can see from the printouts below. Train:

CocoDataset Train dataset with number of images 1001, and instance counts: 
+-------------------+-------+---------------------+-------+-----------------+-------+-----------------+-------+--------------------+-------+
| category          | count | category            | count | category        | count | category        | count | category           | count |
+-------------------+-------+---------------------+-------+-----------------+-------+-----------------+-------+--------------------+-------+
| 0 [backpack]      | 95    | 1 [umbrella]        | 102   | 2 [handbag]     | 114   | 3 [tie]         | 63    | 4 [suitcase]       | 80    |
| 5 [frisbee]       | 23    | 6 [skis]            | 70    | 7 [snowboard]   | 36    | 8 [sports ball] | 72    | 9 [kite]           | 109   |
| 10 [baseball bat] | 41    | 11 [baseball glove] | 52    | 12 [skateboard] | 37    | 13 [surfboard]  | 44    | 14 [tennis racket] | 34    |
| 15 [bottle]       | 577   | 16 [wine glass]     | 202   | 17 [cup]        | 479   | 18 [fork]       | 130   | 19 [knife]         | 178   |
| 20 [spoon]        | 152   | 21 [bowl]           | 296   | 22 [banana]     | 197   | 23 [apple]      | 124   | 24 [sandwich]      | 117   |
| 25 [orange]       | 127   | 26 [broccoli]       | 192   | 27 [carrot]     | 237   | 28 [hot dog]    | 85    | 29 [pizza]         | 76    |
| 30 [donut]        | 146   | 31 [cake]           | 120   | 32 [chair]      | 670   | 33 [couch]      | 117   | 34 [potted plant]  | 126   |
| 35 [bed]          | 55    | 36 [toilet]         | 34    | 37 [laptop]     | 107   | 38 [mouse]      | 62    | 39 [remote]        | 94    |
| 40 [keyboard]     | 70    | 41 [cell phone]     | 57    | 42 [toaster]    | 23    | 43 [book]       | 427   | 44 [clock]         | 65    |
| 45 [vase]         | 109   | 46 [scissors]       | 51    | 47 [teddy bear] | 56    | 48 [hair drier] | 20    | 49 [toothbrush]    | 46    |
+-------------------+-------+---------------------+-------+-----------------+-------+-----------------+-------+--------------------+-------+

Test/Val:

CocoDataset Train dataset with number of images 492, and instance counts: 
+-------------------+-------+---------------------+-------+-----------------+-------+-----------------+-------+--------------------+-------+
| category          | count | category            | count | category        | count | category        | count | category           | count |
+-------------------+-------+---------------------+-------+-----------------+-------+-----------------+-------+--------------------+-------+
| 0 [backpack]      | 55    | 1 [umbrella]        | 75    | 2 [handbag]     | 70    | 3 [tie]         | 26    | 4 [suitcase]       | 26    |
| 5 [frisbee]       | 14    | 6 [skis]            | 33    | 7 [snowboard]   | 13    | 8 [sports ball] | 65    | 9 [kite]           | 9     |
| 10 [baseball bat] | 39    | 11 [baseball glove] | 37    | 12 [skateboard] | 8     | 13 [surfboard]  | 29    | 14 [tennis racket] | 28    |
| 15 [bottle]       | 225   | 16 [wine glass]     | 50    | 17 [cup]        | 260   | 18 [fork]       | 76    | 19 [knife]         | 132   |
| 20 [spoon]        | 80    | 21 [bowl]           | 161   | 22 [banana]     | 62    | 23 [apple]      | 108   | 24 [sandwich]      | 51    |
| 25 [orange]       | 77    | 26 [broccoli]       | 39    | 27 [carrot]     | 89    | 28 [hot dog]    | 29    | 29 [pizza]         | 40    |
| 30 [donut]        | 58    | 31 [cake]           | 40    | 32 [chair]      | 326   | 33 [couch]      | 59    | 34 [potted plant]  | 58    |
| 35 [bed]          | 26    | 36 [toilet]         | 20    | 37 [laptop]     | 49    | 38 [mouse]      | 29    | 39 [remote]        | 51    |
| 40 [keyboard]     | 35    | 41 [cell phone]     | 51    | 42 [toaster]    | 11    | 43 [book]       | 262   | 44 [clock]         | 33    |
| 45 [vase]         | 51    | 46 [scissors]       | 23    | 47 [teddy bear] | 35    | 48 [hair drier] | 10    | 49 [toothbrush]    | 32    |
+-------------------+-------+---------------------+-------+-----------------+-------+-----------------+-------+--------------------+-------+

I've redone the annotation files using only pycocotools, and I'm uploading the files here again (coco_subset_eval.txt, coco_subset_train.txt). Here is my current config, so it can be verified that everything is OK:

Config:
dataset_type = 'CocoDataset'
data_root = 'data/coco'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='SegRescale', scale_factor=0.125),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=0,
    train=dict(
        type='CocoDataset',
        ann_file='annotations/coco_subset_train.json',
        img_prefix='images/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='LoadAnnotations',
                with_bbox=True,
                with_mask=True,
                with_seg=True),
            dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='SegRescale', scale_factor=0.125),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=[
                    'img', 'gt_bboxes', 'gt_labels', 'gt_masks',
                    'gt_semantic_seg'
                ])
        ],
        seg_prefix='stuffthingmaps/train',
        classes=('backpack', 'umbrella', 'handbag', 'tie', 'suitcase',
                 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
                 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
                 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork',
                 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
                 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut',
                 'cake', 'chair', 'couch', 'potted plant', 'bed', 'toilet',
                 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
                 'toaster', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
                 'hair drier', 'toothbrush'),
        data_root='data/coco'),
    val=dict(
        type='CocoDataset',
        ann_file='annotations/coco_subset_eval.json',
        img_prefix='images/eval',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip', flip_ratio=0.5),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=('backpack', 'umbrella', 'handbag', 'tie', 'suitcase',
                 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
                 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
                 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork',
                 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
                 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut',
                 'cake', 'chair', 'couch', 'potted plant', 'bed', 'toilet',
                 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
                 'toaster', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
                 'hair drier', 'toothbrush'),
        data_root='data/coco'),
    test=dict(
        type='CocoDataset',
        ann_file='annotations/coco_subset_eval.json',
        img_prefix='images/eval',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip', flip_ratio=0.5),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=('backpack', 'umbrella', 'handbag', 'tie', 'suitcase',
                 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
                 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
                 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork',
                 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
                 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut',
                 'cake', 'chair', 'couch', 'potted plant', 'bed', 'toilet',
                 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
                 'toaster', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
                 'hair drier', 'toothbrush'),
        data_root='data/coco'))
evaluation = dict(
    metric=['bbox', 'segm'],
    by_epoch=True,
    jsonfile_prefix=
    '/content/gdrive/MyDrive/Doutorado/COCO Subset/DetectoRS-master/checkpoints/subset'
)
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
model = dict(
    type='HybridTaskCascade',
    backbone=dict(
        type='DetectoRS_ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
        conv_cfg=dict(type='ConvAWS'),
        sac=dict(type='SAC', use_deform=True),
        stage_with_sac=(False, True, True, True),
        output_img=True),
    neck=dict(
        type='RFP',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5,
        rfp_steps=2,
        aspp_out_channels=64,
        aspp_dilations=(1, 3, 6, 1),
        rfp_backbone=dict(
            rfp_inplanes=256,
            type='DetectoRS_ResNet',
            depth=50,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            conv_cfg=dict(type='ConvAWS'),
            sac=dict(type='SAC', use_deform=True),
            stage_with_sac=(False, True, True, True),
            pretrained='torchvision://resnet50',
            style='pytorch')),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='HybridTaskCascadeRoIHead',
        interleaved=True,
        mask_info_flow=True,
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(type='Shared2FCBBoxHead', num_classes=50),
            dict(type='Shared2FCBBoxHead', num_classes=50),
            dict(type='Shared2FCBBoxHead', num_classes=50)
        ],
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=[
            dict(
                type='HTCMaskHead',
                with_conv_res=False,
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=50,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)),
            dict(
                type='HTCMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=50,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)),
            dict(
                type='HTCMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=50,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))
        ],
        semantic_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[8]),
        semantic_head=dict(
            type='FusedSemanticHead',
            num_ins=5,
            fusion_level=1,
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=183,
            loss_seg=dict(
                type='CrossEntropyLoss', ignore_index=255, loss_weight=0.2))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.001,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100,
            mask_thr_binary=0.5)))
classes = ('backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
           'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
           'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
           'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
           'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
           'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
           'potted plant', 'bed', 'toilet', 'laptop', 'mouse', 'remote',
           'keyboard', 'cell phone', 'toaster', 'book', 'clock', 'vase',
           'scissors', 'teddy bear', 'hair drier', 'toothbrush')
work_dir = '/content/gdrive/MyDrive/Doutorado/COCO Subset/DetectoRS-master/checkpoints/subset'
seed = 0
gpu_ids = range(0, 1)
victoic commented 2 years ago

Hello, I'm bringing another update. I have run many tests trying to find a solution to my problem, but none were successful. One odd event occurred when I trained the model from the pretrained checkpoint; I got the following log:

2021-09-13 17:51:02,460 - mmdet - INFO - Epoch [1][50/1001] lr: 1.978e-03, eta: 13:30:57, time: 4.068, data_time: 0.094, memory: 7942, loss_rpn_cls: 0.0244, loss_rpn_bbox: 0.0124, loss_semantic_seg: 0.1777, s0.loss_cls: 1.4451, s0.acc: 74.4570, s0.loss_bbox: 0.0751, s0.loss_mask: 0.6582, s1.loss_cls: 0.8104, s1.acc: 71.0209, s1.loss_bbox: 0.0830, s1.loss_mask: 0.3728, s2.loss_cls: 0.4637, s2.acc: 66.1233, s2.loss_bbox: 0.0558, s2.loss_mask: 0.1852, loss: 4.3637
2021-09-13 17:54:17,287 - mmdet - INFO - Epoch [1][100/1001]    lr: 3.976e-03, eta: 13:10:34, time: 3.897, data_time: 0.049, memory: 7942, loss_rpn_cls: 0.0309, loss_rpn_bbox: 0.0161, loss_semantic_seg: 0.2387, s0.loss_cls: 0.5302, s0.acc: 89.2500, s0.loss_bbox: 0.0579, s0.loss_mask: 0.5119, s1.loss_cls: 0.3080, s1.acc: 87.5830, s1.loss_bbox: 0.0676, s1.loss_mask: 0.2740, s2.loss_cls: 0.1627, s2.acc: 87.4562, s2.loss_bbox: 0.0447, s2.loss_mask: 0.1403, loss: 2.3830
2021-09-13 17:57:20,034 - mmdet - INFO - Epoch [1][150/1001]    lr: 5.974e-03, eta: 12:45:41, time: 3.655, data_time: 0.056, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 75.4767, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 75.0148, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 75.4754, s2.loss_bbox: nan, s2.loss_mask: 5.8979, loss: nan
2021-09-13 17:59:42,359 - mmdet - INFO - Epoch [1][200/1001]    lr: 7.972e-03, eta: 11:51:56, time: 2.847, data_time: 0.054, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 2.6500, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 2.6500, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 2.6500, s2.loss_bbox: nan, s2.loss_mask: 0.7845, loss: nan
2021-09-13 18:02:05,137 - mmdet - INFO - Epoch [1][250/1001]    lr: 9.970e-03, eta: 11:19:06, time: 2.856, data_time: 0.057, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 1.3833, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 1.3833, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 1.3833, s2.loss_bbox: nan, s2.loss_mask: 0.1729, loss: nan
2021-09-13 18:04:27,156 - mmdet - INFO - Epoch [1][300/1001]    lr: 1.197e-02, eta: 10:55:55, time: 2.840, data_time: 0.052, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 3.9429, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 3.9429, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 3.9429, s2.loss_bbox: nan, s2.loss_mask: 0.1725, loss: nan
2021-09-13 18:06:50,738 - mmdet - INFO - Epoch [1][350/1001]    lr: 1.397e-02, eta: 10:39:33, time: 2.872, data_time: 0.055, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 2.4000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 2.4000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 2.4000, s2.loss_bbox: nan, s2.loss_mask: 0.1725, loss: nan
2021-09-13 18:09:13,892 - mmdet - INFO - Epoch [1][400/1001]    lr: 1.596e-02, eta: 10:26:28, time: 2.863, data_time: 0.055, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 1.0000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 1.0000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 1.0000, s2.loss_bbox: nan, s2.loss_mask: 0.1718, loss: nan
2021-09-13 18:11:38,207 - mmdet - INFO - Epoch [1][450/1001]    lr: 1.796e-02, eta: 10:16:15, time: 2.886, data_time: 0.054, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 0.3750, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 0.3750, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 0.3750, s2.loss_bbox: nan, s2.loss_mask: 0.1719, loss: nan
2021-09-13 18:13:58,708 - mmdet - INFO - Epoch [1][500/1001]    lr: 1.996e-02, eta: 10:06:09, time: 2.810, data_time: 0.051, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 0.4000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 0.4000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 0.4000, s2.loss_bbox: nan, s2.loss_mask: 0.1713, loss: nan
2021-09-13 18:16:19,442 - mmdet - INFO - Epoch [1][550/1001]    lr: 2.000e-02, eta: 9:57:32, time: 2.815, data_time: 0.050, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 1.3000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 1.3000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 1.3000, s2.loss_bbox: nan, s2.loss_mask: 0.1716, loss: nan
2021-09-13 18:18:43,594 - mmdet - INFO - Epoch [1][600/1001]    lr: 2.000e-02, eta: 9:51:03, time: 2.883, data_time: 0.057, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 1.2988, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 1.2988, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 1.2988, s2.loss_bbox: nan, s2.loss_mask: 0.1708, loss: nan
2021-09-13 18:21:07,244 - mmdet - INFO - Epoch [1][650/1001]    lr: 2.000e-02, eta: 9:45:02, time: 2.873, data_time: 0.054, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 2.9333, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 2.9333, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 2.9333, s2.loss_bbox: nan, s2.loss_mask: 0.1702, loss: nan
2021-09-13 18:23:30,588 - mmdet - INFO - Epoch [1][700/1001]    lr: 2.000e-02, eta: 9:39:28, time: 2.867, data_time: 0.054, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 2.0667, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 2.0667, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 2.0667, s2.loss_bbox: nan, s2.loss_mask: 0.1702, loss: nan
2021-09-13 18:25:50,656 - mmdet - INFO - Epoch [1][750/1001]    lr: 2.000e-02, eta: 9:33:30, time: 2.801, data_time: 0.050, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 2.3000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 2.3000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 2.3000, s2.loss_bbox: nan, s2.loss_mask: 0.1701, loss: nan
2021-09-13 18:28:12,592 - mmdet - INFO - Epoch [1][800/1001]    lr: 2.000e-02, eta: 9:28:25, time: 2.839, data_time: 0.052, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 3.9000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 3.9000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 3.9000, s2.loss_bbox: nan, s2.loss_mask: 0.1696, loss: nan
2021-09-13 18:30:35,744 - mmdet - INFO - Epoch [1][850/1001]    lr: 2.000e-02, eta: 9:23:56, time: 2.863, data_time: 0.055, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 2.4018, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 2.4018, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 2.4018, s2.loss_bbox: nan, s2.loss_mask: 0.1692, loss: nan
2021-09-13 18:32:59,935 - mmdet - INFO - Epoch [1][900/1001]    lr: 2.000e-02, eta: 9:19:53, time: 2.884, data_time: 0.055, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 3.3922, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 3.3922, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 3.3922, s2.loss_bbox: nan, s2.loss_mask: 0.1688, loss: nan
2021-09-13 18:35:22,486 - mmdet - INFO - Epoch [1][950/1001]    lr: 2.000e-02, eta: 9:15:42, time: 2.851, data_time: 0.052, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 3.0476, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 3.0476, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 3.0476, s2.loss_bbox: nan, s2.loss_mask: 0.1686, loss: nan
2021-09-13 18:37:45,852 - mmdet - INFO - Epoch [1][1000/1001]   lr: 2.000e-02, eta: 9:11:50, time: 2.867, data_time: 0.059, memory: 7942, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 1.6538, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 1.6538, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 1.6538, s2.loss_bbox: nan, s2.loss_mask: 0.1673, loss: nan
2021-09-13 18:37:48,574 - mmdet - INFO - Saving checkpoint at 1 epochs

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 492/492, 1.3 task/s, elapsed: 376s, ETA:     0s

2021-09-13 18:44:08,514 - mmdet - INFO - Evaluating bbox...
2021-09-13 18:44:08,518 - mmdet - ERROR - The testing results of the whole dataset is empty.
2021-09-13 18:44:08,533 - mmdet - INFO - Epoch(val) [1][492]    

Note that the first two iterations report finite loss values, and then the losses become NaN again. The evaluation also keeps returning "results of the whole dataset is empty". I saw in another issue that this can happen when boxes/polygons fall outside the image, so I checked my annotations again using the following code (the loading lines and annotation path below are illustrative additions):

  print("Loaded COCO annotation file with {} annotations".format(len(anns)))
  failures = []
  for ann in anns:
    image = coco.loadImgs(ann['image_id'])[0]

    h = image['height']
    w = image['width']
    for segm in ann['segmentation']:
      for i in range(0, len(segm), 2):
        #print(segm[i], segm[i+1], "-----", image['id'], h, w)
        if (segm[i] < 0 or segm[i] > w) or (segm[i+1] < 0 or segm[i+1] > h):
          print("Annotation out of image!")
          print("\tAnnotation {}: {}\n\tImage {}: {}".format(ann['id'], ann, image['id'], image))
          failures.append(ann['id'])
  print("{} annotations with failures.".format(len(failures)))

Which returns:

Loaded COCO annotation file with 3265 annotations
0 annotations with failures.
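
For completeness, a similar check can be run on the bounding boxes (a minimal sketch reusing the coco and anns objects from above; COCO stores boxes as [x, y, width, height]):

bbox_failures = []
for ann in anns:
    image = coco.loadImgs(ann['image_id'])[0]
    x, y, bw, bh = ann['bbox']  # COCO format: [x, y, width, height]
    # Flag boxes whose extent falls outside the image.
    if x < 0 or y < 0 or x + bw > image['width'] or y + bh > image['height']:
        bbox_failures.append(ann['id'])
print("{} bounding boxes out of image bounds.".format(len(bbox_failures)))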

So I believe this rules out the out-of-bounds annotation problem. Does anyone have any insight?

AronLin commented 2 years ago

"The testing results of the whole dataset is empty": your model is not correct, so it outputs nothing. You should find out why your model cannot be trained normally. I also found that loss_semantic_seg increases during training. I suggest you train your model without semantic segmentation first, to ensure that the instance part of your model is normal.
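
As an aside, a common first mitigation for NaN losses (taken from the mmdetection FAQ, not specific to DetectoRS) is to enable gradient clipping in the optimizer config, e.g.

# FAQ-suggested sketch: clip gradients instead of grad_clip=None.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))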

victoic commented 2 years ago

You are correct, but I made no changes to the model except for the number of output classes. Simply switching the config from DetectoRS to another model, Deformable DETR, trains correctly with no change to the data, which points to the problem being something in DetectoRS.

I suggest you train your model without semantic segmentation first, to ensure that the instance part of your model is normal.

Is there a Config option for this?

AronLin commented 2 years ago

Deformable DETR does not use the semantic head, so it can be trained correctly. You can train your model with htc_without_semantic. If it trains normally, you can then try to train an HTC model with semantic segmentation. If that also trains normally, there is something wrong with DetectoRS; if not, the semantic annotations might be wrong, and you can check them.
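
For reference, a minimal sketch of switching to the semantic-free HTC baseline (the config path comes from the mmdetection repo and may differ across versions; the class count matches the 50-class subset):

from mmcv import Config

# Load the HTC config that omits the semantic head (path may vary by version).
cfg = Config.fromfile('configs/htc/htc_without_semantic_r50_fpn_1x_coco.py')
# Adjust every cascade stage to the 50-class subset.
for head in cfg.model.roi_head.bbox_head:
    head.num_classes = 50
for head in cfg.model.roi_head.mask_head:
    head.num_classes = 50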

victoic commented 2 years ago

Should changing my COCO annotation file be accompanied by a change in the semantic annotations? If not, then I don't think it could be that, since I did not change the semantic segmentation annotations, as can be seen in the config.

I'm closing the issue for now, however, since I was able to train using other models. As soon as I'm available to test DetectoRS again, I'll come back to this issue. Thank you again, @AronLin, for your time and help.