open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

No model can learn anything from my dataset. (But I already trained models outside of MMDetection successfully) #10527

Open mburges-cvl opened 1 year ago

mburges-cvl commented 1 year ago

Hello,

I am new to the MMDetection framework and would like to train several models on my dataset to compare their performance with my own model. I followed this tutorial:

https://mmdetection.readthedocs.io/en/latest/user_guides/train.html#train-with-customized-datasets

However, none of the selected models (Faster R-CNN, YOLOX, DINO, ...) learned anything. Every time, the COCO metric looks like this:

Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.001
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.013
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.035
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.035
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.010
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.036
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.033

So essentially nothing is learned. However, when I train my own model (outside MMDetection), it works and I get a mAP@50 of over 0.6, so I don't think the error is in the dataset. This is the config for my dataset:

import cv2

img_scale = (800, 800)
# We also need to change the num_classes in head to match the dataset's annotation
# model = dict(
#     roi_head=dict(
#         bbox_head=dict(num_classes=1), mask_head=dict(num_classes=1)))

# Modify dataset related settings
# Trailing slash is needed: the evaluators below build their paths as data_root + '...'
data_root = 'path/to/my/dataset/'
classes = ("my_custom_object", )

backend_args = None

albu_train_transforms = [
    {
        'type': 'RandomResizedCrop',
        'height': 960,
        'width': 960,
        'scale': (0.5, 1.0),
        'ratio': (1, 1),
        'p': 1
    },
    {
        'type': 'LongestMaxSize',
        'max_size': img_scale,
        'interpolation': cv2.INTER_LINEAR
    },
    {
        'type': 'Rotate',
        'limit': 90,
        'p': 0.3
    },
    {
        'type': 'OneOf',
        'transforms': [
            {
                'type': 'GaussNoise',
                'var_limit': (0, 5000),
                'p': 0.2
            },
            {
                'type': 'Blur',
                'p': 0.2
            },
            {
                'type': 'MedianBlur',
                'p': 0.2
            },
            {
                'type': 'CLAHE',
                'p': 0.2
            },
            {
                'type': 'RandomBrightnessContrast',
                'brightness_limit': 0.2,
                'contrast_limit': 0.2,
                'p': 0.2
            },
            {
                'type': 'RandomGamma',
                'p': 0.2
            },
            {
                'type': 'HueSaturationValue',
                'p': 0.2
            },
            {
                'type': 'Equalize',
                'p': 0.2
            },
            {
                'type': 'ISONoise',
                'p': 0.2
            },
            {
                'type': 'ImageCompression',
                'quality_lower': 75,
                'p': 0.2
            }
        ],
        'p': 0.5
    },
    {
        'type': 'PixelDropout',
        'p': 0.01
    }
]

albu_valid_transforms = [
    {
        'type': 'LongestMaxSize',
        'max_size': 800,
        'interpolation': cv2.INTER_LINEAR
    }
]

dataset_type = 'CocoDataset'

train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Albu',
        transforms=albu_train_transforms,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_bboxes_labels', 'gt_ignore_flags'],
            min_visibility=0.3),
        keymap={
            'img': 'image',
            'gt_bboxes': 'bboxes'
        },
        skip_img_without_anno=False),
    dict(type='Resize', scale=img_scale, keep_ratio=True),
    dict(type='PackDetInputs', meta_keys=['img_path', 'img_id', 'height', 'width', 'instances', 'sample_idx', 'img', 'img_shape', 'ori_shape', 'gt_bboxes', 'gt_ignore_flags', 'gt_bboxes_labels', 'scale', 'scale_factor', 'keep_ratio', 'homography_matrix'])
]

val_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Albu',
        transforms=albu_valid_transforms,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_bboxes_labels', 'gt_ignore_flags'],
            min_visibility=0.3),
        keymap={
            'img': 'image',
            'gt_bboxes': 'bboxes'
        },
        skip_img_without_anno=False),
    dict(type='Resize', scale=img_scale, keep_ratio=True),
    dict(type='PackDetInputs', meta_keys=['img_path', 'img_id', 'img', 'img_shape', 'ori_shape', 'gt_bboxes', 'gt_ignore_flags', 'gt_bboxes_labels', 'scale', 'scale_factor', 'keep_ratio', 'homography_matrix'])
]

train_dataloader = dict(
    batch_size=8,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        metainfo=dict(classes=classes),
        data_root=data_root,
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='train2017/'),
        pipeline=train_pipeline))
val_dataloader = dict(
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        metainfo=dict(classes=classes),
        data_root=data_root,
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='val2017/'),
        pipeline=val_pipeline))
test_dataloader = dict(
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        metainfo=dict(classes=classes),
        data_root=data_root,
        ann_file='annotations/instances_test2017.json',
        data_prefix=dict(img='test2017/'),
        pipeline=val_pipeline))

# Modify metric related settings
val_evaluator = dict(type='CocoMetric',
    ann_file=data_root + 'annotations/instances_val2017.json',
    metric='bbox',
    format_only=False,
    backend_args=backend_args)
test_evaluator = dict(type='CocoMetric',
    ann_file=data_root + 'annotations/instances_test2017.json',
    metric='bbox',
    format_only=False,
    backend_args=backend_args)
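
One detail of the dataset config above that is easy to check in isolation: the evaluator `ann_file` paths are built with plain string concatenation, so they are only valid if `data_root` ends with a slash. The sketch below just illustrates that string behaviour with the placeholder path from the config; the helper names `concat_path` and `joined_path` are hypothetical, and this is not claimed to be the cause of the zero AP.

```python
import posixpath


def concat_path(data_root: str, relative: str) -> str:
    """Plain concatenation, as done for the evaluator ann_file in the config."""
    return data_root + relative


def joined_path(data_root: str, relative: str) -> str:
    """Join that inserts one separator whether or not data_root ends with '/'."""
    return posixpath.join(data_root, relative)


ann = 'annotations/instances_val2017.json'

# Without a trailing slash, concatenation glues the two names together.
print(concat_path('path/to/my/dataset', ann))

# Joining gives the same result with or without the trailing slash.
print(joined_path('path/to/my/dataset', ann))
print(joined_path('path/to/my/dataset/', ann))
```

If the real `data_root` already ends with a slash, the concatenation in the config resolves to the intended file and nothing needs to change.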

And this is the config for my faster-rcnn:

import cv2

# The new config inherits a base config to highlight the necessary modification
_base_ = [
    '../_base_/models/faster-rcnn_r50_fpn.py',
    './dataset.py', '../_base_/default_runtime.py'
]
# _base_ = ['../faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py', ]

# training schedule for 2x
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=24, val_interval=1)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')

# learning rate
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
    dict(
        type='MultiStepLR',
        begin=0,
        end=24,
        by_epoch=True,
        milestones=[16, 22],
        gamma=0.1)
]

# Default setting for scaling LR automatically
#   - `enable` means enable scaling LR automatically
#       or not by default.
#   - `base_batch_size` = (8 GPUs) x (2 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=16)

# optimizer
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='AdamW',
        lr=0.001,  # 0.0002 for DeformDETR
        weight_decay=0.0001),
    clip_grad=dict(max_norm=0.1, norm_type=2),
    paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.1)})
)  # custom_keys contains sampling_offsets and reference_points in DeformDETR  # noqa

load_from = 'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth'
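
To make the schedule in this config easier to reason about, here is a rough sketch of the learning rates it implies. It assumes the two schedulers combine multiplicatively and uses a closed-form linear warmup, so it may differ from MMEngine's actual implementation by an off-by-one at the warmup boundary. The function name `lr_factor` and the constants `BASE_LR` and `BACKBONE_LR_MULT` are hypothetical names for values copied from the config.

```python
BASE_LR = 0.001          # optimizer lr from the config
BACKBONE_LR_MULT = 0.1   # paramwise_cfg custom_keys['backbone'].lr_mult


def lr_factor(epoch: int, iteration: int,
              milestones=(16, 22), gamma=0.1,
              warmup_iters=500, start_factor=0.001) -> float:
    """Approximate multiplier applied to the base LR.

    epoch:     current epoch index (0-based)
    iteration: global iteration index (0-based)
    """
    # MultiStepLR: multiply by gamma once for every milestone already reached.
    decay = gamma ** sum(epoch >= m for m in milestones)
    # LinearLR warmup: ramp from start_factor to 1.0 over the first iterations.
    if iteration < warmup_iters:
        warmup = start_factor + (1.0 - start_factor) * iteration / warmup_iters
    else:
        warmup = 1.0
    return decay * warmup


# Print head and backbone LR at a few points of the 24-epoch schedule.
for epoch, iteration in [(0, 0), (0, 250), (1, 1000), (16, 10000), (22, 14000)]:
    factor = lr_factor(epoch, iteration)
    head_lr = BASE_LR * factor
    backbone_lr = BASE_LR * BACKBONE_LR_MULT * factor
    print(f'epoch {epoch:2d} iter {iteration:5d}: '
          f'head lr {head_lr:.2e}, backbone lr {backbone_lr:.2e}')
```

The iteration numbers in the loop are arbitrary illustration values, not taken from an actual run.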

I have tried:

It seems like I am missing an obvious step, but I am currently out of ideas. Does anybody have an idea?

Thanks!

my_conda_env.txt

AndreaPi commented 1 year ago

Is MMDetection reading your dataset correctly? The fact that your own model reads it doesn't mean MMDet is also reading it. I would check dataset reading with and without augmentations. To read the dataset without having to train a model, you can use either the MMDet tool browse_dataset.py (see https://github.com/open-mmlab/mmdetection/issues/10480#issuecomment-1593338077) or my script, which doesn't write out the augmented images but checks train, val and test at the same time (browse_dataset.py only checks train): https://github.com/open-mmlab/mmdetection/issues/10525#issuecomment-1594639495. Let me know if this helps!
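
As a complementary check that needs no MMDetection at all, the raw COCO annotation files can be summarized directly. The sketch below is not the script linked above; it only counts images and boxes per category in the JSON and reports category names that are missing from the `classes` tuple of the config. The function names `summarize_coco` and `summarize_coco_file` are hypothetical, and the toy dictionary stands in for a real annotation file.

```python
import json
from collections import Counter


def summarize_coco(coco: dict, classes: tuple) -> dict:
    """Count images and annotations per category name in a COCO-style dict."""
    id_to_name = {c['id']: c['name'] for c in coco.get('categories', [])}
    per_class = Counter(
        id_to_name.get(a['category_id'], f"unknown id {a['category_id']}")
        for a in coco.get('annotations', [])
    )
    return {
        'n_images': len(coco.get('images', [])),
        'per_class': dict(per_class),
        # Names present in the file but absent from the config's `classes`.
        'unexpected_names': sorted(set(per_class) - set(classes)),
    }


def summarize_coco_file(path: str, classes: tuple) -> dict:
    """Same summary, reading the annotation JSON from disk."""
    with open(path, encoding='utf-8') as f:
        return summarize_coco(json.load(f), classes)


# Toy stand-in for an annotation file: 2 images, 3 boxes, 1 category.
toy = {
    'images': [{'id': 1}, {'id': 2}],
    'categories': [{'id': 1, 'name': 'my_custom_object'}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1, 'bbox': [10, 10, 20, 20]},
        {'id': 2, 'image_id': 1, 'category_id': 1, 'bbox': [50, 40, 30, 10]},
        {'id': 3, 'image_id': 2, 'category_id': 1, 'bbox': [5, 5, 15, 25]},
    ],
}
print(summarize_coco(toy, classes=('my_custom_object',)))
```

For a real dataset, `summarize_coco_file` would be called once per split with the `ann_file` paths from the config. This only describes the files on disk, so it complements rather than replaces a check of what the pipeline actually yields.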

mburges-cvl commented 1 year ago

Hello,

I have tried your code and it outputs the following:

   n_images  my_object  dataset
0      2754      17943    train
1       804       4112      val
2       804       4112     test

which is correct. (Val == Test, here) Also, browse_dataset.py does output the correct images with bounding boxes (on the training set, with and without augmentations).

So I would argue that the dataset is loaded correctly. Do you have any other idea why the training does not work?

Thank you for your help!

AndreaPi commented 1 year ago

which is correct. (Val == Test, here) Also, browse_dataset.py does output the correct images with bounding boxes (on the training set, with and without augmentations).

How did you get browse_dataset.py to output the images without augmentations? Did you comment out the relevant section(s) in the config file, or is there an argument you can pass that tells it to ignore augmentations?

mburges-cvl commented 1 year ago

I commented out the relevant sections.

AndreaPi commented 1 year ago

I commented out the relevant sections.

Got it, thanks.

So I would argue that the dataset is loaded correctly. Do you have any other idea why the training does not work?

I'm sorry, but I can't help you here. However, in this issue you can find a couple of RTMDet config files that work for custom datasets. Hopefully you can adapt them to your dataset.

2Maze commented 1 year ago

I faced a similar problem, probably caused by the data loading pipeline. Has anyone had any success?