open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Paramwise_cfg not used #6599

Closed pfuerste closed 2 years ago

pfuerste commented 2 years ago


Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug

I am currently testing several versions of Deformable-DETR and would like to tune how much the backbone is trained. To do this, I am changing the custom_keys in the paramwise_cfg of the optimizer (see configs below). After training, I plotted the normed differences of the layer weights between epochs to see whether some layers are affected more than others. It seems that even with 'backbone': dict(lr_mult=0.0), the backbone still gets trained. It also does not seem to matter at all how I set the parameter: the curves always look about the same. I know there is more going on, and setting lr_mult=1 will not produce weight differences 10 times larger than setting it to 0.1, but setting it to 0.0 should certainly freeze the backbone weights, right? Am I missing something in my config?

I first tried it like this:

optimizer = dict(
    type='AdamW',
    lr=0.2,
    weight_decay=0.0001,
    paramwise_cfg=dict(
        custom_keys=dict(
            backbone=dict(lr_mult=0.0),
            sampling_offsets=dict(lr_mult=0.1),
            reference_points=dict(lr_mult=0.1))))
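For readers unfamiliar with custom_keys, the sketch below illustrates how such a mapping is commonly understood to resolve into per-parameter settings: each key is matched as a substring of the parameter name, longer keys are tried first, and the matching multipliers scale the base values. This is a simplified pure-Python model written as an assumption about the behavior, not the library's actual implementation; the function name `resolve_param_options` and the parameter names in the usage note are hypothetical.

```python
from typing import Dict, Iterable


def resolve_param_options(
    param_names: Iterable[str],
    base_lr: float,
    base_wd: float,
    custom_keys: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Resolve lr / weight_decay per parameter name (simplified model)."""
    # Assumption: longer (more specific) keys take precedence over shorter ones.
    sorted_keys = sorted(sorted(custom_keys), key=len, reverse=True)
    resolved = {}
    for name in param_names:
        lr, wd = base_lr, base_wd
        for key in sorted_keys:
            if key in name:  # substring match against the parameter name
                lr = base_lr * custom_keys[key].get('lr_mult', 1.0)
                wd = base_wd * custom_keys[key].get('decay_mult', 1.0)
                break  # only the first matching key is applied
        resolved[name] = {'lr': lr, 'weight_decay': wd}
    return resolved
```

Called with the custom_keys from this post, a base lr of 0.0002, a base weight_decay of 0.0001 and an illustrative name such as 'backbone.layer4.2.conv3.weight', this model gives the backbone parameter lr 0.0. Its weight_decay stays at the base value, because no decay_mult is given for that key.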

This results in the following differences in the layer weights after training for one epoch:

[Image: 0_0]

(The x-axis shows the layers, named as in the state dict. The y-axis shows the Euclidean norm of the difference between the pretrained weights and the weights after one epoch, normalized per layer and divided by layer size to make layers comparable.) In my understanding, the blue part should be constantly zero. After more epochs, all backbone layers rise, not only the last ones, but I'm still training to get an image for that.
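The metric described above could be computed roughly as follows. This is a minimal sketch under stated assumptions: each layer's weights are given as a flat list of floats (with real checkpoints one might obtain these via something like `tensor.flatten().tolist()`), and "divided by layer size" is read as dividing the Euclidean distance by the number of elements. The helper names `layer_weight_change` and `changed_layers` are hypothetical and not taken from the original plotting script.

```python
import math
from typing import Dict, List, Sequence


def layer_weight_change(
    before: Dict[str, Sequence[float]],
    after: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Per-layer Euclidean distance between two snapshots, divided by layer size."""
    changes = {}
    for name, old in before.items():
        new = after[name]
        if len(old) != len(new):
            raise ValueError(f'shape mismatch for layer {name}')
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))
        changes[name] = dist / len(old)
    return changes


def changed_layers(changes: Dict[str, float], prefix: str) -> List[str]:
    """Names starting with `prefix` whose weights moved at all."""
    return [n for n, c in changes.items() if n.startswith(prefix) and c > 0.0]
```

With a truly frozen backbone, `changed_layers(changes, 'backbone')` would be expected to return an empty list, which gives a stricter check than reading values off a plot.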

Reproduction

  1. What command or script did you run?

python tools/train.py /home/fuerste/thesis_root/model_compare/models/ddetr/various_tests/ddetr_animal_8_640_backbone_1_e1_warmup.py

  2. Did you make any modifications to the code or config? Did you understand what you have modified? See the config below.
  3. What dataset did you use? Custom CCT20 with COCO-style annotations.

Config:

img_scale = (640, 480)
epochs = 1
lr = 0.0002

samples_per_gpu = 4
workers_per_gpu = 2

_base_ = '/home/fuerste/mmdetection/configs/deformable_detr/deformable_detr_r50_16x2_50e_coco.py'

model = dict(
    bbox_head=dict(
        type='DeformableDETRHead',
        num_classes=1))

dataset_type = 'COCODataset'
classes = (
    'animal',
)

# https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py
# By default, the step policy decreases every lr by a factor of 0.1
lr_config = dict(policy='step',
                 step=[int(1)],
                 warmup='linear',
                 warmup_iters=500,
                 warmup_ratio=0.001)

workflow = [('train', 1), ('val', 1)]
runner = dict(type='EpochBasedRunner', max_epochs=epochs)
optimizer = dict(
    type='AdamW',
    lr=lr,
    weight_decay=0.0001,
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.0),
            # These layers are "nearly frozen"
            'sampling_offsets': dict(lr_mult=0.1),
            'reference_points': dict(lr_mult=0.1)
        }))

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=img_scale, keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_scale,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=workers_per_gpu,
    train=dict(
        # samples_per_gpu=samples_per_gpu,
        # workers_per_gpu=workers_per_gpu,
        img_prefix='/home/datasets/camera_traps/Caltech-Camera-Traps/CCT20-benchmark/eccv_18_all_images_sm',
        classes=classes,
        filter_empty_gt=False,
        ann_file='/home/fuerste/thesis_root/data/cct20/annotations/one_cat/train_annotations.json',
        pipeline=train_pipeline),
    val=dict(
        # samples_per_gpu=samples_per_gpu,
        # workers_per_gpu=workers_per_gpu,
        img_prefix='/home/datasets/camera_traps/Caltech-Camera-Traps/CCT20-benchmark/eccv_18_all_images_sm',
        classes=classes,
        filter_empty_gt=False,
        separate_eval=True,
        ann_file=['/home/fuerste/thesis_root/data/cct20/annotations/one_cat/cis_val_annotations.json',
                  '/home/fuerste/thesis_root/data/cct20/annotations/one_cat/trans_val_annotations.json'],
        pipeline=test_pipeline),
    test=dict(
        # samples_per_gpu=samples_per_gpu,
        # workers_per_gpu=workers_per_gpu,
        img_prefix='/home/datasets/camera_traps/Caltech-Camera-Traps/CCT20-benchmark/eccv_18_all_images_sm',
        classes=classes,
        filter_empty_gt=False,
        separate_eval=True,
        ann_file=['/home/fuerste/thesis_root/data/cct20/annotations/one_cat/cis_test_annotations.json',
                  '/home/fuerste/thesis_root/data/cct20/annotations/one_cat/trans_test_annotations.json'],
        pipeline=test_pipeline))

log_config = dict(
    interval=10,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(
            type='WandbLoggerHook',
            init_kwargs=dict(
                project='ddetr_test_lr',
                config={},
                tags=["backbone_training 1", "morpheus", "ddetr", "SINGLE_EPOCH", "single_class", str(img_scale), f"batch_size {samples_per_gpu}", f"epochs {epochs}", f"lr {lr}"]
            ))])

load_from = '/home/fuerste/mmdetection/checkpoints/deformable_detr_r50_16x2_50e_coco_20210419_220030-a12b9512.pth'

Minor side question: why does the "load_from" parameter (and its path) not appear in the config that gets saved when the run starts?

jshilong commented 2 years ago

There is also weight_decay, which can update the parameters even if you set the lr to zero.

pfuerste commented 2 years ago

True, I did not think of that. But to verify, I set weight_decay to 0.0 and backbone=dict(lr_mult=0.0), and the backbone weights still change.

jshilong commented 2 years ago

I tested it with the DETR config and set

optimizer = dict(
    type='AdamW',
    lr=0.0001,
    weight_decay=0.0000,
    paramwise_cfg=dict(
        custom_keys={'backbone': dict(lr_mult=0.0, decay_mult=0.0)}))

and I did not observe any change in the parameters.
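Applied to the optimizer from the original post, the same idea might look like the fragment below. It is an untested sketch that only combines values already shown in this thread (the base lr of 0.0002 and the three custom_keys entries) with decay_mult=0.0 for the backbone. Nothing in the thread confirms that it resolves the reported weight changes.

```python
lr = 0.0002  # base learning rate from the original config

optimizer = dict(
    type='AdamW',
    lr=lr,
    weight_decay=0.0001,
    paramwise_cfg=dict(
        custom_keys={
            # Zero out both the learning rate and the weight decay for the backbone.
            'backbone': dict(lr_mult=0.0, decay_mult=0.0),
            # These layers keep the reduced learning rate from the original post.
            'sampling_offsets': dict(lr_mult=0.1),
            'reference_points': dict(lr_mult=0.1),
        }))
```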

pfuerste commented 2 years ago

That's strange. I still see weight changes in the backbone, but they are really small.

jshilong commented 2 years ago

How about testing it with the DETR config, using the same modification I made?

leondada commented 2 years ago

Have you solved this problem? I am running into the same bug now...