Closed: pfuerste closed this issue 2 years ago
weight_decay can still update the parameters even if you set lr to zero.
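For what it's worth, this can be checked in isolation (a minimal sketch assuming plain PyTorch AdamW, outside the mmdet optimizer constructor):

import torch

# One dummy parameter with lr=0 but a non-zero weight_decay.
p = torch.nn.Parameter(torch.ones(4))
opt = torch.optim.AdamW([p], lr=0.0, weight_decay=0.01)

before = p.detach().clone()
p.sum().backward()  # produce a non-zero gradient
opt.step()

# In the stock torch.optim.AdamW the decay term is multiplied by lr, so with
# lr=0 this difference is expected to stay zero; running the check makes the
# behaviour explicit for the optimizer actually in use.
print((p.detach() - before).abs().max())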
True, I did not think of that.
But to verify, I set weight_decay to 0.0 and backbone=dict(lr_mult=0.0), but there is still a change of weights in the backbone.
I tested it with the DETR config and set
optimizer = dict(
    type='AdamW',
    lr=0.0001,
    weight_decay=0.0000,
    paramwise_cfg=dict(
        custom_keys={'backbone': dict(lr_mult=0.0, decay_mult=0.0)}))
and I do not observe any change in the parameters.
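A quick way to double-check this outside of the training logs (a sketch assuming two checkpoints saved by the runner, e.g. epoch_1.pth and epoch_2.pth; the paths are placeholders):

import torch

# Placeholder checkpoint paths from the work_dir of the run.
ckpt_a = torch.load('epoch_1.pth', map_location='cpu')['state_dict']
ckpt_b = torch.load('epoch_2.pth', map_location='cpu')['state_dict']

# Print every backbone tensor whose values changed between the two epochs.
for name, w_a in ckpt_a.items():
    if not name.startswith('backbone.'):
        continue
    diff = (ckpt_b[name].float() - w_a.float()).abs().max().item()
    if diff > 0:
        print(f'{name}: max abs change {diff:.3e}')

If nothing is printed, the backbone was effectively frozen between those two checkpoints.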
That's strange. I still see weight changes in the backbone, but they are really small.
How about testing it with the DETR config using the same modification as mine?
Have you addressed this problem? I am running into the same bug now...
Thanks for your error report and we appreciate it a lot.
Checklist
Describe the bug
I am currently testing several versions of Deformable-DETR and would like to tune how much the backbone is trained. For this I am changing the custom_keys in paramwise_cfg of the optimizer (see configs below). After training, I plotted the normed differences of the layer weights between epochs to see if some layers are affected more than others. As it seems, even when setting
'backbone': dict(lr_mult=0.0)
, the backbone still gets trained. Also, it does not seem to matter at all how I set the parameter; the curves always look about the same. I know there is more going on and setting lr_mult=1 will not result in weight differences that are 10 times stronger than with 0.1, but it should certainly freeze the weights of the backbone if set to 0.0, right? Am I missing some part in my config? I first tried it like this:
optimizer = dict(
    type='AdamW',
    lr=0.2,
    weight_decay=0.0001,
    paramwise_cfg=dict(
        custom_keys=dict(
            backbone=dict(lr_mult=0.0),
            sampling_offsets=dict(lr_mult=0.1),
            reference_points=dict(lr_mult=0.1))))
Which results in these differences for the layer weights after training for one epoch: (x is the layers with names from the state dict, y is the Euclidean norm of the difference between the pretrained weights and the weights after one epoch, computed per layer and divided by layer size to make layers comparable.) The blue part should be constant zero in my understanding. After more epochs, all backbone layers rise, not only the last ones, but I'm still training to get an image for that.
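For completeness, a sketch of how such per-layer difference curves can be computed (the paths are placeholders, and it assumes both checkpoints store their weights under 'state_dict'):

import torch

pre = torch.load('pretrained.pth', map_location='cpu')['state_dict']
ep1 = torch.load('epoch_1.pth', map_location='cpu')['state_dict']

diffs = {}
for name, w0 in pre.items():
    if name not in ep1:
        continue
    w1 = ep1[name].float()
    w0 = w0.float()
    # Euclidean norm of the per-layer change, divided by the layer size
    # so that layers of different shapes stay comparable.
    diffs[name] = torch.norm(w1 - w0).item() / w0.numel()

for name, value in diffs.items():
    print(f'{name}\t{value:.3e}')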
Reproduction
python tools/train.py /home/fuerste/thesis_root/model_compare/models/ddetr/various_tests/ddetr_animal_8_640_backbone_1_e1_warmup.py
Config:
Minor side question: why is the "load_from" parameter (or the path it points to) not in the config that gets saved when starting the run?
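One way to make sure it ends up in the saved config is to set it explicitly in the config file instead of passing it at runtime (a sketch; the checkpoint path is a placeholder):

# Top level of the config file; the checkpoint path is a placeholder.
load_from = 'checkpoints/deformable_detr_r50_16x2_50e_coco.pth'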