Hello, we set the warm-up iteration number here, which corresponds to 20 epochs of warmup with 1252 iterations per epoch.
lr_config = dict(
policy='CosineAnnealing',
by_epoch=False,
min_lr_ratio=1e-2,
warmup='linear',
warmup_ratio=1e-3,
warmup_iters=20 * 1252,
warmup_by_epoch=False)
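In other words, with warmup_by_epoch=False the warmup length is counted in iterations, so 20 epochs of warmup is expressed as (a trivial illustration; 1252 is the number of iterations per epoch of this dataset, i.e. the length of the train dataloader):

iters_per_epoch = 1252                            # batches per epoch for this dataset
warmup_epochs = 20
warmup_iters = warmup_epochs * iters_per_epoch    # = 25040 iterations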
You can modify it to
lr_config = dict(
policy='CosineAnnealing',
by_epoch=False,
min_lr_ratio=1e-2,
warmup='linear',
warmup_ratio=1e-3,
warmup_iters=20,
warmup_by_epoch=True)
or
lr_config = dict(
policy='CosineAnnealing',
by_epoch=False,
min_lr_ratio=1e-2,
warmup='linear',
warmup_ratio=1e-3,
warmup_iters=20 * 759,
warmup_by_epoch=False)
Refer to the mmcv docs for details.
Yes. But the loss is weird. When I set warmup_iters=5 (10, 20) with warmup_by_epoch=True, the loss declined, but rose again after epoch 5 (10, 20).
But the base architecture of Swin Transformer is OK; the settings are the same except for the architecture setting (base vs. small).
Maybe you can try to modify the learning rate and learning rate scheduler to fit your dataset.
Could you please tell me how to calculate the lr according to the decay and warmup lr strategies? The two strategies confuse me.
Example 1:
paramwise_cfg = dict(
norm_decay_mult=0.0,
bias_decay_mult=0.0,
custom_keys=dict({
'.absolute_pos_embed': dict(decay_mult=0.0),
'.relative_position_bias_table': dict(decay_mult=0.0)
}))
optimizer = dict(
type='AdamW',
lr=0.0011484375000000002,
weight_decay=0.05,
eps=1e-08,
betas=(0.9, 0.999),
paramwise_cfg=dict(
norm_decay_mult=0.0,
bias_decay_mult=0.0,
custom_keys=dict({
'.absolute_pos_embed': dict(decay_mult=0.0),
'.relative_position_bias_table': dict(decay_mult=0.0)
})))
optimizer_config = dict(grad_clip=dict(max_norm=5.0))
lr_config = dict(
policy='CosineAnnealing',
by_epoch=False,
min_lr_ratio=0.01,
warmup='linear',
warmup_ratio=0.001,
warmup_iters=8800,
warmup_by_epoch=False)
runner = dict(type='EpochBasedRunner', max_epochs=400)
I can find the register functions in the optimizer and lr_updater files in mmcv, but I don't know how the warmup and lr_update strategies are executed over the whole pipeline. Could you give me the formula that calculates the lr for reference? I couldn't calculate it correctly from the source code.
The weird thing is that the same strategy works on the Swin Transformer base architecture but not on small; the loss declines fast on the base architecture.
The learning rate scheduler implementation of CosineAnnealing is in mmcv. As for the detailed formula, you can refer to the PyTorch docs.
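For a rough reference, here is a simplified sketch of how I understand the two strategies combine when by_epoch=False (not the actual mmcv API; the function name and iters_per_epoch value are placeholders, with the other numbers taken from the config above): the CosineAnnealing policy first computes a "regular" lr for the current iteration, and during the first warmup_iters iterations that value is additionally scaled by the linear warmup factor.

import math

# Values from the config above; iters_per_epoch is a placeholder and should be
# replaced with the length of your train dataloader (759 was mentioned earlier).
base_lr = 0.0011484375000000002
min_lr_ratio = 0.01
warmup_ratio = 0.001
warmup_iters = 8800
iters_per_epoch = 759
max_iters = 400 * iters_per_epoch   # max_epochs * iterations per epoch

def lr_at(cur_iter):
    # Regular lr: cosine annealing from base_lr down to base_lr * min_lr_ratio.
    target_lr = base_lr * min_lr_ratio
    factor = cur_iter / max_iters
    lr = target_lr + 0.5 * (base_lr - target_lr) * (1 + math.cos(math.pi * factor))
    # Linear warmup: scale the regular lr from warmup_ratio * lr up to the full lr.
    if cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        lr = lr * (1 - k)
    return lr

print(lr_at(0), lr_at(warmup_iters), lr_at(max_iters - 1))

At iteration 0 this gives base_lr * warmup_ratio, at warmup_iters it reaches the cosine-annealed lr, and at the last iteration it ends near base_lr * min_lr_ratio.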
Has this question been solved?
Yeah! Solved! Thank you for the reminder.
The config file: