shariqfarooq123 / AdaBins

Official implementation of Adabins: Depth Estimation using adaptive bins
GNU General Public License v3.0

Bug in the scheduler #44

Open shreyaskamathkm opened 2 years ago

shreyaskamathkm commented 2 years ago

Hi,

Thank you so much for publishing the code. This work is excellent and achieves state-of-the-art performance. However, when I tried to retrain the model with my own training code built on PyTorch Lightning, I was unable to reproduce the results reported in the paper.

I think I have found a bug in the code, which may be why I could not reach the reported numbers. When the args.same_lr flag is set to false, the encoder and decoder are supposed to use different learning rates. However, both parameter groups end up with the same learning rate.

The OneCycleLR call in this line expects a list of max learning rates, one per parameter group, in order to give the encoder and decoder different schedules. Because a single scalar is passed instead, the learning rate ends up identical for the encoder and decoder even when args.same_lr is set to false.
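
For illustration, here is a minimal, self-contained sketch (a toy two-layer setup with made-up learning rates, not the repo's training code) of how OneCycleLR behaves with a scalar max_lr versus one value per parameter group:

    from torch import nn, optim

    # Two parameter groups with different base lrs, standing in for encoder/decoder.
    enc, dec = nn.Linear(4, 4), nn.Linear(4, 4)

    def make_optimizer():
        return optim.AdamW([{"params": enc.parameters(), "lr": 1e-5},
                            {"params": dec.parameters(), "lr": 1e-4}])

    # Scalar max_lr: OneCycleLR overwrites both groups with the same schedule,
    # so the encoder/decoder split is silently lost.
    opt = make_optimizer()
    sched = optim.lr_scheduler.OneCycleLR(opt, max_lr=1e-4, total_steps=100, div_factor=25)
    print([g["lr"] for g in opt.param_groups])  # two identical values

    # One max_lr per param group: the split is preserved through the cycle.
    opt = make_optimizer()
    sched = optim.lr_scheduler.OneCycleLR(opt, max_lr=[1e-5, 1e-4], total_steps=100, div_factor=25)
    print([g["lr"] for g in opt.param_groups])  # two different values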

When I print out scheduler.optimizer, I get the following:

AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-05
    lr: 1.4279999999999978e-05
    max_lr: 0.000357
    max_momentum: 0.95
    min_lr: 1.428e-07
    weight_decay: 0.1

Parameter Group 1
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-05
    lr: 1.4279999999999978e-05
    max_lr: 0.000357
    max_momentum: 0.95
    min_lr: 1.428e-07
    weight_decay: 0.1
)

When I replace the section with the following:

    # Use the per-group learning rates already set on the optimizer, so the
    # encoder/decoder split survives OneCycleLR (which overwrites group lrs).
    lrs = [group['lr'] for group in optimizer.param_groups]
    ###################################### Scheduler ###############################################
    scheduler = optim.lr_scheduler.OneCycleLR(optimizer, lrs, epochs=epochs, steps_per_epoch=len(train_loader),
                                              cycle_momentum=True,
                                              base_momentum=0.85, max_momentum=0.95, last_epoch=args.last_epoch,
                                              div_factor=args.div_factor,
                                              final_div_factor=args.final_div_factor)

Notice that the output of scheduler.optimizer then changes to the following:


AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-06
    lr: 1.4280000000000006e-06
    max_lr: 3.57e-05
    max_momentum: 0.95
    min_lr: 1.428e-08
    weight_decay: 0.1

Parameter Group 1
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-05
    lr: 1.4279999999999978e-05
    max_lr: 0.000357
    max_momentum: 0.95
    min_lr: 1.428e-07
    weight_decay: 0.1
)

Would you please let me know if this behavior is intentional?

Thank you,
Best,
SK

shariqfarooq123 commented 2 years ago

Hi,

Thank you for pointing this out. This might actually be a bug (also present when AdaBins was trained). The OneCycleLR scheduler should be given the lrs of all param groups, since it overwrites the optimizer's original values!

However, this also means that the original AdaBins was effectively trained with the settings intended for same_lr.
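
For anyone running into this, a minimal sketch of the intended wiring (placeholder names: encoder_params, decoder_params, and the args.lr / args.wd / args.epochs fields are stand-ins, not the repo's exact train.py) would be to build the optimizer with two groups and then hand OneCycleLR one max lr per group:

    import torch.optim as optim

    # Hypothetical setup, not the repo's exact code.
    if args.same_lr:
        params = model.parameters()
    else:
        # Smaller lr for the pretrained encoder, full lr for the decoder.
        params = [{"params": encoder_params, "lr": args.lr / 10},
                  {"params": decoder_params, "lr": args.lr}]

    optimizer = optim.AdamW(params, lr=args.lr, weight_decay=args.wd)

    # One max lr per param group, so OneCycleLR keeps the encoder/decoder ratio
    # instead of collapsing both groups onto the same schedule.
    max_lrs = [g['lr'] for g in optimizer.param_groups]
    scheduler = optim.lr_scheduler.OneCycleLR(
        optimizer, max_lrs, epochs=args.epochs, steps_per_epoch=len(train_loader),
        cycle_momentum=True, base_momentum=0.85, max_momentum=0.95,
        div_factor=args.div_factor, final_div_factor=args.final_div_factor)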