shariqfarooq123 / AdaBins

Official implementation of Adabins: Depth Estimation using adaptive bins
GNU General Public License v3.0

Bug in the scheduler #44

Open shreyaskamathkm opened 2 years ago

shreyaskamathkm commented 2 years ago

Hi,

Thank you so much for publishing the code. This work is excellent and achieves state-of-the-art performance. However, when I tried to retrain the model with my own training code built on PyTorch Lightning, I was unable to reproduce the results reported in the paper.

I think I have found a bug in the code, which may be why I could not reach the reported numbers. When the args.same_lr flag is set to false, the encoder and decoder are supposed to use different learning rates. However, both parameter groups end up with the same learning rate.

The OneCycleLR call in this line expects a list of max learning rates, one per parameter group, in order to give the encoder and decoder different schedules. Because a single scalar is passed instead, the learning rate ends up identical for the encoder and decoder even when args.same_lr is set to false.
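
For illustration, here is a minimal, self-contained sketch (a toy two-layer setup with made-up learning rates, not the repo's training code) of how OneCycleLR behaves with a scalar max_lr versus one value per parameter group:

    from torch import nn, optim

    # Two parameter groups with different base lrs, standing in for encoder/decoder.
    enc, dec = nn.Linear(4, 4), nn.Linear(4, 4)

    def make_optimizer():
        return optim.AdamW([{"params": enc.parameters(), "lr": 1e-5},
                            {"params": dec.parameters(), "lr": 1e-4}])

    # Scalar max_lr: OneCycleLR overwrites both groups with the same schedule,
    # so the encoder/decoder split is silently lost.
    opt = make_optimizer()
    sched = optim.lr_scheduler.OneCycleLR(opt, max_lr=1e-4, total_steps=100, div_factor=25)
    print([g["lr"] for g in opt.param_groups])  # two identical values

    # One max_lr per param group: the split is preserved through the cycle.
    opt = make_optimizer()
    sched = optim.lr_scheduler.OneCycleLR(opt, max_lr=[1e-5, 1e-4], total_steps=100, div_factor=25)
    print([g["lr"] for g in opt.param_groups])  # two different values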

When I print out scheduler.optimizer, I get the following:

AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-05
    lr: 1.4279999999999978e-05
    max_lr: 0.000357
    max_momentum: 0.95
    min_lr: 1.428e-07
    weight_decay: 0.1

Parameter Group 1
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-05
    lr: 1.4279999999999978e-05
    max_lr: 0.000357
    max_momentum: 0.95
    min_lr: 1.428e-07
    weight_decay: 0.1
)

When I replace the section with the following:

    # Use the per-group learning rates already set on the optimizer, so the
    # encoder/decoder split survives OneCycleLR (which overwrites group lrs).
    lrs = [group['lr'] for group in optimizer.param_groups]
    ###################################### Scheduler ###############################################
    scheduler = optim.lr_scheduler.OneCycleLR(optimizer, lrs, epochs=epochs, steps_per_epoch=len(train_loader),
                                              cycle_momentum=True,
                                              base_momentum=0.85, max_momentum=0.95, last_epoch=args.last_epoch,
                                              div_factor=args.div_factor,
                                              final_div_factor=args.final_div_factor)

Notice that the output of scheduler.optimizer then changes to the following:


AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-06
    lr: 1.4280000000000006e-06
    max_lr: 3.57e-05
    max_momentum: 0.95
    min_lr: 1.428e-08
    weight_decay: 0.1

Parameter Group 1
    amsgrad: False
    base_momentum: 0.85
    betas: (0.95, 0.999)
    eps: 1e-08
    initial_lr: 1.428e-05
    lr: 1.4279999999999978e-05
    max_lr: 0.000357
    max_momentum: 0.95
    min_lr: 1.428e-07
    weight_decay: 0.1
)

Would you please let me know if this behavior is intentional?

Thank you,
Best,
SK

shariqfarooq123 commented 2 years ago

Hi,

Thank you for pointing this out. This might actually be a bug (also present when AdaBins was trained). The OneCycleLR scheduler should be given the lrs of all param groups, since it overwrites the optimizer's original values!

However, this also means that the original AdaBins was effectively trained with the settings intended for same_lr.
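
For anyone running into this, a minimal sketch of the intended wiring (placeholder names: encoder_params, decoder_params, and the args.lr / args.wd / args.epochs fields are stand-ins, not the repo's exact train.py) would be to build the optimizer with two groups and then hand OneCycleLR one max lr per group:

    import torch.optim as optim

    # Hypothetical setup, not the repo's exact code.
    if args.same_lr:
        params = model.parameters()
    else:
        # Smaller lr for the pretrained encoder, full lr for the decoder.
        params = [{"params": encoder_params, "lr": args.lr / 10},
                  {"params": decoder_params, "lr": args.lr}]

    optimizer = optim.AdamW(params, lr=args.lr, weight_decay=args.wd)

    # One max lr per param group, so OneCycleLR keeps the encoder/decoder ratio
    # instead of collapsing both groups onto the same schedule.
    max_lrs = [g['lr'] for g in optimizer.param_groups]
    scheduler = optim.lr_scheduler.OneCycleLR(
        optimizer, max_lrs, epochs=args.epochs, steps_per_epoch=len(train_loader),
        cycle_momentum=True, base_momentum=0.85, max_momentum=0.95,
        div_factor=args.div_factor, final_div_factor=args.final_div_factor)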