[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
Thanks so much for your great work!
I am confused about how you adjust the lr and weight_decay. In pretrain/utils/lr_control.py (lr_wd_annealing), it looks like lr and wd are only read and never actually updated.
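For context, a minimal sketch of the usual PyTorch mechanism (assumed here, not the repo's exact code): an optimizer re-reads `lr` and `weight_decay` from its `param_groups` on every `step()`, so assigning new values into `param_groups` inside a function like `lr_wd_annealing` does change them in place, even if the function body looks like it only "gets" the values first:

```python
import math
import torch

# Sketch only: a cosine lr schedule with linear warmup, applied by
# writing into optimizer.param_groups (which the optimizer reads each step).
def lr_wd_annealing(optimizer, peak_lr, wd, cur_it, warmup_it, total_it):
    if cur_it < warmup_it:
        lr = peak_lr * cur_it / max(1, warmup_it)       # linear warmup
    else:
        ratio = (cur_it - warmup_it) / max(1, total_it - warmup_it)
        lr = peak_lr * 0.5 * (1 + math.cos(math.pi * ratio))  # cosine decay
    for group in optimizer.param_groups:
        group['lr'] = lr              # this assignment IS the adjustment
        group['weight_decay'] = wd
    return lr

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.0)
lr = lr_wd_annealing(opt, peak_lr=0.1, wd=1e-4, cur_it=5, warmup_it=10, total_it=100)
print(opt.param_groups[0]['lr'])  # 0.05 — mutated in place, no new optimizer needed
```

The function names and signature above are illustrative; the key point is the in-place mutation of `param_groups`.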