zudi-lin / rcan-it

Revisiting RCAN: Improved Training for Image Super-Resolution

large-patch finetune lr settings? #9

Open linzhi-li opened 2 years ago

linzhi-li commented 2 years ago

Do you change the learning rate (fixed learning rate, or a cosine annealing schedule?) when finetuning the model with patches of larger sizes? Thank you!

zudi-lin commented 2 years ago

I have attached the finetuning command below. We still use cosine annealing, but the learning rate scheduler is restarted with a smaller initial learning rate.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -u -m torch.distributed.run --nproc_per_node=4 \
--master_port=9997 main.py --distributed --config-base configs/RCAN/RCAN_Improved.yaml \
--config-file configs/RCAN/RCAN_x2_LP.yaml SYSTEM.NUM_GPU 4 SYSTEM.NUM_CPU 16 \
SOLVER.ITERATION_RESTART True SOLVER.ITERATION_TOTAL 40000 SOLVER.BASE_LR 8e-4 \
MODEL.PRE_TRAIN outputs/RCAN_x2_IT_baseline/model/model_latest.pth.tar

Remember to change MODEL.PRE_TRAIN to your own model.
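If it helps, here is a minimal sketch of the restart in plain PyTorch, not the repo's actual trainer: the stand-in Conv2d model, the checkpoint key, and the empty loop body are placeholders, and it assumes torch.optim.lr_scheduler.CosineAnnealingLR with no warmup or minimum LR.

import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Conv2d(3, 3, 3, padding=1)   # placeholder for the pretrained RCAN
# model.load_state_dict(torch.load("model_latest.pth.tar")["state_dict"])  # MODEL.PRE_TRAIN (key is an assumption)

optimizer = torch.optim.Adam(model.parameters(), lr=8e-4)   # SOLVER.BASE_LR for the restart
scheduler = CosineAnnealingLR(optimizer, T_max=40_000)      # SOLVER.ITERATION_TOTAL

for it in range(40_000):
    # ... one training step on a large-patch batch would go here ...
    optimizer.step()
    scheduler.step()   # LR decays from 8e-4 toward ~0 over the 40k finetuning iterations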

linzhi-li commented 2 years ago

Thank you for your timely reply. I noticed that the finetuning total iteration number is 1/4 of the base training's, and so is the reduced initial learning rate. Is the change in the base learning rate related to the change in the total iteration number?

zudi-lin commented 2 years ago

Hi @linzhi-li, yes, we only finetune for 40k iterations to save training time, as described in the Additive improvement section (page 5 of https://arxiv.org/pdf/2201.11279.pdf). The initial learning rate should be smaller than when training from scratch (0.0032), and the 8e-4 value was chosen empirically.
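For intuition, a plain cosine annealing schedule sets the LR at iteration t to 0.5 * base_lr * (1 + cos(pi * t / T)). The quick sketch below (assuming no warmup and a minimum LR of 0, which may differ from the repo's solver) shows how the restarted 8e-4 finetuning schedule decays over the 40k iterations, compared with the 3.2e-3 from-scratch initial LR mentioned above.

import math

def cosine_lr(base_lr, t, total):
    # plain cosine annealing, no warmup, eta_min = 0
    return 0.5 * base_lr * (1 + math.cos(math.pi * t / total))

T = 40_000                          # SOLVER.ITERATION_TOTAL for finetuning
print(cosine_lr(8e-4, 0, T))        # 8e-4 : restarted initial LR (vs 3.2e-3 from scratch)
print(cosine_lr(8e-4, T // 2, T))   # 4e-4 : halfway through finetuning
print(cosine_lr(8e-4, T, T))        # ~0   : end of the 40k-iteration finetune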