Where does your "cosine" learning rate policy come from?

ArchNew commented 3 years ago

I've checked the paper "SGDR: Stochastic Gradient Descent with Warm Restarts" and also the SlowFast code. Your "cosine" learning rate policy differs from both of them in a non-trivial way.

Take SlowFast as an example: https://github.com/facebookresearch/SlowFast/blob/10651bcb6c20fca71ebdff803f803203d251f95c/slowfast/utils/lr_policy.py#L43-L46 Compare that with yours: https://github.com/lingtengqiu/OPEC-Net/blob/ae9c912370c3baa5df62fceeefd42b80d647ec2d/engineer/utils/lr_step_method.py#L42-45

The difference between your schedule and SlowFast's is about end_lr * (math.cos(math.pi * cur_epoch / cfg.nEpochs) + 1.0) * 0.5, where end_lr is the learning rate your code ends with, say 1e-5.
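To make the comparison concrete, here is a minimal sketch of two generic cosine forms. The first has the same shape as the expression above but scales a starting rate and decays to zero; the second is the single-cycle SGDR form, eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)), which decays to a floor. The function names and the values base_lr = 1e-3 and n_epochs = 100 are assumptions for illustration only; this is not code from either repository.

```python
import math


def cosine_to_zero(base_lr, cur_epoch, n_epochs):
    """Half-cosine decay from base_lr at epoch 0 down to 0 at n_epochs."""
    return base_lr * (math.cos(math.pi * cur_epoch / n_epochs) + 1.0) * 0.5


def cosine_with_floor(base_lr, end_lr, cur_epoch, n_epochs):
    """Half-cosine decay from base_lr down to end_lr (single SGDR cycle)."""
    scale = (math.cos(math.pi * cur_epoch / n_epochs) + 1.0) * 0.5
    return end_lr + (base_lr - end_lr) * scale


if __name__ == "__main__":
    # Hypothetical settings, chosen only to show the shape of each curve.
    base_lr, end_lr, n_epochs = 1e-3, 1e-5, 100
    for epoch in (0, 25, 50, 75, 100):
        plain = cosine_to_zero(base_lr, epoch, n_epochs)
        floored = cosine_with_floor(base_lr, end_lr, epoch, n_epochs)
        print(f"epoch {epoch:3d}: to-zero {plain:.3e}  with-floor {floored:.3e}")
```

Both curves start at base_lr; they differ mainly near the end of training, where the first reaches zero and the second settles at end_lr.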

Also, your cosine policy is combined with the Adam optimizer, not the SGD optimizer that SlowFast and the SGDR paper use.

Is there any paper related to your learning schedule? Thanks!

lingtengqiu commented 3 years ago

Hi, I suggest you read the paper "YOLOv5". Although it targets object detection, it contains many tricks that could suit other tasks, including the cosine policy. Good luck!

ArchNew commented 3 years ago

Thanks for your answer. I stopped following the YOLO series at YOLOv3, which seems to have been a mistake. YOLO is indeed a treasure trove of tricks.

lingtengqiu / OPEC-Net, issue #19