hustvl / YOLOS

[NeurIPS 2021] You Only Look at One Sequence
https://arxiv.org/abs/2106.00666
MIT License

Small learning rate value #8

Closed davidnvq closed 3 years ago

davidnvq commented 3 years ago

❔Question

Thank you for your great work examining transformers in object detection. My question is: why does training start with a very small learning rate of 2.5 × 10⁻⁵? There is no explanation for this choice in the paper. My first guess is that you inherited the setting from the DETR framework.

Have you tried larger learning rates? To speed up training with more GPUs, is there a rule for scaling up the learning rate for YOLOS, based on your experiments, without losing performance?

Many thanks.

Yuxin-CV commented 3 years ago

Hi @davidnvq, thanks for your interest in YOLOS.

We haven't had many chances to try. We found that YOLOS can converge with a learning rate of 5 × 10⁻⁵ or 10 × 10⁻⁵, but gives less competitive results.

For learning-rate scaling and large-scale training, please refer to https://github.com/facebookresearch/detr/issues/48#issuecomment-638689380 and https://github.com/hustvl/QueryInst/issues/12#issuecomment-858228649.
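For context, the linked threads discuss the common "linear scaling rule": scale the learning rate in proportion to the global batch size when adding GPUs. Below is a minimal sketch of that rule; the base batch size of 8 is an assumption for illustration, not a value confirmed in this thread, and whether the rule transfers to YOLOS without a performance drop is exactly the open question above.

```python
def scaled_lr(base_lr: float, base_batch_size: int, global_batch_size: int) -> float:
    """Linear scaling rule: grow the learning rate in proportion to
    the global (across all GPUs) batch size."""
    return base_lr * global_batch_size / base_batch_size

# Example (assumed numbers): YOLOS base lr 2.5e-5 at batch size 8,
# scaled up for a global batch size of 64 across more GPUs.
lr = scaled_lr(2.5e-5, base_batch_size=8, global_batch_size=64)
print(lr)  # -> 0.0002
```

In practice a warmup phase is usually paired with the scaled learning rate, since large initial steps can destabilize transformer training.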

davidnvq commented 3 years ago

Thanks for your feedback. Let me close the issue.