megvii-research / AnchorDETR

An official implementation of the Anchor DETR.

About batch size and lr #21

Closed helq2612 closed 2 years ago

helq2612 commented 2 years ago

Hi,

I have a question about the batch size and learning rate used in Anchor DETR. From the paper, the batch size is 1 x 8 = 8 and the lr is 1e-4 (1e-5 for the backbone). In comparison, DETR and Conditional DETR use a batch size of 2 x 8 = 16 with the same learning rate settings as yours.

In the DETR discussion at https://github.com/facebookresearch/detr/issues/48#issuecomment-638689380, the author offers two options:

  1. Keep the same lr configuration: "by using Adam it was ok to use the same default values for all configurations (even if using 64 GPUs)".
  2. If the batch size is doubled, it is better to scale the lr by sqrt(2). I guess that if the batch size is halved, the lr should be reduced by sqrt(2) (see the sketch after this list).
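
For concreteness, here is a minimal sketch of how option 2 works out numerically when going from batch size 16 down to 8. This is just the square-root scaling arithmetic from the linked DETR discussion, not anything taken from the AnchorDETR code:

```python
import math

# Reference setting from DETR / Conditional DETR: batch size 2 x 8 = 16.
base_batch_size = 16
base_lr = 1e-4
base_lr_backbone = 1e-5

# Anchor DETR setting from the paper: batch size 1 x 8 = 8.
new_batch_size = 8

# Square-root scaling rule: scale the lr by sqrt(new / base).
scale = math.sqrt(new_batch_size / base_batch_size)

print(f"scaled lr:          {base_lr * scale:.2e}")           # ~7.07e-05
print(f"scaled backbone lr: {base_lr_backbone * scale:.2e}")  # ~7.07e-06
```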

Have you tried with different learning rate settings?

Thank you!

tangjiuqi097 commented 2 years ago

@helq2612 Hi, we have not paid much attention to tuning these hyper-parameters and have not tried different learning rate settings.

helq2612 commented 2 years ago

Thank you!

github-actions[bot] commented 2 years ago

This issue has not been active for a long time and will be closed in 5 days. Feel free to re-open it if you have further concerns.