xiuqhou / Relation-DETR

[ECCV2024 Oral] Official implementation of the paper "Relation DETR: Exploring Explicit Position Relation Prior for Object Detection"
Apache License 2.0

Does the training batch_size have any influence? #9

Open zhangchbin opened 1 month ago

zhangchbin commented 1 month ago

Question

I noticed that you use batch_size=10 in your paper, while the common setting for batch size is 16. Did you explore this point? Also, as mentioned in your paper, H-DETR harms the performance of DINO, so why do you still combine H-DETR into the framework?

Additional

No response

xiuqhou commented 1 month ago

Thanks for your question. We chose batch_size=10 due to limited GPU resources. We have explored combinations of learning_rate and batch_size and found that they have only a slight impact on final performance. As you said, existing DETR methods commonly use a batch size of 16. However, they do not follow a single rule for the learning_rate used with that batch size. For example:

  1. Deformable-DETR, Co-DETR, DINO in detrex: lr=2e-4 for bs=16
  2. DINO, H-Def-DETR: lr=1e-4 for bs=16
  3. Salience-DETR/Relation-DETR: lr=1e-4 for bs=10 (equivalent to lr=1.6e-4 for bs=16, according to the Linear Scaling Rule)
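The equivalence in item 3 follows directly from the Linear Scaling Rule (learning rate scales linearly with batch size). A minimal sketch, with an illustrative helper name not taken from the repository:

```python
def scale_lr(base_lr: float, base_bs: int, target_bs: int) -> float:
    """Linear Scaling Rule: scale the learning rate proportionally
    to the change in batch size."""
    return base_lr * target_bs / base_bs

# lr=1e-4 at bs=10 corresponds to lr=1.6e-4 at bs=16:
print(scale_lr(1e-4, 10, 16))  # 1.6e-04
```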

Typically, a larger learning rate within a certain range slightly improves final performance. Our setting is a compromise between choice 1 and choice 2. If you train our models with choice 1, you should get better performance.

For the second question, the reason H-DETR damages the performance of DINO may be that it processes matching queries and hybrid queries with the same pipeline but different matching strategies, which leads to conflicts. In our paper, we argue that our position relation helps overcome this conflict and improves performance when integrated with hybrid matching, which is why we use it in our framework. The experimental results also verify its effectiveness.

zhangchbin commented 1 month ago

Thanks for your response. I also noticed that the weight decay of some special parameters is adjusted, as shown in: https://github.com/xiuqhou/Relation-DETR/blob/main/optimizer/param_dict.py#L81 I wonder where this scheme comes from? Thanks.

xiuqhou commented 1 month ago

The param_dict is modified from the one in the DINO project, see here
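For context, the common scheme this kind of param_dict implements is to split parameters into optimizer groups so that biases and normalization weights are exempt from weight decay. The sketch below is a simplified, hedged illustration of that pattern; the helper name is hypothetical, and the actual param_dict.py in Relation-DETR applies more fine-grained rules:

```python
import torch
from torch import nn

def build_param_groups(model: nn.Module, lr: float, weight_decay: float):
    """Split parameters into two groups: 1-D parameters (biases,
    LayerNorm/BatchNorm weights) get weight_decay=0, all other
    parameters keep the full decay. Simplified sketch only."""
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # 1-D tensors cover biases and norm-layer scale parameters.
        if param.ndim <= 1 or name.endswith(".bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "lr": lr, "weight_decay": weight_decay},
        {"params": no_decay, "lr": lr, "weight_decay": 0.0},
    ]

# Usage: pass the groups to the optimizer instead of model.parameters().
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))
groups = build_param_groups(model, lr=1e-4, weight_decay=1e-4)
optimizer = torch.optim.AdamW(groups)
```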

zhangchbin commented 1 month ago

Great work, thank you very much! I am now running Relation-DETR with batch_size=16 while keeping all other hyper-parameters unchanged. I will let you know when I have the final results.

zhangchbin commented 3 weeks ago

Hi @xiuqhou , I have trained Relation-DETR with a batch size of 12 (because I only have 4x3090 GPUs), and the final result is 51.7 AP. Thank you very much!

xiuqhou commented 3 weeks ago

Hi @zhangchbin , I'm glad to hear that you achieved AP similar to ours. Thank you for supporting this work!