Open zhangchbin opened 1 month ago
Thanks for your question. We chose batch_size=10 because of limited GPU resources. We have explored combinations of learning_rate and batch_size and found that they have a slight impact on the final performance. As you said, existing DETR methods commonly use a batch size of 16. However, they do not follow a single rule for setting learning_rate at the same batch_size. For example:
1. lr=2e-4 for bs=16
2. lr=1e-4 for bs=16
3. lr=1e-4 for bs=10 (equivalent to lr=1.6e-4 for bs=16, according to the Linear Scaling Rule)
Typically, a larger learning rate within a certain range slightly improves the final performance. Our setting (choice 3) is a compromise between choice 1 and choice 2. If you train our models with choice 1, you should get better performance.
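To make the Linear Scaling Rule conversion above concrete, here is a minimal sketch of the arithmetic. The helper name scale_learning_rate is only illustrative and is not part of the Relation-DETR code base; it assumes the learning rate scales proportionally with the total batch size.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, target_batch_size: int) -> float:
    """Linear Scaling Rule: scale the learning rate in proportion to the batch size.

    Hypothetical helper for illustration only, not taken from the repository.
    """
    if base_batch_size <= 0 or target_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * target_batch_size / base_batch_size


# The three settings discussed above, expressed as the equivalent lr at bs=16.
settings = {
    "choice 1 (lr=2e-4, bs=16)": scale_learning_rate(2e-4, 16, 16),
    "choice 2 (lr=1e-4, bs=16)": scale_learning_rate(1e-4, 16, 16),
    "choice 3 (lr=1e-4, bs=10)": scale_learning_rate(1e-4, 10, 16),
}

for name, equivalent_lr in settings.items():
    print(f"{name}: equivalent lr at bs=16 is {equivalent_lr:.1e}")
```

Under this assumption, lr=1e-4 at bs=10 corresponds to roughly 1.6e-4 at bs=16, which falls between the 1e-4 and 2e-4 used by the other two settings.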
For the second question, H-DETR may hurt the performance of DINO because it handles matching queries and hybrid queries with the same process but different matching strategies, which leads to conflicts. In our paper, we argue that our position relation helps overcome this conflict and improves performance when integrated with hybrid matching. This is why we use it in our framework. The experimental results also verify its effectiveness.
Thanks for your response. I also noticed that the weight decay of some special parameters is adjusted, as shown in: https://github.com/xiuqhou/Relation-DETR/blob/main/optimizer/param_dict.py#L81 Where does this schedule come from? Thanks.
Great work! Thank you very much! I am now running Relation-DETR with batch_size=16 while keeping all other hyper-parameters unchanged. I will let you know when I have the final results.
Hi @xiuqhou, I have trained Relation-DETR with a batch size of 12 (because I only have 4x3090), and the final result is 51.7 AP. Thank you very much!
Hi @zhangchbin, I'm glad to hear that you have achieved an AP similar to ours. Thank you for your support of this work!
Question
I noticed that you use batch_size=10 in your paper, while the common setting is a batch size of 16. Did you explore the effect of this choice? Also, as mentioned in your paper, H-DETR harms the performance of DINO, so why did you still choose to integrate H-DETR into the framework?