jshilong / DDQ

Dense Distinct Query for End-to-End Object Detection (CVPR2023)
Apache License 2.0

Some questions about the DDQ-DETR code #14

Closed: Longzhong-Lin closed this issue 5 days ago

Longzhong-Lin commented 1 year ago

Hi, thanks for open-sourcing such a wonderful work! However, I got confused while reading the DDQ-DETR code in ddq_detr.py.

1. As far as I understand, in the 2D matrix distinct_query_mask, both the rows and the columns corresponding to the distinct queries are set to False. In line 46 and line 584, you take the first row to get the indices of the distinct queries, but I don't think this operation always works: if the first query is selected as a distinct query, the first row of distinct_query_mask will be all False (a toy sketch of this case is at the end of this comment).

2. I see that both the paper and the code perform DQS before each decoder layer. However, if two distinct queries unfortunately produce similar predictions, the classification loss will still be disturbed, because the matching to ground truth uses the predictions at a given layer rather than the queries. I wonder whether you have tried doing DQS after each decoder layer?

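Here is a minimal toy sketch of the case I mean (hypothetical tensors, not the actual ddq_detr.py code; it only follows the mask semantics described above):

```python
import torch

num_queries = 5
distinct_idx = torch.tensor([1, 3])   # query 0 is NOT distinct: recovery works

distinct_query_mask = torch.ones(num_queries, num_queries, dtype=torch.bool)
distinct_query_mask[distinct_idx, :] = False   # rows of distinct queries
distinct_query_mask[:, distinct_idx] = False   # columns of distinct queries

recovered = (~distinct_query_mask[0]).nonzero(as_tuple=True)[0]
print(recovered)   # tensor([1, 3]) -- matches distinct_idx

distinct_idx = torch.tensor([0, 3])   # query 0 IS distinct: row 0 is all False
distinct_query_mask = torch.ones(num_queries, num_queries, dtype=torch.bool)
distinct_query_mask[distinct_idx, :] = False
distinct_query_mask[:, distinct_idx] = False

recovered = (~distinct_query_mask[0]).nonzero(as_tuple=True)[0]
print(recovered)   # tensor([0, 1, 2, 3, 4]) -- every query looks distinct
```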

jshilong commented 1 year ago

Thank you for your interest. For question 2, I think your confusion can be resolved by the `l_id - 1` in line 583: we do NMS on the predictions of the previous stage and cache the result for the current stage, which is different from how it is done in the one-stage model.
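Roughly, the scheduling looks like this runnable toy sketch (hypothetical names and dummy tensors, not the actual ddq_detr.py implementation): the keep indices produced by NMS on layer `l_id - 1`'s predictions are cached and applied before layer `l_id`, rather than selecting on the current layer's own predictions.

```python
import torch
from torchvision.ops import nms

num_layers, num_queries = 3, 100
queries = torch.randn(num_queries, 256)
keep_cache = torch.arange(num_queries)     # layer 0 sees all queries

for l_id in range(num_layers):
    # consume the selection computed from the previous layer's predictions
    queries = queries[keep_cache]

    # stand-in for a decoder layer plus prediction head
    queries = queries + 0.01 * torch.randn_like(queries)
    boxes = torch.rand(len(queries), 4) * 100
    boxes[:, 2:] += boxes[:, :2]           # make valid xyxy boxes
    scores = torch.rand(len(queries))

    # class-agnostic NMS plays the role of DQS here; its result is cached
    # for the *next* layer, which is the "l_id - 1" offset mentioned above
    keep_cache = nms(boxes, scores, iou_threshold=0.7)
```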

jshilong commented 1 year ago

Regarding question 1, you are correct that the implementation may result in a logic error when the first query is selected as a distinct query. Keeping the real_keep_index in the cache may be a better approach.

However, this scenario may be relatively rare and may not have a significant impact on performance. I have tried another implementation where non-distinct queries are removed from the decoder directly, but it resulted in slightly lower performance than the current implementation.
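For example, something like this toy continuation of the sketch above (again with hypothetical names, not the repository code) would avoid the special case:

```python
import torch

num_queries = 5
distinct_idx = torch.tensor([0, 3])            # query 0 is distinct

# build the attention-style mask as before ...
distinct_query_mask = torch.ones(num_queries, num_queries, dtype=torch.bool)
distinct_query_mask[distinct_idx, :] = False
distinct_query_mask[:, distinct_idx] = False

# ... but also cache the kept indices themselves instead of re-deriving
# them from the first row of the mask
cache_dict = {'real_keep_index': distinct_idx}

keep = cache_dict['real_keep_index']
print(keep)   # tensor([0, 3]) -- correct even though query 0 is distinct
```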

Thank you for bringing this to our attention!

Longzhong-Lin commented 1 year ago

It's very kind of you to reply so quickly! I've seen that you do DQS before each decoder layer in line 583. What I'm curious about is why you chose to do DQS on the predictions from the previous layer in DETR. Have you tried doing DQS on the predictions of the current layer in DETR, just like what you did in FCN & R-CNN?

jshilong commented 1 year ago

During the camera-ready period, we added results of DDQ-DETR that were not included in our submission, so I did not have enough time to try other implementations. Performing DQS on the predictions of the current layer in DETR is also reasonable. I performed a similar ablation study on Sparse R-CNN (each decoder layer) in a very early version (perhaps before ECCV 2022) and obtained comparable results for the two implementations. You can try it on DDQ-DETR.

Longzhong-Lin commented 1 year ago

Got it! Thank you so much for such a quick reply.