jshilong / DDQ

Dense Distinct Query for End-to-End Object Detection (CVPR2023)
Apache License 2.0

mmdet #12

Chop1 closed this issue 5 days ago

Chop1 commented 1 year ago

Hello,

Will this be integrated into the official mmdet library? Thanks

jshilong commented 1 year ago

Due to my involvement in other heavy-load projects, integrating DDQ into mmdet may require assistance from the community.

ironmanfcf commented 1 year ago

Hello, I have attempted to integrate DDQ-DETR into mmdet 3.0.0, and it runs successfully. However, during the subsequent improvement process, training was sometimes interrupted unexpectedly with the error shown in the attached screenshots. I have not found the cause and would like to ask whether anyone else has experienced similar situations. (error screenshots attached)

jshilong commented 1 year ago

These errors might be related to running a different branch or encountering a corner case. You can check all logical branches before this error. Also, I would like to know if the code in this repository can run without any issues. If DDQ-DETR in this repo is running smoothly, you should pay more attention to your modifications.

ironmanfcf commented 1 year ago
Yes, after making some splitting and reference modifications to the code, DDQ-DETR runs smoothly in this repository. Meanwhile, I am also using mmcv 2.0.0rc4, but under pytorch 1.12.1 + cuda 11.3.

In addition, I reduced the number of queries and applied DDQ-DETR to my own task, which has a higher instance density. I suspect this may affect the subsequent one-to-many assignment in the auxiliary loss calculation, and that this may be the reason for the training interruption.


jshilong commented 1 year ago


Maybe you can directly remove the auxiliary loss of the decoder to verify that the code can run smoothly.

jshilong commented 1 year ago

If there is an extremely large number of instances in an image, please make sure the number of queries is enough to match the GT boxes. Since the aux branch uses 4 positive samples for each GT, there should be at least max_gt * 4 queries.
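As a quick illustration, a minimal sanity check one could run before training (a hypothetical helper, not part of the DDQ codebase):

```python
# Hypothetical sanity check (not from the DDQ code): verify that the configured
# number of queries can cover 4 positive samples per ground-truth box.
def check_query_budget(num_queries: int, max_gt_per_image: int,
                       positives_per_gt: int = 4) -> None:
    """Raise if the auxiliary one-to-many assignment could run out of queries."""
    required = max_gt_per_image * positives_per_gt
    if num_queries < required:
        raise ValueError(
            f'num_queries={num_queries} is too small: the aux branch assigns '
            f'{positives_per_gt} positives per GT, so at least {required} '
            f'queries are needed for images with {max_gt_per_image} instances.')

# Example: 900 queries comfortably cover an image with up to 225 instances.
check_query_budget(num_queries=900, max_gt_per_image=225)
```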

ironmanfcf commented 1 year ago

Hello, during my experiments I found that, compared to DINO, DDQ-DETR has significantly higher training time and memory usage. By reading the code and experimenting, I found that the main changes likely responsible are the auxiliary detection head branch and auxiliary loss calculation, as well as the NMS operations added in the query selection and decoder stages. I would like to ask:

1. Are there any other factors I have not noticed that increase GPU memory usage and slow down training?
2. Which part or parts account for most of the extra training time and memory, and what is the approximate relationship between this overhead and the accuracy gain?

In the paper, I noticed that the appendix discusses the impact of changing the number of queries on memory usage, but not the impact of the other improvements on the training process. Due to hardware limitations, answers to these questions would be very helpful for my subsequent experimental setup.

jshilong commented 1 year ago

The longer training time and higher memory usage come from the computation for the 1.5 * 900 = 1350 auxiliary queries and their corresponding label assignment process. If this causes a heavy burden, you may consider removing the auxiliary branch in your experiment.
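For reference, a back-of-the-envelope sketch of where that number comes from; the variable names below are illustrative and do not necessarily match the exact config keys.

```python
# Illustrative only: reproduce the 1.5 * 900 = 1350 auxiliary-query count quoted above.
num_queries = 900         # distinct queries fed to the decoder
dense_topk_ratio = 1.5    # extra budget for the auxiliary (dense) branch
num_dense_queries = int(dense_topk_ratio * num_queries)
print(num_dense_queries)  # 1350 extra queries that also go through label assignment
```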

Johnson-Wang commented 1 year ago

Both the training time and memory increase by only around 5%-10% compared to DINO. I have also encountered a heavy slowdown, with around 2x as much time per iteration as DINO, which was later found to be caused by an old version of scipy. In some versions of scipy, e.g. scipy==1.5, the maximum_bipartite_matching function can take an unexpectedly long time. You may try installing scipy>=1.7.3 if your problem is similar.
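In case it helps, a small self-contained timing check (my own sketch, not from the thread or the DDQ code) to see whether the installed scipy exhibits this slowdown; the graph size is arbitrary:

```python
import time

import numpy as np
import scipy
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

print('scipy version:', scipy.__version__)

# Random sparse bipartite graph, roughly the size of a query/GT assignment problem.
rng = np.random.default_rng(0)
graph = csr_matrix((rng.random((1350, 1350)) < 0.05).astype(np.uint8))

start = time.perf_counter()
maximum_bipartite_matching(graph, perm_type='column')
print(f'maximum_bipartite_matching took {time.perf_counter() - start:.3f}s')
```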

ironmanfcf commented 1 year ago

Thank you very much. I will try it and reply to you later.


jshilong commented 5 days ago

It has been merged into the official mmdetection repo: https://github.com/open-mmlab/mmdetection/tree/main/configs/ddq
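A minimal inference sketch with the merged configs, assuming an mmdetection 3.x installation; the config and checkpoint paths below are placeholders, so check configs/ddq in the mmdetection repo for the exact file names:

```python
from mmdet.apis import init_detector, inference_detector

# Placeholder paths: pick the actual config/checkpoint from configs/ddq in mmdetection.
config_file = 'configs/ddq/ddq-detr-4scale_r50_8xb2-12e_coco.py'
checkpoint_file = 'ddq-detr-4scale_r50_8xb2-12e_coco.pth'

model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'demo/demo.jpg')
print(result.pred_instances.bboxes[:5])  # top predicted boxes for the demo image
```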