@vadimkantorov Hi, we have not compared with deformable attention because it is quite different from the others, i.e., it does not work in the query-key-value manner. Deformable attention is instead similar to DCNv2, and it leads to random memory access, which may not be hardware-friendly.
The max training memory is 4.1G for Deformable DETR vs. 4.3G for Anchor DETR. In terms of the attention mechanism itself, deformable attention can save more memory, but it does not handle a single-level feature well, so it ends up with a memory cost similar to ours once it uses multi-level features to reach comparable accuracy.
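For intuition, here is a minimal PyTorch sketch of the two memory-access patterns. The function names and the plain averaging over sampled points are illustrative only, not the actual Deformable DETR implementation (which combines the sampled points with learned attention weights):

```python
import torch
import torch.nn.functional as F

def qkv_attention(q, k, v):
    """Standard attention: every query attends to every key.
    Dense matrix multiplies -> contiguous, hardware-friendly memory access."""
    # q, k, v: (batch, n_tokens, dim)
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

def deformable_style_sampling(value_map, sample_points):
    """Deformable-style aggregation: each query gathers a few predicted
    locations from the feature map (scattered reads, similar to DCNv2).
    A plain mean over points is used here just to show the access pattern."""
    # value_map: (batch, dim, H, W)
    # sample_points: (batch, n_queries, n_points, 2), normalized to [-1, 1]
    sampled = F.grid_sample(value_map, sample_points, align_corners=False)
    # sampled: (batch, dim, n_queries, n_points) -> (batch, n_queries, dim)
    return sampled.mean(dim=-1).transpose(1, 2)

# Shape check
q = k = v = torch.randn(2, 100, 256)
feat = torch.randn(2, 256, 32, 32)
pts = torch.rand(2, 100, 4, 2) * 2 - 1             # 4 sampled points per query
print(qkv_attention(q, k, v).shape)                # torch.Size([2, 100, 256])
print(deformable_style_sampling(feat, pts).shape)  # torch.Size([2, 100, 256])
```

The first function is all dense matrix multiplies; the second routes each query through `grid_sample`, so the reads depend on the predicted offsets and cannot be coalesced as easily.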
Thank you for these insights!
Unfortunately, Table 6 does not compare with Deformable DETR. If you have compared them, does Anchor DETR consume less GPU memory than Deformable DETR? If yes, approximately by how much?
Thank you!