@vadimkantorov Hi, we have not compared with deformable attention because it is quite different from the others, i.e., it does not work in the query-key-value manner. Deformable attention is instead similar to DCNv2, and it leads to random memory access, which may not be hardware-friendly.
The max training memory is 4.1G for Deformable DETR vs. 4.3G for Anchor DETR. In terms of the attention mechanism itself, deformable attention can save more memory, but it does not handle a single-level feature well, so it ends up with a memory cost similar to ours once it uses multi-level features to reach comparable accuracy.
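For intuition, here is a minimal PyTorch sketch of the two memory-access patterns. The function names and the plain averaging over sampled points are illustrative only, not the actual Deformable DETR implementation (which combines the sampled points with learned attention weights):

```python
import torch
import torch.nn.functional as F

def qkv_attention(q, k, v):
    """Standard attention: every query attends to every key.
    Dense matrix multiplies -> contiguous, hardware-friendly memory access."""
    # q, k, v: (batch, n_tokens, dim)
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

def deformable_style_sampling(value_map, sample_points):
    """Deformable-style aggregation: each query gathers a few predicted
    locations from the feature map (scattered reads, similar to DCNv2).
    A plain mean over points is used here just to show the access pattern."""
    # value_map: (batch, dim, H, W)
    # sample_points: (batch, n_queries, n_points, 2), normalized to [-1, 1]
    sampled = F.grid_sample(value_map, sample_points, align_corners=False)
    # sampled: (batch, dim, n_queries, n_points) -> (batch, n_queries, dim)
    return sampled.mean(dim=-1).transpose(1, 2)

# Shape check
q = k = v = torch.randn(2, 100, 256)
feat = torch.randn(2, 256, 32, 32)
pts = torch.rand(2, 100, 4, 2) * 2 - 1             # 4 sampled points per query
print(qkv_attention(q, k, v).shape)                # torch.Size([2, 100, 256])
print(deformable_style_sampling(feat, pts).shape)  # torch.Size([2, 100, 256])
```

The first function is all dense matrix multiplies; the second routes each query through `grid_sample`, so the reads depend on the predicted offsets and cannot be coalesced as easily.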
Thank you for these insights!
Unfortunately, Table 6 does not compare with Deformable DETR. If you have compared them, does Anchor DETR consume less GPU memory than Deformable DETR? If yes, approximately by how much?
Thank you!