xiuqhou / Relation-DETR

[ECCV2024 Oral] Official implementation of the paper "Relation DETR: Exploring Explicit Position Relation Prior for Object Detection"
Apache License 2.0

Inference speed #11

Open · JohnMBrandt opened this issue 3 months ago

JohnMBrandt commented 3 months ago

Question

Great work! Your research is very impressive. I've been using it for tiny object detection with great results; there are far fewer false positives.

However, I'm struggling with inference time. Is it expected to be this slow? With a DINO DETR model and a ResNet backbone in MMDetection, I can run inference on a 512x512 image in 250 ms.

With the PyTorch implementation in this repo, Relation DETR with a ResNet backbone takes 3.4 seconds. Exporting it to ONNX only brings this down to 2.5 seconds, which is still about 10x slower than DINO.


xiuqhou commented 3 months ago

Hi, inference should not be this slow. At inference time, the only difference between Relation-DETR and DINO is the position relation embedding, which should add little computational cost.

Here are my inference speed results on a single RTX 3090 GPU. My older version of MMDetection does not support DINO, so I tested Deformable-DETR instead. I also re-implemented Deformable-DETR on top of this repo to compare my implementation with MMDetection's:

Model            Implementation  Input shape  FPS
Deformable-DETR  MMDetection     800x1333     7.5
Deformable-DETR  this repo       800x1333     7.98
Relation-DETR    this repo       800x1333     7.86
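The table reports throughput, while the question reports per-image latency. Assuming the FPS figures are single-image throughput (batch size 1, which the reply does not state), latency in milliseconds is simply 1000 / FPS. A small sketch of the conversion, using a hypothetical helper `fps_to_ms`:

```python
def fps_to_ms(fps: float) -> float:
    """Convert throughput (images per second) to mean latency per image in ms.

    Assumes one image per forward pass (batch size 1).
    """
    return 1000.0 / fps


# FPS values from the table (single GPU, 800x1333 input)
results = {
    "Deformable-DETR (MMDetection)": 7.5,
    "Deformable-DETR (this repo)": 7.98,
    "Relation-DETR (this repo)": 7.86,
}

for name, fps in results.items():
    print(f"{name}: {fps} FPS -> {fps_to_ms(fps):.1f} ms/image")
```

Under that assumption, all three models take roughly 125 to 135 ms per image at 800x1333, which is consistent with the point that multi-second latencies on smaller 512x512 inputs are not expected from the model itself, although the GPUs being compared may differ.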

These are the commands I used to test their speeds:

# test Deformable-DETR in MMDetection
CUDA_VISIBLE_DEVICES=4 python tools/analysis_tools/benchmark_single.py configs/deformable_detr/deformable_detr_r50_16x2_50e_coco.py checkpoint.pth

# test Deformable-DETR, re-implemented on top of this repo
CUDA_VISIBLE_DEVICES=4 python tools/benchmark_model.py --model-config configs/deformable_detr/def_detr_resnet50_800_1333.py --repeat 2000

# test Relation-DETR in this repo
CUDA_VISIBLE_DEVICES=4 python tools/benchmark_model.py --model-config configs/relation_detr/relation_detr_resnet50_800_1333.py --repeat 2000

Allowing for run-to-run variance, I found little difference in inference time between Relation-DETR and Deformable-DETR, whether Deformable-DETR runs in MMDetection or in my re-implementation based on this repo.
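Because individual timings vary and GPU work is asynchronous, how the clock is read matters as much as the model. The sketch below is a minimal, framework-agnostic timing loop, not the implementation of tools/benchmark_model.py (whose internals are not shown here). It runs warm-up iterations to exclude one-time costs such as lazy initialization or kernel autotuning, then times many repeats and reports mean, median, standard deviation, and FPS. The function `benchmark` and its parameters are hypothetical. For a GPU model, the caller is assumed to pass a barrier such as `torch.cuda.synchronize` as `sync`, because CUDA kernels launch asynchronously and the timer could otherwise stop before the GPU has finished.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 50,
    repeat: int = 200,
    sync: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Time fn() end to end and summarize per-call latency in milliseconds.

    sync: optional barrier (e.g. torch.cuda.synchronize, supplied by the
    caller) that waits for queued asynchronous work to complete.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")

    # Warm-up runs absorb one-time costs (lazy init, autotuning, caches).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # start timing from an idle device

    latencies_ms = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # stop the clock only after queued work has finished
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "stdev_ms": statistics.stdev(latencies_ms) if repeat > 1 else 0.0,
        "fps": 1000.0 / mean_ms,
    }
```

For a PyTorch model, a call might look like `benchmark(lambda: model(images), sync=torch.cuda.synchronize)` inside `torch.no_grad()` after `model.eval()`, where `model` and `images` are placeholders. Reporting the median alongside the mean keeps a few slow outlier runs from dominating the comparison.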

Could you provide more details on how you measured their speeds? I also suggest timing each component of Relation-DETR separately to find out which one causes the slowdown.
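Following the suggestion to time each component separately, here is a hedged sketch that times the stages of a sequential pipeline individually, feeding each stage's output into the next. The function `profile_stages` and any stage split are hypothetical; it assumes the model's forward pass can be broken into separate callables, which depends on the repo's code. As with any GPU timing, an optional `sync` barrier (e.g. `torch.cuda.synchronize`, supplied by the caller) is called after each stage so that asynchronous work is charged to the stage that launched it.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A stage is a (unique name, callable) pair; the callable maps input -> output.
Stage = Tuple[str, Callable[[object], object]]


def profile_stages(
    stages: Sequence[Stage],
    inputs: object,
    warmup: int = 10,
    repeat: int = 50,
    sync: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Return the mean latency in ms of each stage of a sequential pipeline.

    Each stage receives the previous stage's output, so the per-stage means
    add up to (roughly) the end-to-end latency.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")

    def run_once(record: Optional[Dict[str, List[float]]]) -> None:
        x = inputs
        for name, stage in stages:
            start = time.perf_counter()
            x = stage(x)
            if sync is not None:
                sync()  # finish this stage's asynchronous work inside its window
            if record is not None:
                record[name].append((time.perf_counter() - start) * 1000.0)

    # Warm-up passes are executed but not recorded.
    for _ in range(warmup):
        run_once(None)

    timings: Dict[str, List[float]] = {name: [] for name, _ in stages}
    for _ in range(repeat):
        run_once(timings)
    return {name: sum(ts) / len(ts) for name, ts in timings.items()}
```

For Relation-DETR, the stages might be, for example, the backbone, the transformer, and post-processing (a hypothetical split; the repo's actual module boundaries may differ). Stage names must be unique because they key the result. A stage whose mean time is far out of proportion to the others is the natural place to look first.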