A question for 3dAP and latency in PGD

open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.

https://mmdetection3d.readthedocs.io/en/latest/

Apache License 2.0

A question for 3dAP and latency in PGD #2433

Open benz725 opened 1 year ago

benz725 commented 1 year ago

I have a question about PGD (Probabilistic and Geometric Depth: Detecting Objects in Perspective).
In the paper you mention that inference on a 1080 Ti takes 28 ms (about 36 Hz), which is faster than SMOKE and MonoFlex, and that the AP on KITTI (AP11, I guess) is 24.35 / 18.34 / 16.90 (easy / moderate / hard). How can I reproduce that result, i.e. a model that runs at 28 ms latency and reaches an AP of 24.35 / 18.34 / 16.90?
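
For context on the metric named in the question: AP11 usually refers to the 11-point interpolated average precision, which averages the best achievable precision at recall thresholds 0.0, 0.1, ..., 1.0. The sketch below is a minimal illustration under that assumption, with a 40-point variant (thresholds 1/40, ..., 1.0) shown for comparison. The helper name `interpolated_ap` and the toy precision-recall values are made up for illustration; this is not the KITTI evaluation code and the numbers are not PGD results.

```python
def interpolated_ap(recalls, precisions, recall_points):
    """Average, over the sampled recall thresholds, of the best precision
    reached at any recall >= that threshold (0 if no such point exists)."""
    total = 0.0
    for r in recall_points:
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        total += max(candidates, default=0.0)
    return total / len(recall_points)


AP11_POINTS = [i / 10 for i in range(11)]      # 0.0, 0.1, ..., 1.0
AP40_POINTS = [i / 40 for i in range(1, 41)]   # 1/40, 2/40, ..., 1.0

# Toy precision-recall curve (made-up numbers, not PGD results).
recalls = [0.2, 0.4, 0.6, 0.8]
precisions = [1.0, 0.9, 0.7, 0.5]

ap11 = interpolated_ap(recalls, precisions, AP11_POINTS)
ap40 = interpolated_ap(recalls, precisions, AP40_POINTS)
print(f"AP11 = {ap11:.4f}, AP40 = {ap40:.4f}")
```

On this toy curve the 11-point value comes out higher than the 40-point one, mainly because the 11-point sampling includes the recall 0 threshold. Numbers reported under the two conventions are therefore not directly comparable.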

Tai-Wang commented 1 year ago

The AP is indeed AP11, following the convention of previous methods at that time. Our released models achieve this performance, as the benchmark shows. As for the latency, we applied some engineering tricks on top of the released version, such as keeping frozen_stages=1 in the backbone and using only a 3-level FPN to reduce the computational overhead during deployment. We may consider releasing some of these tricks later.
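
The two tricks mentioned above could be expressed as config overrides roughly like the following. This is only a sketch: the key names (`frozen_stages`, `num_outs`, `strides`, `regress_ranges`) follow MMDetection-style configs, but every value here is an illustrative assumption and not the authors' unreleased deployment setting. The helper `check_levels` is hypothetical and only verifies that the per-level settings agree with each other.

```python
INF = 1e8  # stand-in for "no upper bound" on the last regression range

# Hypothetical overrides in the style of an MMDetection3D config file.
# Values are illustrative assumptions, NOT the settings used in the paper.
model = dict(
    backbone=dict(frozen_stages=1),   # keep the stem and first stage frozen
    neck=dict(num_outs=3),            # FPN emits 3 feature levels
    bbox_head=dict(
        strides=(8, 16, 32),          # one stride per FPN level
        regress_ranges=((-1, 64), (64, 128), (128, INF)),
    ),
)


def check_levels(cfg):
    """Return True if the head's per-level settings match the FPN output count."""
    n = cfg["neck"]["num_outs"]
    head = cfg["bbox_head"]
    return len(head["strides"]) == n and len(head["regress_ranges"]) == n


print(check_levels(model))
```

When reducing the number of FPN levels, the head settings that are defined per level have to be shortened to match, which is what the check illustrates.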

benz725 commented 1 year ago

Thank you for the explanation. Another question, perhaps not specific to this repository: in 3D detection there are two types of anchor-free methods, keypoint-based (SMOKE, MonoFlex, MonoPair, RTM3D) and segmentation-based (PGD, FCOS3D). Which do you think is better in terms of precision and latency? If this is inconvenient to answer, feel free to skip it. Thank you.

Tai-Wang commented 1 year ago

Your classification of the listed papers is a little strange and inaccurate (PGD and FCOS3D are not segmentation-based methods). In general, you can consider SMOKE and FCOS3D as the basic data-driven works for monocular 3D detection, while the others typically use monocular geometry (keypoints, pair-wise geometric relationships, perspective geometry constraints, etc.) to enhance the baseline frameworks.

benz725 commented 1 year ago

@Tai-Wang In the original FCOS paper, the authors borrowed ideas from segmentation and said that a segmentation backbone can be used directly for detection. Every point inside a ground-truth object can be treated as a positive sample, so I called them segmentation-based methods. RTM3D, SMOKE, CenterNet and MonoFlex, on the other hand, have only one positive point (anchor point) per object and do not need NMS. I think the former and the latter are two different types of anchor-free methods.
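
The distinction drawn here, many positive locations per object versus a single center location, can be illustrated with a deliberately simplified sketch. It assumes a single feature level and a 2D box, and it leaves out details that real implementations add (scale ranges per level, center sampling, Gaussian heatmaps around the center). The function names and the example box are hypothetical.

```python
def fcos_style_positives(box, stride, feat_w, feat_h):
    """All feature-map locations whose image-space point falls inside the box."""
    x1, y1, x2, y2 = box
    positives = []
    for j in range(feat_h):
        for i in range(feat_w):
            # Map feature location (i, j) back to image coordinates.
            x = i * stride + stride // 2
            y = j * stride + stride // 2
            if x1 <= x <= x2 and y1 <= y <= y2:
                positives.append((i, j))
    return positives


def center_style_positive(box, stride):
    """The single feature-map location that contains the box center."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return [(int(cx // stride), int(cy // stride))]


box = (16, 16, 48, 40)   # x1, y1, x2, y2 in pixels (made-up example)
stride = 8
fcos_pos = fcos_style_positives(box, stride, feat_w=8, feat_h=8)
center_pos = center_style_positive(box, stride)
print(len(fcos_pos), center_pos)
```

With many positives per object, several locations can predict the same box, so duplicate removal such as NMS is typically needed; with a single peak per object, taking local maxima of the heatmap is usually enough.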