zhangyp15 / MonoFlex

Released code for Objects are Different: Flexible Monocular 3D Object Detection, CVPR21
MIT License

the definition of 2d center #2

Open patrick-llgc opened 3 years ago

patrick-llgc commented 3 years ago

Hi @zhangyp15 , I have a question regarding the definition of 2D center.

For truncated objects, how is the 2D center $x_b$ obtained? Is it the center of the 2D tight bbox (min bbox around the segmentation mask) or the 2D bbox around the partial projected 3D bbox that is inside the image?

From Fig. 4(c), it looks like the latter, as the 2D bbox is not tightly surrounding the object. It would be great if you could confirm.

lfydegithub commented 3 years ago

> Hi @zhangyp15 , I have a question regarding the definition of 2D center.
>
> For truncated objects, how is the 2D center $x_b$ obtained? Is it the center of the 2D tight bbox (min bbox around the segmentation mask) or the 2D bbox around the partial projected 3D bbox that is inside the image?
>
> From Fig. 4(c), it looks like the latter, as the 2D bbox is not tightly surrounding the object. It would be great if you could confirm.

Good question, I would also like to know the answer. By the way, [image]

If x_r is x_i, is the left or right then zero?

zhangyp15 commented 3 years ago

For truncated objects, x_b is the center of the partial bbox inside the image, following the annotations of KITTI.

Yes, the left or right can be zero for objects on the boundary.
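
To make the reply above concrete, here is a minimal sketch. It is not taken from the MonoFlex code. It assumes that "left" and "right" mean FCOS-style distances from a reference point to the box edges, and that boxes are stored as `(x1, y1, x2, y2)` in pixels. The function names and the numbers are hypothetical.

```python
def box_center(box):
    """Center (x_b in the thread) of a 2D box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def ltrb_distances(point, box):
    """FCOS-style distances from a point to the left, top, right, bottom edges."""
    px, py = point
    x1, y1, x2, y2 = box
    return (px - x1, py - y1, x2 - px, y2 - py)


# Hypothetical partial box of an object truncated by the left image edge (x = 0).
partial_box = (0.0, 150.0, 200.0, 300.0)
x_b = box_center(partial_box)

# A reference point lying on the left image edge, which is also the box's left side.
edge_point = (0.0, 230.0)
left, top, right, bottom = ltrb_distances(edge_point, partial_box)
```

With these numbers `x_b` is the center of the visible part only, and `left` is zero because the reference point sits on the same edge that truncates the box.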

patrick-llgc commented 3 years ago

@zhangyp15 thanks for your reply! How was the partial box obtained? Is it the annotated 2D bbox (COCO-style) or the generated 2D bbox from the reprojection of the 3D bbox?

VincentGu11 commented 3 years ago

> For truncated objects, x_b is the center of the partial bbox inside the image, following the annotations of KITTI.
>
> Yes, the left or right can be zero for objects on the boundary.

Thanks for the wonderful work. I have a simple question: the paper says that the 2D bbox regression follows FCOS. Do you have a centerness layer like FCOS? And do you regress a heatmap to obtain x_b? I ask because in Section 3.3 the center of the bbox is defined as x_c. How do you get x_b? Thanks a lot!

By the way, for truncated objects the paper says that x_r is x_i, so you didn't use x_b for truncated objects, right?

lfydegithub commented 3 years ago

> > For truncated objects, x_b is the center of the partial bbox inside the image, following the annotations of KITTI. Yes, the left or right can be zero for objects on the boundary.
>
> Thanks for the wonderful work. I have a simple question: the paper says that the 2D bbox regression follows FCOS. Do you have a centerness layer like FCOS? And do you regress a heatmap to obtain x_b? I ask because in Section 3.3 the center of the bbox is defined as x_c. How do you get x_b? Thanks a lot!
>
> By the way, for truncated objects the paper says that x_r is x_i, so you didn't use x_b for truncated objects, right?

"And do you regress a heatmap for obtaining x_b? Because in 3.3 the center of bbox is defined as x_c. How do you get the x_b?"

I have the same question. I think x_c should be used for inside objects. As far as we can see, the network does not regress x_b.

lfydegithub commented 3 years ago

> @zhangyp15 thanks for your reply! How was the partial box obtained? Is it the annotated 2D bbox (COCO-style) or the generated 2D bbox from the reprojection of the 3D bbox?

Although I would also like to know the answer, the 2D bbox is useless in this paper. The depth from keypoints needs h_l, which is computed from the keypoints rather than from the 2D bbox.
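
The relation referred to here is the pinhole similar-triangles formula, depth = f * H / h_l, where f is the focal length in pixels, H is the 3D height of the object in meters, and h_l is the pixel height of a vertical edge measured between two projected keypoints. Below is a minimal sketch under those assumptions. The function name and the example numbers are illustrative and not taken from the MonoFlex code.

```python
def depth_from_vertical_edge(focal_px, height_3d_m, top_kpt, bottom_kpt):
    """Depth of a vertical 3D edge from the pixel height between two keypoints.

    top_kpt and bottom_kpt are (u, v) pixel coordinates of the projected
    top and bottom end of the same vertical edge.
    """
    h_l = abs(bottom_kpt[1] - top_kpt[1])  # pixel height of the edge
    if h_l == 0:
        raise ValueError("keypoints coincide vertically; depth is undefined")
    return focal_px * height_3d_m / h_l


# Hypothetical values: f = 720 px, H = 1.5 m, edge spanning 54 px in the image.
depth = depth_from_vertical_edge(720.0, 1.5, (300.0, 150.0), (300.0, 204.0))
```

Note that the 2D box never appears as an input, which matches the point made in the comment above.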

VincentGu11 commented 3 years ago

> > @zhangyp15 thanks for your reply! How was the partial box obtained? Is it the annotated 2D bbox (COCO-style) or the generated 2D bbox from the reprojection of the 3D bbox?
>
> Although I would also like to know the answer, the 2D bbox is useless in this paper. The depth from keypoints needs h_l, which is computed from the keypoints rather than from the 2D bbox.

+1, I didn't see any usage of the 2D bbox. Might removing it give better performance?

patrick-llgc commented 3 years ago

@lfydegithub @VincentGu11 The center of the 2D bbox is needed to calculate $x_I$. See page 4, section "Outside bbox". That is why I wanted to know how the 2D bbox is generated. My guess is that it is generated by reprojecting the 8 vertices of the 3D bbox into the image, forming the tightest bbox around those points, and then cropping it to the image boundary, but it would be really nice to have the author confirm it.

cc @zhangyp15
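
The guess above can be sketched as follows. This only illustrates the guessed procedure and is not the confirmed MonoFlex behavior. It assumes a simple pinhole camera with intrinsics `fx, fy, cx, cy`, corners given in camera coordinates with positive depth, and pixel coordinates clipped to `[0, width - 1]` and `[0, height - 1]`. All names and numbers are hypothetical.

```python
def project_point(pt, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (X, Y, Z) with Z > 0."""
    X, Y, Z = pt
    return (fx * X / Z + cx, fy * Y / Z + cy)


def clipped_box_from_corners(corners_3d, fx, fy, cx, cy, img_w, img_h):
    """Tightest 2D box around the projected corners, cropped to the image."""
    pts = [project_point(p, fx, fy, cx, cy) for p in corners_3d]
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]

    def clip(value, upper):
        return min(max(value, 0.0), upper)

    return (clip(min(us), img_w - 1.0), clip(min(vs), img_h - 1.0),
            clip(max(us), img_w - 1.0), clip(max(vs), img_h - 1.0))


# Hypothetical 3D box to the left of the camera, partly outside the image.
corners = [(X, Y, Z) for X in (-5.0, -3.0) for Y in (0.0, 1.5) for Z in (4.0, 8.0)]
box = clipped_box_from_corners(corners, 700.0, 700.0, 600.0, 180.0, 1242.0, 375.0)
```

Corners behind the camera (Z <= 0) are not handled here; heavily truncated objects would need extra care before projecting.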

zhangyp15 commented 3 years ago

Thanks for your discussion.

@VincentGu11 @lfydegithub Yes, the regression of 2D bounding boxes is not used for 3D object detection, but we think regressing 2D bounding boxes, as a simple task, can help the feature learning and accelerate convergence, which is validated with experiments in another paper ''Delving into Localization Errors for Monocular 3D Object Detection''.

@patrick-llgc The projected 2d boxes are used when all projected corners are inside, otherwise the partial boxes from the original KITTI annotation are used.
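
The selection rule described in this reply can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's code: the corners are already projected to pixel coordinates, the annotated box is the `(x1, y1, x2, y2)` box from the KITTI label, and "inside" means within `[0, width - 1]` and `[0, height - 1]`. The function names are hypothetical.

```python
def tight_box(points_2d):
    """Tightest axis-aligned box (x1, y1, x2, y2) around 2D points."""
    us = [u for u, _ in points_2d]
    vs = [v for _, v in points_2d]
    return (min(us), min(vs), max(us), max(vs))


def all_inside(points_2d, img_w, img_h):
    """True if every point lies within the image bounds."""
    return all(0.0 <= u <= img_w - 1.0 and 0.0 <= v <= img_h - 1.0
               for u, v in points_2d)


def select_2d_box(projected_corners, annotated_box, img_w, img_h):
    """Projected box if all corners are inside, else the annotated partial box."""
    if all_inside(projected_corners, img_w, img_h):
        return tight_box(projected_corners)
    return annotated_box


# Hypothetical projected corners of a fully visible object.
inside_corners = [(400.0, 150.0), (520.0, 150.0), (400.0, 230.0), (520.0, 230.0),
                  (420.0, 160.0), (500.0, 160.0), (420.0, 220.0), (500.0, 220.0)]
# Same object shifted so that one corner falls outside the left image edge.
truncated_corners = [(-80.0, 150.0)] + inside_corners[1:]
annotated = (0.0, 150.0, 520.0, 230.0)

box_inside = select_2d_box(inside_corners, annotated, 1242.0, 375.0)
box_truncated = select_2d_box(truncated_corners, annotated, 1242.0, 375.0)
```

In the first case the box comes from the projected corners; in the second the annotated partial box is returned unchanged.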

VincentGu11 commented 3 years ago

> Thanks for your discussion.
>
> @VincentGu11 @lfydegithub Yes, the regression of 2D bounding boxes is not used for 3D object detection, but we think regressing 2D bounding boxes, as a simple task, can help the feature learning and accelerate convergence, which is validated with experiments in another paper ''Delving into Localization Errors for Monocular 3D Object Detection''.
>
> @patrick-llgc The projected 2d boxes are used when all projected corners are inside, otherwise the partial boxes from the original KITTI annotation are used.

Wow, thanks for the explanation! When will you release your code? Really looking forward to it!