liuyuan-pal / Gen6D

[ECCV2022] Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images
GNU General Public License v3.0
586 stars 74 forks

Question about the poor performance on custom data #29

Closed Uio96 closed 1 year ago

Uio96 commented 1 year ago

Thank you so much for this great work. I also tried configuring Gen6D and successfully applied it to the mouse data provided.

However, when I tried to apply Gen6D to my collected data, the results were not satisfying as the detector could not work properly.

There are some previous trials in https://github.com/liuyuan-pal/Gen6D/issues/4, https://github.com/liuyuan-pal/Gen6D/issues/11 and https://github.com/liuyuan-pal/Gen6D/issues/24. I noticed that you mentioned the issues of z-flip and the reference target being too small there, so I was careful about those factors. I also tried putting an ArUco tag in the background. I once thought my target object might be too challenging, so I switched to a mouse like the one in your demo. But Gen6D still did not work on my own data.

A sample is shown below (image attached), and my segmentation and labeling should be good.

Do you have any thoughts on the reasons for the failure cases? I wonder if I have to print out another ArUco tag (denser than the one I use right now) similar to the one used in your demo mouse video or https://github.com/liuyuan-pal/Gen6D/issues/24#issuecomment-1280417908. Have you tried capturing the registration video without the ArUco tag board?

Thanks a lot.

EternalGoldenBraid commented 1 year ago

Have I understood correctly that the pointcloud reconstruction is used to label a 3D bounding box for defining the object center for each reference frame viewpoint? The 3D center is then projected to a 2D center for reference frames?

liuyuan-pal commented 1 year ago

Hi, for your problem, you may resize the query image a bit smaller so that the detector can locate the object correctly, because the current detection looks far off. Another issue is that you could crop the mouse tighter so that it appears larger in the reference images.
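A minimal sketch of the resizing suggestion above. This is not the repo's code: it is a naive 2x block-average downsample standing in for a real resize (in practice you would use something like `cv2.resize` with `INTER_AREA` and tune the scale factor yourself):

```python
import numpy as np

def resize_half(img: np.ndarray) -> np.ndarray:
    """Naive 2x downsample by averaging 2x2 pixel blocks.
    Placeholder for a proper image resize; shrinking the query image
    makes the object's apparent size smaller for the detector."""
    h, w = img.shape[:2]
    h2, w2 = h // 2, w // 2
    img = img[: h2 * 2, : w2 * 2]  # drop odd rows/cols
    return img.reshape(h2, 2, w2, 2, -1).mean(axis=(1, 3)).astype(img.dtype)
```

You may need to try a few scale factors until the detected bounding box looks reasonable.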

liuyuan-pal commented 1 year ago

Have I understood correctly that the pointcloud reconstruction is used to label a 3D bounding box for defining the object center for each reference frame viewpoint? The 3D center is then projected to a 2D center for reference frames?

Yes, this is almost correct! I compute a 3D bounding (inscribed) sphere (instead of a 3D bounding box) for the object and the reference images will include the projection of the whole sphere.
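The projection described above (3D object center to 2D image point) can be sketched as follows. The function and argument names are illustrative, not the repo's actual API; the pose is assumed to be a world-to-camera 3x4 matrix [R|t]:

```python
import numpy as np

def project_center(center_3d: np.ndarray, K: np.ndarray,
                   pose: np.ndarray) -> np.ndarray:
    """Project a 3D object center (e.g. the bounding-sphere center)
    into the image using intrinsics K and a 3x4 pose [R|t]."""
    R, t = pose[:, :3], pose[:, 3]
    p_cam = R @ center_3d + t        # transform to camera coordinates
    p_img = K @ p_cam                # apply intrinsics
    return p_img[:2] / p_img[2]      # perspective division -> 2D pixel
```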

Uio96 commented 1 year ago

Hi, for your problem, you may resize the query image a bit smaller so that the detector can locate the object correctly, because the current detection looks far off. Another issue is that you could crop the mouse tighter so that it appears larger in the reference images.

Thank you so much for the reply. Your suggestion did help me with my mouse case.

Uio96 commented 1 year ago

I have a follow-up question. I tested on more cases and found that the detection result was very sensitive to the size of the target. For example, the following two inputs (both resized to a height of 640) may look similar in terms of target size, but the performance was quite different.

1) sample 1 (image attached)

2) sample 2 (image attached)

I realized you did mention that

The detector and viewpoint selector will be used for the initialization of the poses, which assume the object in the query image looks relatively similar to the reference images in terms of scale and in-plane orientation (the given z-axis is the up direction). https://github.com/liuyuan-pal/Gen6D/issues/4#issuecomment-1186131326

However, some of my registration images are quite close to the target (I would say they are more similar to sample 1).

So my question is, when is Gen6D more guaranteed to succeed?

liuyuan-pal commented 1 year ago

Hi, to guarantee the success of the detector, we require that the scale difference between the query image and the reference images is not too large. In the implementation, the reference image has a size of 128, so the detected bounding box in the query image is supposed to be between 128/2 = 64 and 128*2 = 256 pixels. Scale differences larger than 2.0 or smaller than 0.5 would decrease the success rate. The reason is that we apply convolutions at different scales to detect the object, and the predefined scale range is 0.5 to 2.0.
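The scale constraint above can be captured in a rough sanity check. This is not the repo's code, just an illustration of the stated 0.5-2.0 range relative to the 128-px reference crop:

```python
def detection_scale_ok(query_box_size: float, ref_size: int = 128,
                       min_scale: float = 0.5,
                       max_scale: float = 2.0) -> bool:
    """Return True if the object's size in the query image (in pixels)
    falls within the detector's predefined scale range relative to the
    128-px reference crop, i.e. roughly 64-256 px."""
    scale = query_box_size / ref_size
    return min_scale <= scale <= max_scale
```

If this check fails for your images, resizing the query image (up or down) so the object lands in that range should improve the detector's success rate.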

liuyuan-pal commented 1 year ago

BTW, we always crop and resize the reference images to 128*128 according to object_point_cloud.ply, so this is not affected by the size of your registration images.
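The crop above is driven by the object's extent in the point cloud. A hedged sketch of how such an extent could be obtained (illustrative only; the repo's actual computation may differ, and `object_point_cloud.ply` would first be loaded into an (N, 3) array with a library such as Open3D or trimesh):

```python
import numpy as np

def bounding_sphere(points: np.ndarray):
    """Loose bounding sphere for an (N, 3) object point cloud:
    center at the centroid, radius reaching the farthest point.
    The projected sphere then determines the crop region that is
    resized to the 128x128 reference image."""
    center = points.mean(axis=0)
    radius = np.linalg.norm(points - center, axis=1).max()
    return center, radius
```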

Uio96 commented 1 year ago

Thank you so much. The information is really helpful.