j96w / DenseFusion

"DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion" code repository
https://sites.google.com/view/densefusion
MIT License

Incorrect visualization result on YCB dataset #27

Closed Fiona730 closed 5 years ago

Fiona730 commented 5 years ago

Hello, thanks for sharing your code! I tried to visualize the output on the YCB test set, but the result doesn't align with Fig. 4 in your paper. Here is one of my visualization results (attached: 420_gt). The upper-left image is generated using the ground truth R and T in xxx-meta.mat, and it is correct. The lower-left and lower-right images are generated by simply replacing R and T with those in the mat files in result_wo_refine_dir and result_refine_dir; the pose estimation in these two images seems to go wrong. My visualization process is to transform the points in points.xyz with R and T, and then scale and translate those points to fit into the object's tight bounding box. I used the trained checkpoints you provided.

May I ask for your suggestions on what the problem might be? Thanks for your time.

j96w commented 5 years ago

Hi, this problem is mainly because we use one network to handle all objects while training with different loss functions (ADD for non-symmetric objects, ADD-S for symmetric ones like the bowl in your case). This can lead to unstable performance on one or two specific objects, even though the overall mean score across all objects is stable. To tackle this issue, and to get the performance shown in Fig. 4 of the paper, my suggestion would be to train symmetric and non-symmetric objects separately, so that the performance is not affected by the different scales of the losses.
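For reference, here is a minimal sketch of the two metrics behind those losses (ADD for non-symmetric objects, ADD-S for symmetric ones such as the bowl). It only illustrates the difference between the two; it is not the exact loss code in this repository:

import numpy as np
from scipy.spatial import cKDTree

def add_metric(model_pts, R_pred, t_pred, R_gt, t_gt):
    # ADD: mean distance between corresponding model points under the
    # predicted and the ground-truth pose (non-symmetric objects).
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    return np.mean(np.linalg.norm(pred - gt, axis=1))

def add_s_metric(model_pts, R_pred, t_pred, R_gt, t_gt):
    # ADD-S: each predicted point is matched to its nearest ground-truth
    # point before averaging, which makes the metric symmetry-aware.
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    dists, _ = cKDTree(gt).query(pred, k=1)
    return np.mean(dists)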

Fiona730 commented 5 years ago

Thanks for your quick reply! However, the trained model I downloaded doesn't perform well in many visualization cases. I am not able to evaluate the results with the YCB toolbox (Matlab is unavailable on the server), but the rate of wrong estimations in the test frames is roughly 40%, so I suppose symmetry is not the main problem. Some other visualization results are attached below (2710_gt, 630_gt). Do you have any other ideas on this?

j96w commented 5 years ago

Hi @Fiona730, I'm quite sure there must be some bug in your visualization code. Here are the results of re-running the released checkpoints on all the test videos of the YCB dataset; as you can see, there are no errors like yours. Also, there is no need to transform those points to fit the object's tight bounding box; without that fitting, the result is already good enough.

[two result images attached]

Although I haven't seen your code, here are a few guesses:

(1) Please make sure you pick up the correct pose for the object you are visualizing. The estimated poses saved in each frame's XXXX.mat are not ordered by object index; they follow the order of the detection results released by PoseCNN (see YCB_Video_toolbox/results_PoseCNN_RSS2018/xxxxxx.mat, where the class index is in the second column of each row of rois; a small sketch of this follows below). I follow their format mainly because I hope people can use the YCB_toolbox for evaluation on this dataset without changing the MATLAB code too much.

(2) Please make sure you are using the model point clouds released with the YCB-Video dataset. Do not use the models released by other projects, such as NVIDIA Falling Things; although they also use YCB objects, the canonical frame is different.
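For point (1), here is a rough sketch of how to read the saved results back and pair them with the PoseCNN detections. The file paths are only examples, and the key names ('poses', 'rois') and the 7-value pose layout are what eval_ycb.py uses as far as I remember, so please double-check them against the released files:

import numpy as np
import scipy.io as scio

frame = '000001'  # example frame id, adjust to your naming
posecnn = scio.loadmat('YCB_Video_toolbox/results_PoseCNN_RSS2018/{}.mat'.format(frame))
result = scio.loadmat('Densefusion_iterative_result/{}.mat'.format(frame))

rois = np.array(posecnn['rois'])    # one row per PoseCNN detection
poses = np.array(result['poses'])   # saved in the same order as the rois rows

for idx in range(rois.shape[0]):
    itemid = int(rois[idx][1])      # class index of this detection
    my_r = poses[idx][:4]           # predicted rotation (quaternion)
    my_t = poses[idx][4:7]          # predicted translation
    # visualize object `itemid` with (my_r, my_t) here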

The visualization code I use is modified from tools/eval_ycb.py with about 30 lines of changes. Simply put: after each iteration, directly apply the predicted rotation my_r and translation my_t to cld[itemid], then project the transformed model point cloud onto the image frame (no need to fit it into a 3D bbox). I'm sure you can get these results.
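If it helps, the core of that modification looks roughly like the following. It is only a sketch: I assume the intrinsics cam_fx, cam_fy, cam_cx, cam_cy and the RGB frame img are already available, as they are in the evaluation script:

import numpy as np
import cv2
from lib.transformations import quaternion_matrix  # adjust the import to your setup

# rotate and translate the object model points with the predicted pose
R = quaternion_matrix(my_r)[:3, :3]
pts_cam = np.dot(cld[itemid], R.T) + my_t          # Nx3 points in the camera frame

# pinhole projection onto the image plane
u = cam_fx * pts_cam[:, 0] / pts_cam[:, 2] + cam_cx
v = cam_fy * pts_cam[:, 1] / pts_cam[:, 2] + cam_cy

# draw the projected model points on top of the RGB frame
vis = np.array(img).copy()
for x, y in zip(u.astype(np.int32), v.astype(np.int32)):
    if 0 <= x < vis.shape[1] and 0 <= y < vis.shape[0]:
        cv2.circle(vis, (int(x), int(y)), 1, (0, 255, 0), -1)
cv2.imwrite('vis.png', vis)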

Fiona730 commented 5 years ago

I found the tiny mistake in my code, and fixing it let me generate the correct results. Also, following your instructions, I am able to get accurate visualizations without using the bbox. Thanks a lot for your effort in giving such detailed answers and instructions :)

BarbeBleue commented 5 years ago

I think I am having a similar problem with my code, but I haven't managed to solve it. Some objects (mostly the clamps) don't align well, and I have many false positives. I may have a mistake in my code, but I don't know where it could come from. Do you or @Fiona730 have any suggestions? Do you use the detection or pose estimation confidence level at all?

Here are some pictures (attached: image_screenshot_24 04 2019_2, image_screenshot_24 04 2019_4).

And here is what I'm doing for the visualization:

cam_mat = np.matrix([[cam_fx, 0, cam_cx], [0, cam_fy, cam_cy], [0, 0, 1]])
mat_r = quaternion_matrix(my_r)[0:3, 0:3]
imgpts, jac = cv2.projectPoints(cld[itemid], mat_r, my_t, cam_mat, dist)
open_cv_image = draw(open_cv_image, imgpts.get(), itemid)

Thank you very much for your time

MyoungHaSong commented 5 years ago

@BarbeBleue I have a question about your code: in imgpts, jac = cv2.projectPoints(cld[itemid], mat_r, my_t, cam_mat, dist), what is dist?

BarbeBleue commented 5 years ago

Usually the distortion is a characteristic of the camera, just like cam_mat with the intrinsic parameters. You'll find more information here: https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html Because the input images don't seem to be affected by it, in my code I assumed it was: dist = np.array([0.0, 0.0, 0.0, 0.0, 0.0])

MyoungHaSong commented 5 years ago

@BarbeBleue Thank you for your reply.

I have a few questions... so if you don't mind, may I contact you at the email address on my profile?

Fiona730 commented 5 years ago

I have met the same problem, and there is nothing wrong with your code. During training, each object is cropped using the ground truth bounding box annotated in the XXXX-meta.mat file of the YCB dataset, and the cropped image with the category of the object is fed into the network. At test time, the bounding boxes are read from the mat files in results_PoseCNN_RSS2018, which are the results of the segmentation network in PoseCNN. The problem in your visualization is caused by wrong segmentation results: the network is given a wrong bounding box and a wrong category. The clamps actually account for most of the wrong segmentation results. If you want perfect visualizations, you can just use the ground truth bboxes at test time.
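In case it helps, here is one way to grab a ground-truth box at test time, assuming you have the per-frame segmentation label XXXX-label.png from the YCB-Video annotations, where each pixel stores the class index (the file path below is only an example):

import numpy as np
from PIL import Image

def gt_bbox_from_label(label_path, itemid):
    # tight ground-truth box of object `itemid` from the label image
    label = np.array(Image.open(label_path))
    ys, xs = np.where(label == itemid)
    if ys.size == 0:
        return None                                   # object not visible in this frame
    return xs.min(), ys.min(), xs.max(), ys.max()     # (x1, y1, x2, y2)

# example: bbox = gt_bbox_from_label('data/0048/000420-label.png', itemid)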

BarbeBleue commented 5 years ago

@Fiona730 ok, thank you very much for your answer! That's what I was thinking, but I was curious about a potential use of the detection or pose estimation confidence level.

hillaric commented 5 years ago

@BarbeBleue hello, I am curious about the draw function in your code. Can you help me?

huckl3b3rry87 commented 4 years ago

@BarbeBleue I second @hillaric's question. Can you please provide the draw function? I tried

from PIL import Image, ImageDraw

with

cam_mat = np.matrix([[cam_fx, 0, cam_cx],[0, cam_fy, cam_cy],[0, 0, 1]])
mat_r=quaternion_matrix(my_r)[0:3,0:3]
dist = np.array([[0., 0.0,  0.0, 0.0, 0.0]])
imgpts, jac = cv2.projectPoints(cld[itemid], mat_r, my_t, cam_mat, dist)

but I get this error

open_cv_image = ImageDraw(img,np.squeeze(imgpts),itemid)
TypeError: 'module' object is not callable

or @hillaric did you figure this out?

huckl3b3rry87 commented 4 years ago

I had to learn some OpenCV.

Here is the line that worked for me:

img = cv2.polylines(np.array(img),np.int32([np.squeeze(imgpts)]),True,(0,255,255))
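For completeness, the full snippet I ended up with looks roughly like this (variable names as in @BarbeBleue's comment above; converting the rotation matrix to a Rodrigues vector and assuming zero lens distortion are my own choices, so adapt as needed):

import numpy as np
import cv2
from lib.transformations import quaternion_matrix  # adjust the import to your setup

cam_mat = np.array([[cam_fx, 0, cam_cx],
                    [0, cam_fy, cam_cy],
                    [0, 0, 1]], dtype=np.float64)
dist = np.zeros(5)                         # assume no lens distortion

mat_r = quaternion_matrix(my_r)[0:3, 0:3]  # predicted rotation as a 3x3 matrix
rvec, _ = cv2.Rodrigues(mat_r)             # projectPoints expects a rotation vector
imgpts, _ = cv2.projectPoints(cld[itemid], rvec, my_t, cam_mat, dist)

img = cv2.polylines(np.array(img), np.int32([np.squeeze(imgpts)]), True, (0, 255, 255))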