Hi, this problem is mainly because we use one network to handle all objects, which are trained with different loss functions (ADD for non-symmetric objects, ADD-S for symmetric ones like the bowl in your case). This might lead to unstable performance on one or two specific objects, but the overall mean score across all objects is stable. To address this, as we did for the results in Fig. 4 of the paper, my suggestion would be to train symmetric and non-symmetric objects separately so that the performance is not affected by the different scales of the loss.
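Roughly, the two metrics differ like this (a minimal sketch, not the repo's exact evaluation code; model_pts, pred_R/pred_t, and gt_R/gt_t are placeholder names for the model point cloud and the predicted / ground-truth poses):

import numpy as np
from scipy.spatial import cKDTree

def add_metric(model_pts, pred_R, pred_t, gt_R, gt_t):
    # ADD: average distance between corresponding model points
    pred = model_pts @ pred_R.T + pred_t
    gt = model_pts @ gt_R.T + gt_t
    return np.mean(np.linalg.norm(pred - gt, axis=1))

def adds_metric(model_pts, pred_R, pred_t, gt_R, gt_t):
    # ADD-S: average distance to the closest point, so poses that map a
    # symmetric object onto itself are not penalized
    pred = model_pts @ pred_R.T + pred_t
    gt = model_pts @ gt_R.T + gt_t
    nearest_dist, _ = cKDTree(gt).query(pred, k=1)
    return np.mean(nearest_dist)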
Thanks for your quick reply!
However, the trained model I downloaded didn't perform well in many visualization cases. I am not able to evaluate the results with the YCB toolbox (because Matlab is unavailable on the server), but the wrong-estimation rate in the test frames is roughly 40%, so I suppose symmetry is not the main problem. Some other visualization results are provided below. Do you have any other ideas on this?
Hi @Fiona730, I'm quite sure there must be some bug in your visualization code. Here is the result of re-running the released checkpoints on all the testing videos of the YCB dataset. As you can see, there are no errors like yours. And there is no need for you to transform those points to fit into the object's tight bounding box; without the fitting, it's already good enough.
Although I do not know your code, here are several of my guesses:
(1) Please make sure you acquire the correct pose of the object you are visualizing. The estimated poses saved in each frame's XXXX.mat are not ordered by object index; they follow the order of the detection results released by PoseCNN (see the second row of the rois field in YCB_Video_toolbox/results_PoseCNN_RSS2018/xxxxxx.mat). I follow their format mainly because I hope people can use the YCB_toolbox for evaluation on this dataset without changing the Matlab code too much. (A small matching sketch follows after point (2).)
(2) Please make sure you are using the model point clouds released by the YCB-Video dataset. Do not use models released by other projects, such as the NVIDIA Falling Things dataset. Although they also use YCB objects, the canonical frame is different.
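A rough sketch of that matching (the file names here are just examples, and the key names 'rois' / 'poses', the class-id position, and the quaternion-then-translation layout are my reading of the eval script, so treat them as assumptions):

import numpy as np
import scipy.io as scio

# PoseCNN detections for one test frame (example path)
posecnn_meta = scio.loadmat('YCB_Video_toolbox/results_PoseCNN_RSS2018/000000.mat')
rois = np.array(posecnn_meta['rois'])

# DenseFusion result saved for the same frame (example path)
my_result = scio.loadmat('experiments/eval_result/ycb/Densefusion_wo_refine_result/0000.mat')
poses = np.array(my_result['poses'])

for idx in range(rois.shape[0]):
    itemid = int(rois[idx][1])   # assumed: class id stored per detection
    my_r = poses[idx][:4]        # assumed: quaternion first ...
    my_t = poses[idx][4:7]       # ... then translation
    # note: the row to read is idx (detection order), not itemid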
The visualization code I use is modified from tools/eval_ycb.py with about 30 lines of changes. Simply, after each iteration, directly apply the predicted rotation my_r and translation my_t to cld[itemid], then project the transformed model point cloud to the image frame (no need to fit it into a 3D bbox), roughly as in the sketch below. I'm sure you can get these results.
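A minimal sketch of that projection step (the intrinsics below are the YCB-Video values I believe the repo's data loader uses; cld, my_r, and my_t come from the surrounding eval loop, and lib.transformations is the repo's quaternion helper):

import numpy as np
from lib.transformations import quaternion_matrix

# assumed YCB-Video camera intrinsics (check your data loader)
cam_cx, cam_cy = 312.9869, 241.3109
cam_fx, cam_fy = 1066.778, 1067.487

def project_to_image(model_points, my_r, my_t):
    # apply the predicted pose to the model point cloud ...
    R = quaternion_matrix(my_r)[:3, :3]
    pts = model_points @ R.T + my_t
    # ... then project with the pinhole model (no bbox fitting needed)
    u = pts[:, 0] / pts[:, 2] * cam_fx + cam_cx
    v = pts[:, 1] / pts[:, 2] * cam_fy + cam_cy
    return np.stack([u, v], axis=1)

# inside the eval loop, after my_r / my_t are predicted for one object:
# uv = project_to_image(cld[itemid], my_r, my_t)
# then draw uv onto the frame (e.g. one small dot per point)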
I found the tiny mistake in my code, and fixing it enabled me to generate the correct result. Also, following your instructions, I am able to get accurate visualization without using the bbox. Thanks a lot for your effort in giving detailed answers and instructions :)
I think I am having a similar problem with my code, but I haven't managed to solve it. Some objects (mostly the clamps) don't align well and I have many false positives. I may have a mistake in my code, but I don't know where it could come from. Do you or @Fiona730 have any suggestions? Do you use the detection or pose estimation confidence level at all?
Here are some pictures:
And what I'm doing for the visualization:
cam_mat = np.matrix([[cam_fx, 0, cam_cx], [0, cam_fy, cam_cy], [0, 0, 1]])
mat_r = quaternion_matrix(my_r)[0:3, 0:3]
imgpts, jac = cv2.projectPoints(cld[itemid], mat_r, my_t, cam_mat, dist)
open_cv_image = draw(open_cv_image, imgpts.get(), itemid)
Thank you very much for your time
@BarbeBleue I have a question about your code: in imgpts, jac = cv2.projectPoints(cld[itemid], mat_r, my_t, cam_mat, dist), what is dist?
Usually the distortion coefficients are a characteristic of the camera, just like cam_mat with the intrinsic parameters. You'll find more information here: https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
Because the input images don't seem to be affected by distortion, in my code I assumed it was:
dist=np.array([0.0,0.0,0.0,0.0,0.0])
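For what it's worth, a quick sanity check showing that a zero distortion vector really means plain pinhole projection (the point, pose, and intrinsics here are made up just for the check):

import numpy as np
import cv2

pts = np.array([[0.1, 0.0, 1.0]])       # one dummy 3D point in front of the camera
rvec, tvec = np.zeros(3), np.zeros(3)   # identity pose
K = np.array([[1066.778, 0., 312.9869], [0., 1067.487, 241.3109], [0., 0., 1.]])

uv_cv, _ = cv2.projectPoints(pts, rvec, tvec, K, np.zeros(5))
# with zero distortion this reduces to u = x/z*fx + cx, v = y/z*fy + cy
uv_manual = np.array([pts[0, 0] / pts[0, 2] * K[0, 0] + K[0, 2],
                      pts[0, 1] / pts[0, 2] * K[1, 1] + K[1, 2]])
assert np.allclose(np.squeeze(uv_cv), uv_manual)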
@BarbeBleue Thank you for your reply.
I have a few questions... so if you don't mind, may I ask you by email at the address on my profile?
I have met the same problem, and there is nothing wrong with your code. During training, each object is cropped by the ground-truth bounding box annotated in the XXXX-meta.mat file in the YCB dataset, and the cropped image, together with the category of the object, is fed into the network. In testing, the bounding boxes are read from the mat files in results_PoseCNN_RSS2018, which are the segmentation results of the segmentation network in PoseCNN. The problem in your visualization is caused by wrong segmentation results: the network is given a wrong bounding box and a wrong category. The clamps actually account for most of the wrong segmentation results. If you want to get perfect visualization, you can just use gt bboxes in test.
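For example, a rough way to get a ground-truth box for a test frame (my own sketch; the per-pixel -label.png files come with the YCB-Video annotations, and the padding the repo's get_bbox applies is omitted here):

import numpy as np
from PIL import Image

def gt_bbox(frame_prefix, obj_id):
    # e.g. frame_prefix = 'data/0048/000001', obj_id = the YCB class index
    label = np.array(Image.open('{0}-label.png'.format(frame_prefix)))
    mask = (label == obj_id)
    if not mask.any():
        return None                      # object not visible in this frame
    rows, cols = np.where(mask)
    # tight ground-truth box: (rmin, rmax, cmin, cmax)
    return rows.min(), rows.max() + 1, cols.min(), cols.max() + 1

Feeding this box (and the known obj_id) to the network instead of the PoseCNN roi is what I mean by using gt bboxes in test.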
@Fiona730 OK, thank you very much for your answer! That's what I was thinking, but I was curious about a potential use of the detection or pose estimation confidence level.
@BarbeBleue Hello, I am curious about the draw function in your code. Can you help me?
@BarbeBleue I second @hillaric's question. Can you please provide the draw function? I tried
from PIL import Image, ImageDraw
with
cam_mat = np.matrix([[cam_fx, 0, cam_cx],[0, cam_fy, cam_cy],[0, 0, 1]])
mat_r=quaternion_matrix(my_r)[0:3,0:3]
dist = np.array([[0., 0.0, 0.0, 0.0, 0.0]])
imgpts, jac = cv2.projectPoints(cld[itemid], mat_r, my_t, cam_mat, dist)
but I get this error
open_cv_image = ImageDraw(img,np.squeeze(imgpts),itemid)
TypeError: 'module' object is not callable
Or @hillaric, did you figure this out?
I had to learn some OpenCV; here is the line that worked for me:
img = cv2.polylines(np.array(img),np.int32([np.squeeze(imgpts)]),True,(0,255,255))
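In case it helps anyone else, here is a minimal draw-style helper of my own (not @BarbeBleue's actual function) that draws one small dot per projected point instead of a single polyline, which tends to look cleaner; the per-class colour scheme is just an example:

import numpy as np
import cv2

def draw(image, imgpts, itemid):
    # image is an OpenCV/numpy array (e.g. np.array(pil_img));
    # imgpts comes from cv2.projectPoints with shape (N, 1, 2)
    rng = np.random.RandomState(int(itemid))   # deterministic colour per class
    color = tuple(int(c) for c in rng.randint(0, 256, 3))
    for pt in imgpts.reshape(-1, 2).astype(int):
        u, v = int(pt[0]), int(pt[1])
        if 0 <= u < image.shape[1] and 0 <= v < image.shape[0]:
            cv2.circle(image, (u, v), 1, color, -1)
    return image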
Hello, thanks for sharing your code! I tried to visualize the output on the YCB test set, but the result doesn't align with Fig. 4 in your paper. Here is one of my visualization results. The top-left image is generated using the ground-truth R and T in xxxx-meta.mat, and it is correct. The bottom-left and bottom-right images are generated by simply replacing R and T with those in the mat files in result_wo_refine_dir and result_refine_dir; the pose estimation in these two images seems to go wrong.
The visualization process is to transform the points in points.xyz with R and T and then scale and translate them to fit into the object's tight bounding box.
I used the trained checkpoints provided by you.
May I ask for your suggestions on what the problem might be? Thanks for your time.