yuxng / PoseCNN

A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes
https://rse-lab.cs.washington.edu/projects/posecnn/
MIT License
756 stars 243 forks source link

Projection Scaling issue #55

Open TotoLulu94 opened 6 years ago

TotoLulu94 commented 6 years ago

Hi @yuxng ,

I tried to test poseCNN on my own images, I have strange results : figure_2

The poses seem to be well estimated but it seems that there is a scaling issue. If I did understand your paper, this probably come from the fact that the bounding boxes (shown with the labels) are mispredicted. But I don't really get why. To record my images, I also used a Asus Xtion Pro Live RGB-D camera with the following intrinsic matrix : " [[528, 0, 320] [0, 528, 240], [0, 0, 1]] " I then tied to test it with another intrinsic matrix (as if I was working with a 1280x960 image) : figure_1

The projections look better but the estimated translation vector looks off (just by considering the bleach and mustard). The intrinsic matrix used for this is : " [[978.9, 0, 320], [0, 978.9, 240], [0, 0, 1]] "

Do you have an Idea about what would cause this ? Is my first intrinsic matrix right and I am just missing a scaling factor ? Or is my 2nd matrix right and the estimation of the bleach pose is just bad ?

Also, these estimations have been made without ICP refinement.

yuxng commented 6 years ago

The network is trained with the specific intrinsic matrix in the YCB_Video dataset. So the distance of the object that the network predicts is related to that intrinsic matrix. You can try to visualize the results using that intrinsic matrix.

TotoLulu94 commented 6 years ago

Using the same intrinsic matrix as in demo.py, I have a result comparable to my second case. I guess that sense since the intrinsic matrix of my camera is different. Thanks for you reply !

TotoLulu94 commented 6 years ago

I am going to retrain the network using my own intrinsic matrix but I have another question regarding that : Since the intrinsic matrix of my camera and the intrincix matrix of the camera used to capture the dataset, how will that affect the results ?

Kaju-Bubanja commented 6 years ago

You might have missed a verb or there is a typo, but your question doesn't make sense to me @TotoLulu94 Although I guess you were asking if it is possible to retrain with owns own intrinsic matrix in case the training dataset got recorded with that camera? Is that your question? If yes I would also be interested in the answer.

TotoLulu94 commented 6 years ago

Yep that's right :

aditya2592 commented 5 years ago

@TotoLulu94 I am seeing a similar scaling issue as you have mentioned when using a Kinect with a different intrinsic matrix. The predicted translation vector seems to be off by 10-20 cm. Did retraining on YCB video dataset with your own intrinsic matrix solve your problem ?