smallcorgi / 3D-Deepbox

3D Bounding Box Estimation Using Deep Learning and Geometry (MultiBin)
MIT License
488 stars 151 forks

Difference between your implementation and the actual paper's? #13

Open mittalrajat opened 6 years ago

mittalrajat commented 6 years ago

Hi @smallcorgi,

Thanks for providing us with your code. As I look through your code and the issues section, I see that there are some differences between the paper's actual method and your implementation. Would it be possible for you to list the differences? I feel that it would be really helpful for those who use your code in their own implementations. Thanks.

Some differences that I observed are:

  1. Instead of proposing several 2D boxes for 3D box estimation, you use the 2D boxes from the ground truth dataset.
  2. Instead of estimating the 3D position, you use the ground truth 3D position from the dataset.
  3. During testing, you seem to give only the image patch containing the object as input. Shouldn't we give the entire image as input and expect the network to regress the 3D bounding boxes from the entire image?
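To make point 3 concrete, here is a minimal sketch of what "giving only the image patch as input" means: the ground-truth 2D box is used to crop the object region, and only that crop is fed to the network. The function name, box layout `(xmin, ymin, xmax, ymax)`, and the toy image are my own illustration, not code from this repo:

```python
import numpy as np

def crop_object_patch(image, box2d):
    """Crop the region inside a (ground-truth) 2D box.

    box2d = (xmin, ymin, xmax, ymax) in pixel coordinates --
    a hypothetical layout chosen for this sketch.
    """
    xmin, ymin, xmax, ymax = [int(round(v)) for v in box2d]
    return image[ymin:ymax, xmin:xmax]

# Toy example: a 100x200 "image" and one ground-truth box.
image = np.zeros((100, 200, 3), dtype=np.uint8)
patch = crop_object_patch(image, (50, 20, 120, 80))
print(patch.shape)  # (60, 70, 3)
```

In this repo that patch (after resizing) is what the network sees at test time, rather than the full image.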

I am currently trying to understand the paper, so I apologise if things that I suggested turn out to be incorrect.

Thanks again.

smallcorgi commented 6 years ago

Hi @mittalrajat , Yes, you are right.

345ishaan commented 6 years ago

@mittalrajat The point is that this paper's method is not end to end. You first pass the entire image to a 2D detector, then feed each detected box to the trained model to get the dimensions and yaw. Correct me if I am wrong.
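The two-stage pipeline described above can be sketched as follows. Both `detect_2d` and `regress_dims_and_yaw` are stand-in stubs I made up to show the control flow; in practice the first would be a real 2D detector and the second the trained MultiBin network:

```python
import numpy as np

def detect_2d(image):
    # Stub: pretend the detector returns two 2D boxes (x1, y1, x2, y2).
    return [(10, 10, 60, 50), (80, 20, 150, 90)]

def regress_dims_and_yaw(patch):
    # Stub: the real network would predict (h, w, l) and a MultiBin yaw.
    return np.array([1.5, 1.6, 3.9]), 0.3

def two_stage_pipeline(image):
    # Stage 1: detect 2D boxes on the full image.
    # Stage 2: crop each box and regress dimensions + orientation.
    results = []
    for (x1, y1, x2, y2) in detect_2d(image):
        patch = image[y1:y2, x1:x2]
        dims, yaw = regress_dims_and_yaw(patch)
        results.append(((x1, y1, x2, y2), dims, yaw))
    return results

image = np.zeros((120, 160, 3), dtype=np.uint8)
print(len(two_stage_pipeline(image)))  # 2
```

The translation is then recovered geometrically from the 2D box and the predicted dimensions/yaw, which is the part this repo replaces with ground truth.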

mittalrajat commented 6 years ago

@345ishaan Yes, that seems correct.

lucasjinreal commented 5 years ago

@345ishaan That would be very slow: generating a 3D box for every detected image patch, one at a time.
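One common way to reduce that per-patch cost is to batch all crops from an image into a single forward pass instead of running the network once per patch. This is a generic sketch, not code from this repo; the nearest-neighbour resize below is a numpy stand-in for the `cv2.resize` call a real pipeline would use:

```python
import numpy as np

def batch_patches(patches, size=(224, 224)):
    """Resize variable-size crops to one shape and stack them
    into a single (N, H, W, C) batch for one forward pass."""
    batch = []
    for p in patches:
        h, w = p.shape[:2]
        # Naive nearest-neighbour index maps (illustrative only).
        ys = (np.arange(size[0]) * h // size[0]).clip(0, h - 1)
        xs = (np.arange(size[1]) * w // size[1]).clip(0, w - 1)
        batch.append(p[ys][:, xs])
    return np.stack(batch)

patches = [np.zeros((60, 70, 3)), np.zeros((30, 40, 3))]
print(batch_patches(patches).shape)  # (2, 224, 224, 3)
```

With batching, the regression cost grows with the number of images rather than the number of detections, which mitigates the slowdown mentioned above.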