skhadem / 3D-BoundingBox

PyTorch implementation for 3D Bounding Box Estimation Using Deep Learning and Geometry
MIT License

how to train on custom datasets #17

Open nullxjx opened 4 years ago

nullxjx commented 4 years ago

I have my own dataset, whose format is like Pascal VOC with labelled images only. I do not have the calibration file. How can I train on my own dataset?

liulingfeng123 commented 3 years ago

@njuxjx Hi, I have the same problem. Have you found a solution?

Joywalker commented 2 years ago

I have trained this model on a dataset different from the one used in this repo. For your custom dataset you'll need labelled regions and some more info on those labelled objects. More precisely, you'll need:

- 2D bounding box
- W, H, L of the object in 3D
- Yaw angle
- Camera intrinsic matrix

When loading the dataset and preparing it for training, you'll have to compute the local orientation angle of the object. That can be done by finding arctan(x, z), where x and z are the 3D coordinates of the 3D bounding box center. Then, to find alpha (the local orientation), you simply compute: yaw - arctan(x, z).
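The step above can be sketched as follows (a minimal illustration, assuming camera-frame coordinates and using the two-argument `arctan2` for the ray angle; the helper name is mine, not from the repo):

```python
import numpy as np

def local_orientation(yaw, center_3d):
    """Local (allocentric) orientation alpha from the global yaw.

    center_3d: (x, y, z) of the 3D box center in camera coordinates.
    The ray angle atan2(x, z) is the angle of the camera-to-object ray
    measured from the camera's forward (z) axis.
    """
    x, _, z = center_3d
    theta_ray = np.arctan2(x, z)  # two-argument arctan resolves the quadrant
    alpha = yaw - theta_ray
    # wrap to (-pi, pi] so the regression target is consistent
    return (alpha + np.pi) % (2 * np.pi) - np.pi
```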

komzy commented 2 years ago

@Joywalker thank you for clarifying. I have a few questions, I'd be grateful if you can answer:

> W, H, L of the object

Do you mean pixel values of W, H, L or the real-life object dimensions?

> 3D Yaw Angle

How did you obtain this? Are alpha and the 3D yaw angle the same thing?

Also, what label file format did you use to save these values for training?

Joywalker commented 2 years ago

> 3D Yaw Angle: How did you obtain this? Are alpha and the 3D yaw angle the same thing?

(image: diagram showing theta, theta ray, and theta l)

Theta (the angle marked in red) represents the global orientation, which is the yaw angle in the global plane. Theta ray and theta l can be computed as I explained in the previous comment.
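At inference the decomposition runs in reverse: the network regresses theta l (alpha), the ray angle is known from geometry, and the global yaw is their sum. A quick numeric sketch (the function name is mine, not from the repo):

```python
import numpy as np

def recover_global_yaw(theta_l, x, z):
    """Invert alpha = yaw - theta_ray: global yaw = local angle + ray angle."""
    theta_ray = np.arctan2(x, z)  # angle of the camera-to-object ray
    return theta_l + theta_ray
```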

Check out this article; it explains the whole process.

> W, H, L of the object: Do you mean pixel values of W, H, L or the real-life object dimensions?

Those would be the real-world 3D dimensions of the object. When preparing the data for training, the real W, H, L of the object are offset against the average W, H, L for that object class; the network then regresses these dimension offsets when computing the 3D box.
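A sketch of that offset scheme (the class averages below are illustrative KITTI-style numbers, not values taken from this repo):

```python
import numpy as np

# Per-class average dimensions (H, W, L in metres) -- illustrative values.
CLASS_AVG_DIMS = {"car": np.array([1.53, 1.63, 3.88])}

def dim_residual(cls, real_dims):
    """Offset the network regresses instead of absolute sizes."""
    return np.asarray(real_dims) - CLASS_AVG_DIMS[cls]

def dims_from_residual(cls, residual):
    """At inference, add the class average back to recover metric size."""
    return CLASS_AVG_DIMS[cls] + np.asarray(residual)
```

Regressing a small offset around a per-class mean is an easier target for the network than predicting absolute metric sizes directly.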

komzy commented 2 years ago

Thanks!

jamesheatonrdm commented 1 year ago

Hi @Joywalker, I am also using my own custom dataset and have all the information you describe above.

I am looking at these comments and am confused as to which angle you call 'alpha'. Is it theta l?

Also, just looking at the geometry, should the local orientation (theta ray) not be equal to arctan(z/x)? I am also confused why you seem to have described the arctan function as having two arguments (you have written arctan(x, z)).

EDIT -------------------------------------------- EDIT

Apologies, I have just read the attached article, and the arctangent function makes sense after reading it. However, the article shows the ray angle calculated with respect to the principal point of the camera, whereas in the diagram you attached above it is taken with respect to the x axis rather than the principal point, which is causing me some confusion.
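For reference, the principal-point version from the article can be sketched like this (a pinhole-camera assumption; `fx` and `cx` come from the intrinsic matrix, and the helper name and values are illustrative):

```python
import numpy as np

def theta_ray_from_pixel(u, fx, cx):
    """Ray angle w.r.t. the principal point for pixel column u.

    For a pinhole camera, the ray through column u makes an angle
    arctan((u - cx) / fx) with the optical (z) axis; atan2 keeps the sign.
    """
    return np.arctan2(u - cx, fx)
```

A 2D box centred on the principal point gives theta ray = 0; boxes to the right of it give positive angles.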