microsoft / singleshotpose

This research project implements a real-time object detection and pose estimation method as described in the paper, Tekin et al. "Real-Time Seamless Single Shot 6D Object Pose Prediction", CVPR 2018. (https://arxiv.org/abs/1711.08848).
MIT License

How to label my own dataset #37

Closed JinwenJay closed 5 years ago

JinwenJay commented 6 years ago

Thanks for providing this amazing work!

I am trying to make a dataset that follows the format introduced in the README; however, I am wondering whether there is any tool that can help me create the .ply files and label files.

Thank you again anyway.

btekin commented 6 years ago

Thank you for your feedback on our code!

To create .ply files you can take a look at the tools listed in this website: http://people.sc.fsu.edu/~jburkardt/data/ply/ply.html .

Label files are just .txt files; you can write the relevant 3D bounding box, class and size information into .txt files using a simple Python or MATLAB script.

JinwenJay commented 6 years ago

I really appreciate your quick response!

Yes, I know the label files are just .txt files filled with 3D bounding box coordinates. However, how can I fill them with the correct numbers? For example, the centroid: without a dedicated tool I can only place it by intuition, which is not reliable. The same goes for the other 3D coordinates. In short, I can't assign convincing 3D coordinates to the cube's vertices.

btekin commented 6 years ago

What we do for the corner points is to find the 3D bounding box surrounding the 3D object model and project these corner points into 2D. For the centroid, we take the (0, 0, 0) position of the object model and again project it into 2D. Do you have a 3D object model for your data?
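A minimal sketch of that step, assuming the model vertices can be read into an N x 3 array (here with the plyfile package; MeshLab or any other mesh library works just as well):

```python
import numpy as np
from plyfile import PlyData  # assumed dependency: pip install plyfile

# Load the 3D object model vertices (object coordinate system, same units as your poses).
ply = PlyData.read('my_object.ply')  # placeholder path
vertices = np.stack([ply['vertex']['x'],
                     ply['vertex']['y'],
                     ply['vertex']['z']], axis=1)   # N x 3

# Axis-aligned 3D bounding box of the model.
min_xyz, max_xyz = vertices.min(axis=0), vertices.max(axis=0)

# Virtual keypoints: the model origin (used as the centroid) plus the 8 box corners,
# in the corner ordering described later in this thread.
corners = np.array([[x, y, z] for x in (min_xyz[0], max_xyz[0])
                              for y in (min_xyz[1], max_xyz[1])
                              for z in (min_xyz[2], max_xyz[2])])
keypoints_3d = np.vstack([[0.0, 0.0, 0.0], corners])  # 9 x 3
```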

zhanghui-hunan commented 6 years ago

soap.zip @btekin and @eyildiz-ugoe I have uploaded the .ply and the original image; can you tell me how to label it and get the label file? I can't find any way to get the 21 numbers. Do the numbers represent lengths, and is the unit cm? Sincerely thanks!

edisonchensy commented 6 years ago

Have you solved your problem? I have the same problem. Do you have WeChat or QQ so we can discuss it?

eyildiz-ugoe commented 6 years ago

@liangzhicong456 I haven't labeled my own data yet, I am also curious about how the whole process is done. It would be nice to have a straightforward method to come up with these 21 numbers per image.

edisonchensy commented 6 years ago

@btekin can you please tell us how to label these 21 numbers (i.e., labeling 3D bounding box coordinates on 2D images)? What information is needed?

thank you very much

eyildiz-ugoe commented 6 years ago

@btekin I think it's a legitimate concern now, since many people aren't interested in the rubber ducks, hole punchers, etc. from the LINEMOD dataset. They want to try the method on their OWN datasets, which requires labeling the images, and that process apparently isn't clear to anyone.

And this does not help too much to be honest:

(2) a folder containing label files (labels should be created using the same output representation explained above),

Suppose we have the 3D model (.ply) and a 2D image of an object, how do we come up with its respective 21 numbers?

PeterZheFu commented 6 years ago

I think there are two ways to label new datasets. The first is to render synthetic training examples in open-source software such as Blender; then you have the ground-truth 3D bounding box locations in the rendered image plane.

The second method is to put a QR code marker in the scene (well aligned with your object, e.g. a sofa) and capture images of the real object. Then you can use the camera parameters and the QR code marker to get the object pose. With your 3D model at metric scale, you can then get the bounding box corner coordinates in the image plane.

JinwenJay commented 6 years ago

@PeterZheFu, thank you for your reply! Yes, I was planning to use Blender to make a model in .ply format, but what troubles me is how to project the 3D bounding box into image coordinates. According to your reply, it seems the projected 2D coordinates can be obtained from Blender. Could you explain that in more detail? Thank you again.

PeterZheFu commented 6 years ago

@JinwenJay You are welcome. You can check this web page for the details: https://blender.stackexchange.com/questions/882/how-to-find-image-coordinates-of-the-rendered-vertex/1008#1008
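Roughly what the linked answer boils down to, as a sketch to run inside Blender's Python console; the object and camera names are placeholders, and bpy is only available inside Blender:

```python
import bpy
from bpy_extras.object_utils import world_to_camera_view

scene = bpy.context.scene
cam = bpy.data.objects['Camera']      # placeholder: your camera object
obj = bpy.data.objects['MyObject']    # placeholder: your object

render = scene.render
width = render.resolution_x * render.resolution_percentage / 100
height = render.resolution_y * render.resolution_percentage / 100

# Project each mesh vertex (world coordinates) into the rendered image.
for v in obj.data.vertices:
    world_co = obj.matrix_world @ v.co                 # use '*' instead of '@' in Blender <= 2.79
    ndc = world_to_camera_view(scene, cam, world_co)   # x, y in [0, 1]; z is the depth
    px = ndc.x * width
    py = (1.0 - ndc.y) * height                        # flip y: Blender's image origin is bottom-left
    print(px, py)
```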

btekin commented 6 years ago

The whole process to obtain the label files is as follows:

  1. Get the 3D bounding box surrounding the 3D object model. We use the already provided 3D object model for the LINEMOD dataset to get the 3D bounding box. If you would like to create a 3D model for a custom object, you can refer to Section 3.5 of the following paper and the references therein: http://cmp.felk.cvut.cz/~hodanto2/data/hodan2017tless.pdf

  2. Define the 8 corners of the 3D bounding box and the centroid of the 3D object model as the virtual keypoints of the object. 8 corners correspond to the [[min_x, min_y, min_z], [min_x, min_y, max_z], [min_x, max_y, min_z], [min_x, max_y, max_z], [max_x, min_y, min_z], [max_x, min_y, max_z], [max_x, max_y, min_z], [max_x, max_y, max_z]] positions of the 3D object model, and the centroid corresponds to the [0, 0, 0] position.

  3. Project the 3D keypoints to 2D. You can use the compute_projection function that we provide to project the 3D points into 2D. You would need to know the intrinsic calibration matrix of the camera and the ground-truth rotation and translation to project the 3D points into 2D. For some comments about ground-truth acquisition, please refer to https://github.com/Microsoft/singleshotpose/issues/37#issuecomment-424067478

  4. Compute the width and height of a 2D rectangle tightly fitted to a masked region around the object. If you have the 2D bounding box information (e.g. width and height) for the custom object that you have, you can use those values in your label file. In practice, however, we fit a tight bounding box to the 8 corners of the projected 3D bounding box and use the width and height of that bounding box to represent these values.

  5. Create an array consisting of the class, 2D keypoint location and the range information and write it into a text file. The label file is organized in the following order. 1st number: class label, 2nd number: x0 (x-coordinate of the centroid), 3rd number: y0 (y-coordinate of the centroid), 4th number: x1 (x-coordinate of the first corner), 5th number: y1 (y-coordinate of the first corner), ..., 18th number: x8 (x-coordinate of the eighth corner), 19th number: y8 (y-coordinate of the eighth corner), 20th number: x range, 21st number: y range.
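Putting steps 2-5 together, here is a rough, self-contained sketch of how such a label file could be written. The pose, intrinsics and box extents below are placeholders, the projection simply mirrors what compute_projection in this repo does, and the division by image width/height follows the normalized values found in the provided LINEMOD label files (please verify against one of those files):

```python
import numpy as np

def project_to_2d(points_3d, R, t, K):
    """Project N x 3 object-frame points to pixel coordinates (mirrors compute_projection)."""
    cam = R @ points_3d.T + t.reshape(3, 1)   # 3 x N, camera frame
    uv = K @ cam                              # 3 x N, homogeneous pixels
    return (uv[:2] / uv[2]).T                 # N x 2

# --- placeholder inputs: replace with your own data ------------------------------
min_xyz = np.array([-0.05, -0.04, -0.03])     # 3D model bounding box, object frame
max_xyz = np.array([ 0.05,  0.04,  0.03])
K = np.array([[572.41, 0.0, 325.26],          # camera intrinsics (LINEMOD-like values)
              [0.0, 573.57, 242.05],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                                 # ground-truth rotation (object -> camera)
t = np.array([0.0, 0.0, 1.0])                 # ground-truth translation, in metres
img_w, img_h = 640, 480

# Step 2: centroid (object origin) followed by the 8 corners, in the order listed above.
corners = np.array([[x, y, z] for x in (min_xyz[0], max_xyz[0])
                              for y in (min_xyz[1], max_xyz[1])
                              for z in (min_xyz[2], max_xyz[2])])
keypoints_3d = np.vstack([[0.0, 0.0, 0.0], corners])       # 9 x 3

# Step 3: project to 2D.
pts_2d = project_to_2d(keypoints_3d, R, t, K)              # 9 x 2, pixels

# Step 4: width/height of a tight box around the projected corners.
x_range = pts_2d[:, 0].max() - pts_2d[:, 0].min()
y_range = pts_2d[:, 1].max() - pts_2d[:, 1].min()

# Step 5: class id, nine (x, y) pairs, then x/y range, normalized by image size.
values = (pts_2d / [img_w, img_h]).flatten().tolist() + [x_range / img_w, y_range / img_h]
with open('000000.txt', 'w') as f:                         # one label file per image
    f.write('0 ' + ' '.join(f'{v:.6f}' for v in values) + '\n')
```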

edisonchensy commented 6 years ago

@btekin Could you please tell me how to use the compute_projection function to project my own object's 3D points into my own 2D images? Should I write a Python script, or is there some other method? Thank you very much.

btekin commented 6 years ago

You just need to call the function with your input parameters within your program. Please see https://github.com/Microsoft/singleshotpose/blob/master/valid.py#L206 for an example of input parameters. Also refer to valid.py to see how to create input parameters (3D vertices, Rt transformation matrix and intrinsic matrix).
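For reference, a minimal call could look like the sketch below; the input shapes follow utils.py / valid.py (3D points in homogeneous 4 x N form, a 3 x 4 Rt matrix, and the 3 x 3 intrinsic matrix), and all values are placeholders:

```python
import numpy as np
from utils import compute_projection  # helper provided in this repo's utils.py

K = np.array([[572.41, 0.0, 325.26], [0.0, 573.57, 242.05], [0.0, 0.0, 1.0]])  # intrinsics
R, t = np.eye(3), np.array([[0.0], [0.0], [1.0]])                              # ground-truth pose
Rt = np.hstack([R, t])                                                         # 3 x 4

# 3D keypoints in homogeneous coordinates, shape 4 x N (as built in valid.py).
pts = np.array([[0.0, 0.0, 0.0], [-0.05, -0.04, -0.03], [0.05, 0.04, 0.03]]).T  # 3 x N example
pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])                            # 4 x N

proj_2d = compute_projection(pts_h, Rt, K)  # 2 x N pixel coordinates
```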

edisonchensy commented 6 years ago

@btekin I am sorry to bother you again, but I think I am still confused about the part on training my own dataset. Currently, I have this information:

  1. JPEG images of the target object
  2. the object's 2D bounding box coordinates in the 2D JPEG images (four (x, y) points)
  3. a mesh model of the object without height, width or length information (.ply)
  4. an XYZ point cloud without normals, exported using MeshLab (.txt)
  5. the height, width and length, measured by hand
  6. the camera intrinsics

Using this information, how do I obtain the 3D vertices and the Rt transformation matrix, and is it possible to get the 21-number labels by calling the function you mentioned in valid.py?

best wishes

btekin commented 6 years ago

Since you have an XYZ point cloud, you have the 3D vertices. But you need the ground-truth Rt transformation matrix for each corresponding image to project your vertices into 2D. Obtaining ground-truth Rt transformation matrices is not a very straightforward task and requires some manual and intrusive annotation effort. As @PeterZheFu mentioned, people place QR code markers in the scene (well aligned with their object, e.g. a sofa) and capture images of the real object. Then they use the camera parameters and the QR code marker to get the ground-truth object pose (Rt matrices). This is, for example, how the LINEMOD dataset was acquired. For an arbitrary scene without markers, it is not easy to get ground-truth poses (Rt) from a single image, but within a controlled environment with markers or a specialized acquisition system, you can collect your own training data with ground-truth poses. In our work, we rely on an existing training dataset and do not address the problem of acquiring training data with ground-truth object poses.
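This acquisition step is outside the scope of the repo, but as a rough illustration of the marker-based idea, something along these lines can be done with OpenCV's aruco module (the exact API differs between OpenCV versions, and all values below are placeholders):

```python
import cv2
import numpy as np

# Your own camera calibration (placeholders).
K = np.array([[572.41, 0.0, 325.26], [0.0, 573.57, 242.05], [0.0, 0.0, 1.0]])
dist = np.zeros(5)

img = cv2.imread('frame_0000.jpg')  # placeholder image path
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(img, dictionary)

if ids is not None:
    # 0.05 = marker side length in metres; this call exists up to OpenCV 4.6,
    # newer versions use the ArucoDetector / solvePnP route instead.
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(corners, 0.05, K, dist)
    R_marker, _ = cv2.Rodrigues(rvecs[0])   # marker-to-camera rotation (3 x 3)
    t_marker = tvecs[0].reshape(3, 1)       # marker-to-camera translation (3 x 1)
    # If the object is rigidly placed relative to the marker, compose this pose with
    # the fixed marker-to-object transform to obtain the object's ground-truth R, t.
```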

edisonchensy commented 6 years ago

Thank you very much for your explanation. By the way, for backup = backup/ape in ape.data, I noticed it points to a .weights file. How do I get the weights for my own object? (I did not pre-train.) So, which .weights file can I use for my own object? @btekin

Best wishes

btekin commented 5 years ago

You can just use the weights obtained by a network that does image classification on ImageNet. You can download these weights, provided by the authors of the YOLO paper, from the following link: https://pjreddie.com/media/files/darknet19_448.conv.23 . You can directly use these weights as an initialization to train your network. If you also want to apply pretraining on your specific data, you can follow the steps in https://github.com/Microsoft/singleshotpose#pretraining-the-model-optional

edisonchensy commented 5 years ago

I am sorry to bother you again; I am still confused about the weights part. I did download the conv.23 file. In the backup folder, the data you provide (for example, ape) has init.weight and backup_model.weight. In the explanation, you said:

To get the initialization weights yourself, you can run the following:

python train.py cfg/ape.data cfg/yolo-pose-pre.cfg cfg/darknet19_448.conv.23
cp backup/ape/model.weights backup/ape/init.weights

However, in my current situation, I want to train my own object (for example, an apple). In my apple.data I write everything except backup =, because I created an apple folder inside the backup folder and it is empty. I don't have an apple init.weight or model.weight. How do I get the .weight file for my object (for example, apple)?

By the way, for RT_gt, is this RT in the camera coordinate system? Please help me out, thank you very much @btekin

Best wishes

edisonchensy commented 5 years ago

Also, when I collected my object images, I recorded the current translation and rotation of my object in the camera coordinate system (t = (x, y, z), R = (x, y, z, w)). Is this RT information equal to the R_gt, t_gt in your valid.py code? @btekin Please help me out, thank you very much

Best wishes

btekin commented 5 years ago

I am sorry to bother you again; I am still confused about the weights part. I did download the conv.23 file. In the backup folder, the data you provide (for example, ape) has init.weight and backup_model.weight. In the explanation, you said:

To get the initialization weights yourself, you can run the following:

python train.py cfg/ape.data cfg/yolo-pose-pre.cfg cfg/darknet19_448.conv.23
cp backup/ape/model.weights backup/ape/init.weights

However, in my current situation, I want to train my own object (for example, an apple). In my apple.data I write everything except backup =, because I created an apple folder inside the backup folder and it is empty. I don't have an apple init.weight or model.weight. How do I get the .weight file for my object (for example, apple)?

Once you run the first command, that is,

python train.py cfg/ape.data cfg/yolo-pose-pre.cfg cfg/darknet19_448.conv.23

this will start the pretraining. Once the pretraining is over (this will take some time; the maximum number of epochs (max_epochs) is set to 700, but if you see that your training converges well on your data, you can pretrain for fewer epochs), the learned model will be saved to a file called "[BACKUP_PATH]/model.weights", e.g. "backup/ape/model.weights".

Once you run the second command, that is,

cp backup/ape/model.weights backup/ape/init.weights

The model.weights file will be copied to init.weights, and init.weights will be your initialization weights; this is how you get the initialization weights.

By the way, for RT_gt, is this RT in the camera coordinate system? Also, when I collected my object images, I recorded the current translation and rotation of my object in the camera coordinate system (t = (x, y, z), R = (x, y, z, w)). Is this RT information equal to the R_gt, t_gt in your valid.py code?

In 6D object pose estimation, R and t are defined as the rotation matrix and translation vector from the object (world) coordinate system to the camera coordinate system. R_gt and t_gt refer to them, respectively.
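If your recorded rotation is a quaternion in (x, y, z, w) order, one way to turn it into the 3 x 3 R_gt and 3 x 4 Rt used by the code is via scipy; this is only a sketch and assumes the recorded pose already maps object coordinates into camera coordinates:

```python
import numpy as np
from scipy.spatial.transform import Rotation  # assumed dependency: scipy >= 1.4

qx, qy, qz, qw = 0.0, 0.0, 0.0, 1.0   # your recorded rotation quaternion (x, y, z, w)
tx, ty, tz = 0.0, 0.0, 1.0            # your recorded translation, in metres

R_gt = Rotation.from_quat([qx, qy, qz, qw]).as_matrix()   # 3 x 3 rotation matrix
t_gt = np.array([[tx], [ty], [tz]])                       # 3 x 1 translation vector
Rt_gt = np.hstack([R_gt, t_gt])                           # 3 x 4 object -> camera transform
```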

F2Wang commented 5 years ago

For anyone who wants to create a customized dataset for a single object and has a RealSense RGB-D camera, I published my code to obtain an object mesh and create labels, which you may find useful: https://github.com/F2Wang/ObjectDatasetTools

TDBTECHNO commented 5 years ago

I would like to ask whether there is any labelling tool for your own custom dataset? I would like to use an RGB camera. For 2D images, we label the images through LabelImg, but what about 3D images, or rather the 3D bounding box?

G-YY commented 5 years ago

I would like to ask whether there is any labelling tool for your own custom dataset? I would like to use an RGB camera. For 2D images, we label the images through LabelImg, but what about 3D images, or rather the 3D bounding box?

Have you solved the problem, and how do you label the 3D bbox?

TDBTECHNO commented 5 years ago

I am using Unreal Engine with the NVIDIA data synthesizer to create a JSON file like the YCB and LINEMOD datasets.

danieldimit commented 4 years ago

In case someone needs to annotate a real dataset manually, I've developed a free web tool that does that: http://annotate.photo/

satpalsr commented 2 years ago

Hey @danieldimit, I was trying the demo for 3D-on-2D annotations, but when I label it, it shows 0, 1, 2, 3 and on the next click everything goes away. Can you please look into it or suggest some other tool?