microsoft / singleshotpose

This research project implements the real-time object detection and pose estimation method described in Tekin et al., "Real-Time Seamless Single Shot 6D Object Pose Prediction", CVPR 2018 (https://arxiv.org/abs/1711.08848).
MIT License

Preparing label file for our own training data #21

Closed eyildiz-ugoe closed 6 years ago

eyildiz-ugoe commented 6 years ago

I am going to try out some images that contain a holepuncher identical to the one in LINEMOD. However, for this to work, I need its label file, as the requirements state:

(2) a folder containing label files (labels should be created using the same output representation explained above),

I am a bit confused here. How, and why, would I have its label file in the first place? I am only trying to test detection here. I mean, if I need to create a label file with 21 values just to detect an object, then I have pretty much done all the work myself; what is left for the code to do? Moreover, the output representation talks about "predicted values", which also confuses me: what is there for me to enter if the system "predicts"?

I guess I am a bit lost here. Could someone enlighten me a bit?

btekin commented 6 years ago

If you are going to test on one of the objects that already appears in the training dataset, then you don't need to retrain. You can test the model trained for the holepuncher object directly on your test images. If the viewpoint and illumination conditions are not too different, the model should, in principle, generalize to the new test images.

Therefore you don't need to retrain or create a label file. That section describes how to do training on another custom dataset. If you are going to train on another dataset, then you need to create these labels for the network to learn from.
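
If you do end up creating labels for a custom dataset, the following is a minimal sketch of assembling one 21-value label line under the representation described in the README: the class id, nine normalized (x, y) control points (centroid first, then the 8 projected 3D bounding-box corners), and finally the x and y ranges. The helper below is illustrative, not part of the repo.

```python
def format_label_line(class_id, points2d_norm, x_range, y_range):
    # points2d_norm: nine (x, y) pairs already normalized by image
    # width/height -- centroid first, then the 8 projected box corners
    values = [float(class_id)]
    for x, y in points2d_norm:
        values += [x, y]
    values += [x_range, y_range]   # 1 + 18 + 2 = 21 values per line
    return ' '.join('%.6f' % v for v in values)
```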

eyildiz-ugoe commented 6 years ago

Yeah, but then why does the validation code valid.py give an error when I try to test with an image I provide? It looks for my label file, and since it is empty, it returns an error; or at least that's how I think it fails. In any case, here is the output I am getting:

```
2018-07-25 10:04:37    Testing holepuncher...
2018-07-25 10:04:37    Number of test samples: 1
-----------------------------------
  tensor to cuda : 0.001352
         predict : 0.562799
get_region_boxes : 0.019107
            eval : 0.000120
           total : 0.583378
-----------------------------------
2018-07-25 10:04:37 Results of holepuncher
2018-07-25 10:04:37    Acc using 5 px 2D Projection = 0.00%
2018-07-25 10:04:37    Acc using 10% threshold - 0.0162 vx 3D Transformation = 0.00%
2018-07-25 10:04:37    Acc using 5 cm 5 degree metric = 0.00%
2018-07-25 10:04:37    Mean 2D pixel error is nan, Mean vertex error is nan, mean corner error is nan
Traceback (most recent call last):
  File "valid.py", line 309, in <module>
    valid(datacfg, cfgfile, weightfile, outfile)
  File "valid.py", line 279, in valid
    logging('   Translation error: %f m, angle error: %f degree, pixel error: % f pix' % (testing_error_trans/nts, testing_error_angle/nts, testing_error_pixel/nts) )
ZeroDivisionError: float division by zero
```

eyildiz-ugoe commented 6 years ago

Should one also change the intrinsic parameters in the code? I've noticed that there is a K matrix used in utils.py; should we also enter our own values there?

btekin commented 6 years ago

Labels in testing are only used to numerically evaluate the accuracy. In the validation code we provide, we numerically evaluate the accuracy of our approach on the LINEMOD dataset, for which we have ground-truth labels for the validation images.

For the test image you have, since you don't have the ground-truth labels, you can't really quantitatively evaluate the accuracy. Yet you can still predict the rotation and translation and qualitatively visualize your predictions. You can therefore just delete that part of the numerical evaluation code for your own purposes.

The intrinsic calibration matrix K is assumed to be known, so yes, you should enter your own values there.
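
For reference, K here is a standard 3x3 pinhole intrinsics matrix. A minimal sketch with placeholder values (fx, fy, u0, v0 must come from your own camera calibration, e.g. from OpenCV's calibrateCamera):

```python
import numpy as np

# Placeholder intrinsics -- substitute your own calibration values.
fx, fy = 572.4, 573.6   # focal lengths in pixels
u0, v0 = 325.3, 242.0   # principal point in pixels

K = np.array([[fx, 0.0, u0],
              [0.0, fy, v0],
              [0.0, 0.0, 1.0]], dtype=np.float32)
```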

eyildiz-ugoe commented 6 years ago

I see. Alright then: since, for now, I am only interested in seeing some preliminary results on real scenes of LINEMOD objects (such as the holepuncher I have), I can indeed leave that part out.

I have now left the following out:

```python
# Print test statistics
logging('Results of {}'.format(name))
logging('   Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))
logging('   Acc using 10% threshold - {} vx 3D Transformation = {:.2f}%'.format(diam * 0.1, acc3d10))
logging('   Acc using 5 cm 5 degree metric = {:.2f}%'.format(acc5cm5deg))
logging("   Mean 2D pixel error is %f, Mean vertex error is %f, mean corner error is %f" % (mean_err_2d, np.mean(errs_3d), mean_corner_err_2d))
logging('   Translation error: %f m, angle error: %f degree, pixel error: % f pix' % (testing_error_trans/nts, testing_error_angle/nts, testing_error_pixel/nts))

if save:
    predfile = backupdir + '/predictions_linemod_' + name + '.mat'
    scipy.io.savemat(predfile, {'R_gts': gts_rot, 't_gts': gts_trans, 'corner_gts': gts_corners2D, 'R_prs': preds_rot, 't_prs': preds_trans, 'corner_prs': preds_corners2D})
```

and the program runs. However, I get no results, as in no output is shown. Does this mean the detection failed?

btekin commented 6 years ago

The current form of the valid.py function only reports the average accuracy numbers for the LINEMOD dataset and does not output anything else. Now that you have left the evaluation part of the code out, you don't see any output; this doesn't necessarily mean that the detection failed.

You can still save out the rotation and translation predictions (R_pr, t_pr) and visualize your predictions using the jupyter notebook we provide.

eyildiz-ugoe commented 6 years ago

Yeah, I copied the visualization part from the notebook into valid.py so I can save the images and look at them after I run the program. This works with images from the LINEMOD dataset. However, now I get no output.

This should be interpreted as either a failed detection, or the method not generalizing to images outside the dataset.

Not sure which is the case at this point, but I wanted to point it out. Have you tried the program with images that were not in the dataset?

btekin commented 6 years ago

You mean you can visualize the image but not the 6D pose prediction? Did you make sure that the visualization works properly when you run valid.py with matplotlib imported? Can you also check whether the 2D corner predictions are reliable (that part doesn't rely on K)?

You should set K properly to get reliable 6D pose predictions; without the correct K, it is hard to say whether the detection failed. I haven't tested the method on images that are not in the dataset. However, to increase the generalization ability of the approach to unseen images, we randomly change the background of the object with images from PASCAL VOC during training. I would therefore expect the 2D control point predictions to be reasonably good on unconstrained images.

eyildiz-ugoe commented 6 years ago

I can visualize the detection and pose estimation for LINEMOD images with valid.py, since I copied the visualization functions from the notebook. However, it does not show anything when I try an image that is not in the dataset. I've provided the mask too, but no luck.

In order to bypass the error regarding the evaluation, I commented this part out:

```python
# Print test statistics
logging('Results of {}'.format(name))
logging('   Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))
logging('   Acc using 10% threshold - {} vx 3D Transformation = {:.2f}%'.format(diam * 0.1, acc3d10))
logging('   Acc using 5 cm 5 degree metric = {:.2f}%'.format(acc5cm5deg))
logging("   Mean 2D pixel error is %f, Mean vertex error is %f, mean corner error is %f" % (mean_err_2d, np.mean(errs_3d), mean_corner_err_2d))
logging('   Translation error: %f m, angle error: %f degree, pixel error: % f pix' % (testing_error_trans/nts, testing_error_angle/nts, testing_error_pixel/nts))

if save:
    predfile = backupdir + '/predictions_linemod_' + name + '.mat'
    scipy.io.savemat(predfile, {'R_gts': gts_rot, 't_gts': gts_trans, 'corner_gts': gts_corners2D, 'R_prs': preds_rot, 't_prs': preds_trans, 'corner_prs': preds_corners2D})
```

So the program runs and gives the following output:

```
2018-07-25 14:40:51    Testing holepuncher...
2018-07-25 14:40:51    Number of test samples: 1
-----------------------------------
  tensor to cuda : 0.001355
         predict : 0.558806
get_region_boxes : 0.018954
            eval : 0.000119
           total : 0.579234
-----------------------------------
```

and no image is saved. Normally it saves the detections when it works with the LINEMOD images. With this, though, nothing happens in terms of visualization, which leaves me in doubt.

btekin commented 6 years ago

So, just to be clear: you can visualize the input image that you provide, but you can't see any 3D bounding box projection on it? Is that correct? Or do you not even see the input image?

Did you make sure that you have the correct calibration matrix? If you don't have the correct K, PnP will return wrong results.

Did you also try visualizing just the initial 2D control point predictions of the network? That part doesn't rely on K, so it helps to check whether the network detection fails or the problem comes from K.
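
To illustrate why K matters, here is a self-contained sketch of the PnP step on synthetic data. All names and values below are placeholders; in valid.py the 2D points would come from the network rather than from projectPoints.

```python
import cv2
import numpy as np

# Nine object-frame control points: centroid plus the 8 corners of a 10 cm cube.
corners3D = np.array([[0.0, 0.0, 0.0]] +
                     [[x, y, z] for x in (-0.05, 0.05)
                                for y in (-0.05, 0.05)
                                for z in (-0.05, 0.05)], dtype=np.float32)
K = np.array([[572.4, 0.0, 325.3],
              [0.0, 573.6, 242.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)

# Fabricate a ground-truth pose half a meter in front of the camera
# and project the control points into the image.
rvec_gt = np.array([[0.1], [0.2], [0.3]], dtype=np.float32)
tvec_gt = np.array([[0.0], [0.0], [0.5]], dtype=np.float32)
corners2D, _ = cv2.projectPoints(corners3D, rvec_gt, tvec_gt, K, None)

# PnP recovers the pose from the 2D-3D correspondences; with a wrong K,
# the recovered pose would be wrong even for perfect 2D predictions.
_, rvec, tvec = cv2.solvePnP(corners3D, corners2D.reshape(-1, 2), K, None)
R_pr, _ = cv2.Rodrigues(rvec)   # rotation matrix; tvec is the translation t_pr
```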

eyildiz-ugoe commented 6 years ago

No, I can't visualize anything if I provide an image from outside the dataset. It behaves as if it runs but does nothing: no output figure is shown or saved. It only prints what I've posted above.

I do not have the correct K matrix, but what you suggest makes sense. How do I visualize the control point predictions?

btekin commented 6 years ago

Not being able to visualize the input image itself is a bit strange; it suggests that there might be something wrong with the visualization. You might need to change matplotlib's backend. The discussion here might be helpful: https://stackoverflow.com/questions/7534453/matplotlib-does-not-show-my-drawings-although-i-call-pyplot-show

You could use the scatter function from matplotlib (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html) to visualize control point predictions.
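
A minimal sketch along those lines; the image path and the corners2D_pr array are placeholders for your test image and the network's normalized control point predictions:

```python
import matplotlib
matplotlib.use('Agg')   # render to a file; sidesteps backend/display issues
import matplotlib.pyplot as plt
import numpy as np

img = plt.imread('test_image.png')     # placeholder path
corners2D_pr = np.random.rand(9, 2)    # stand-in for predictions in [0, 1]
h, w = img.shape[:2]

plt.imshow(img)
plt.scatter(corners2D_pr[:, 0] * w, corners2D_pr[:, 1] * h, c='r', s=25)
plt.savefig('control_points.png')      # inspect this file for the predicted points
```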

simicvm commented 6 years ago

@btekin thanks for the code. Can you please clarify what the last two numbers in the label, the x range and y range, are? The readme file says

To encode the size of the objects, we have additional 2 numbers for the range in x dimension and y dimension

What do they represent, and how can we calculate them for custom objects?

btekin commented 6 years ago

They represent the width and height of a 2D rectangle tightly fitted to a masked region around the object. If you have the 2D bounding box information (e.g. width and height) for the custom object that you have, you can use those values in your label file. In practice, however, we fit a tight bounding box to the 8 corners of the projected 3D bounding box and use the width and height of that bounding box to represent these values.
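
A sketch of that computation, assuming you already have the 8 projected corners in pixels (the function and argument names are illustrative):

```python
import numpy as np

def xy_ranges(corners2d_px, img_w, img_h):
    # corners2d_px: (8, 2) array of the projected 3D bounding-box corners in pixels
    c = np.asarray(corners2d_px, dtype=np.float64)
    x_range = (c[:, 0].max() - c[:, 0].min()) / img_w   # normalized width of the tight box
    y_range = (c[:, 1].max() - c[:, 1].min()) / img_h   # normalized height of the tight box
    return x_range, y_range
```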

Similar to YOLOv2, these values are used to determine which anchor box will be used to estimate the pose of the object. The anchor box whose size and aspect ratio are most similar to the current object's (measured with IoU) is used during training to estimate the pose of that object. In this way, each anchor box is responsible for certain sizes and aspect ratios. As also stated in the first YOLO paper, this leads to specialization between the bounding box predictors: each predictor gets better at predicting certain sizes and aspect ratios.

Note that, in the current version of the code, we use these values and the anchor boxes only during multi-object pose estimation, not in the single object pose estimation part.
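
For intuition, here is a sketch of that IoU-based selection rule, comparing widths and heights as boxes sharing a common center, as in YOLOv2 (the anchor values in the usage line are placeholders):

```python
def best_anchor(w, h, anchors):
    # w, h: object size; anchors: list of (width, height) pairs in the same units
    best_i, best_iou = 0, 0.0
    for i, (aw, ah) in enumerate(anchors):
        inter = min(w, aw) * min(h, ah)      # overlap of center-aligned boxes
        union = w * h + aw * ah - inter
        iou = inter / union
        if iou > best_iou:
            best_i, best_iou = i, iou
    return best_i

# e.g. best_anchor(0.12, 0.90, [(0.1067, 0.9223), (0.5, 0.5)]) -> 0
```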

simicvm commented 6 years ago

Thanks for the clarification. A follow-up question, if you don't mind: the anchor size is normalized to the grid cell size, which is 32x32 pixels. The anchor you chose for single object detection is 0.1067, 0.9223, which would make it around 4x30 pixels (for a 416x416 image). Any particular reason for that narrow shape? Also, is the object size in the label file normalized to the original image size, the same as the coordinates?

btekin commented 6 years ago

For single object pose estimation, we don't use anchors; that was a stray variable which was not being used. I have now updated the repo and removed the stray anchor values from the single object pose estimation part of the code.

The object size is indeed normalized to the original image size in label files.

btekin commented 6 years ago

@eyildiz-ugoe Were you able to resolve the problem with the visualization? Please let me know. I'm closing the issue for the moment, but feel free to reopen it if you have further questions.

zhanghui-hunan commented 6 years ago

@eyildiz-ugoe and @btekin thanks for your contribution. I now want to prepare label files for my own training data, too. However, I don't know how to create the folder of label files corresponding to my images. Do you have any tool, such as a 3D object annotation tool, that would make labeling easy, or is there another method? A second question: how do you get the .ply file? Do you label based on the .ply file? Sincerely thank you, and I look forward to your reply!

eyildiz-ugoe commented 6 years ago

@liangzhicong456 The .ply file is the 3D model of your object. You either scan your object with a 3D scanner to generate it, find/download one from a 3D model database, or create it using 3D modeling software.

The labeling issue was discussed in https://github.com/Microsoft/singleshotpose/issues/37, I believe.