ahundt opened this issue 8 years ago
@bmagyar @tolgabirdal you will probably be interested in this problem, thanks for taking a look.
This can certainly happen, especially when you have ambiguous surface structures (which your box definitely has), a bad initial pose (I will see when I look through), or an insufficient number of points. Even if you feed ICP enough input points, going very coarse in the pyramid can leave you with very few, so keep that in mind and try fewer levels. Also consider that if sampling loses the surface variation and ends up dominated by the largest plane, you theoretically have little chance of registering the clouds.
By the way, in such cases this method (surface matching) would not be the best option. It is generally designed for applications where the surface geometry is descriptive, such as CAD models. Because this object is symmetric, even if everything goes right, there is a good chance of finding it in upright poses from time to time.
You might as well skip ICP entirely if the pose output from the detector is sufficient. Another option is point-to-point ICP, since your surface normals are not very descriptive (they are similar all over the place).
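For reference, the distinction matters in the update step: point-to-point ICP minimizes distances between corresponding points directly, with no normals involved. Here is a deliberately reduced, translation-only sketch of one update in plain C++ (a full implementation would also estimate rotation, e.g. via SVD; the names and the fixed i-to-i correspondences are illustrative assumptions):

```cpp
#include <vector>

struct P3 { float x, y, z; };

// Translation-only point-to-point update: for fixed correspondences
// (model[i] matched to scene[i]), the optimal translation is simply the
// difference of the two centroids.
inline P3 pointToPointTranslation(const std::vector<P3>& model,
                                  const std::vector<P3>& scene)
{
    P3 cm = {0, 0, 0}, cs = {0, 0, 0};
    const float n = (float)model.size();
    for (const P3& p : model) { cm.x += p.x; cm.y += p.y; cm.z += p.z; }
    for (const P3& p : scene) { cs.x += p.x; cs.y += p.y; cs.z += p.z; }
    return { (cs.x - cm.x) / n, (cs.y - cm.y) / n, (cs.z - cm.z) / n };
}
```

Because no normal enters the objective, degenerate or uninformative normals cannot break this variant, which is why it can be the safer choice here.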
Finally, make sure that your surface normals are present and correct. This is not specific to your scenario, but it is a common mistake I see in general.
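For example, a minimal sanity check is that every normal should be finite and approximately unit length. A standalone sketch in plain C++ (not from the module; the function name and the 1e-3 tolerance are illustrative choices):

```cpp
#include <cmath>

// Sanity check for surface normals: a valid normal must be finite and
// close to unit length. The 1e-3 tolerance is an illustrative choice.
inline bool isValidNormal(float nx, float ny, float nz)
{
    const float len = std::sqrt(nx * nx + ny * ny + nz * nz);
    return std::isfinite(len) && std::fabs(len - 1.0f) < 1e-3f;
}
```

Running a check like this over both the model and the scene before training/matching catches missing (zero) or unnormalized normals early.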
Still, I will run it and see.
The primary bug I believe should be fixed is that even when the data isn't good enough, the library should not crash; it should return an error code, throw an exception, or fail in some other reasonable way, so that software using the library can at least deal with the problem even when there are no useful results. There isn't really a way to detect this ahead of time.
Thanks for taking a look. From reading the papers I'm also aware boxes aren't exactly ideal for this algorithm, but I was trying to use something I had that a Kinect v2 could capture easily at a good size. I'm not particularly tied to that model, and it is easy to change its triangle count in MeshLab, which also generated the surface normals. I included both the high and low polygon count versions in the zip file.
Questions:
I've found complex models time-consuming to make, and the simple boxes I made by hand sometimes have only 6 vertices, which makes them a poor fit for the algorithm. I've been creating models with Autodesk 123D Catch from images of real-world objects, which is also definitely not perfect.
I'll check the following:
+1 for failing gracefully.
I'm just going to copy my comment over from the other pull request for completeness, since it applies to this one as well. The error case could fail by using the OPENCV_ERROR macro, as here: https://github.com/Itseez/opencv/blob/master/modules/videoio/src/cap_qt.cpp#L314
But in order to fail nicely, we need to identify the problematic cases and lay down some tests to document, showcase, and monitor them and the expected behaviour.
A good spot to make the graceful-failure change is at if (selInd), because if that value is < 6 the solve will definitely fail.
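A sketch of what that guard could look like, in plain standalone C++ (the function name and the choice to throw are my assumptions; the real fix would presumably use OpenCV's error machinery instead):

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical guard: the pose update has six unknowns, so fewer than
// six selected correspondences makes the linear system under-determined.
// Report that instead of letting the solver crash.
inline void checkEnoughCorrespondences(std::size_t selected)
{
    if (selected < 6)
    {
        throw std::runtime_error(
            "surface_matching: fewer than 6 correspondences selected; "
            "the pose system is under-determined");
    }
}
```

The caller can then catch this condition and skip or report the failed registration rather than terminating.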
If possible, it would also be nice to divide the algorithm into separate functions, one for each major step, rather than a single 200-line function. Extra bonus points for references to the equation numbers in the original paper!
I most certainly agree, and it should be there in future releases. Please keep in mind that this is a preliminary implementation done as part of GSoC; OpenCV decided not to invest much more in it in the following GSoC. The issues are more than meets the eye, so don't expect a lot from it. Still, for general cases it should work, as the algorithm is correctly implemented. That said, I have already greatly improved the implementation, and I will complete the pull request in a timely manner, within my tight schedule.
I have also not integrated hypothesis verification, so sometimes the correct pose is not the first element but one of the subsequent ones.
Regarding your questions:
A. The crash is not only due to an insufficient number of points. Sometimes the system genuinely becomes ill-conditioned, and this should be handled; nothing prevents it from being handled.
B. CAD model preparation is indeed crucial for many object detection algorithms; it is not specific to this one. For Kinect-like scenarios, you could use KinectFusion to create CAD models.
In general, CAD models can often be found on the internet, and for industrial objects they are also available from the manufacturer. If not, you could always 3D print the objects. But keep the following in mind:
1) To make an arbitrary CAD model suitable for this algorithm, you could use remeshing or triangle sampling. In both cases, distribute the vertices as uniformly as possible (keep in mind that distance quantization requires a certain minimum spacing between sample points; this is tied to your relative sampling parameter, and I believe it is well explained in the documentation). Also keep in mind that sharp edges are generally present in CAD models; your sampling algorithm should handle them and generate correct normals.
In this particular case, choosing the corners of the box isn't really helpful, but having points on all sides is. You can start with Poisson disk sampling, for example; it is the most basic option, as mentioned here: http://www.tbirdal.me/downloads/birdal_3dv_2015.pdf
2) The accuracy and "reality resemblance" of the surface normals are important. If the model is too detailed, reality is missed and the shape representation is jeopardized; if it is too smooth, the normals stop carrying enough information about surface variability. The algorithm has a certain tolerance in both directions, but within limits. For instance, if your sensor output is very smooth (like the Kinect's) and your CAD model is very detailed, one trivial approach is to smooth the CAD model before training.
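As an illustration of the uniform-spacing advice in 1), here is a minimal voxel-grid downsampling sketch in plain C++. This is not the module's own sampler; the cell size plays the role of the relative sampling distance, and the key packing assumes voxel indices fit in 21 bits each:

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Pt { float x, y, z; };

// Keep one representative point per cubic voxel of side `cell`, which
// enforces a roughly uniform spacing between the surviving samples.
inline std::vector<Pt> voxelDownsample(const std::vector<Pt>& in, float cell)
{
    std::unordered_map<std::int64_t, Pt> grid;
    for (const Pt& p : in)
    {
        const std::int64_t ix = (std::int64_t)std::floor(p.x / cell);
        const std::int64_t iy = (std::int64_t)std::floor(p.y / cell);
        const std::int64_t iz = (std::int64_t)std::floor(p.z / cell);
        // Pack the three voxel indices into one 64-bit key
        // (illustrative; assumes each index fits in 21 bits).
        const std::int64_t key = (ix & 0x1FFFFF)
                               | ((iy & 0x1FFFFF) << 21)
                               | ((iz & 0x1FFFFF) << 42);
        grid.emplace(key, p); // first point seen in each voxel wins
    }
    std::vector<Pt> out;
    out.reserve(grid.size());
    for (const auto& kv : grid) out.push_back(kv.second);
    return out;
}
```

A real sampler would also average or re-estimate normals per voxel and treat sharp edges carefully, as noted above.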
C. What counts as a bad initial pose changes from object to object. For box-like cases, if the ICP algorithm is outlier-aware and you start from a pure translational shift, you will probably get stuck there: the algorithm already matches a large portion and discards the rest. For a Stanford bunny, by contrast, you could recover rotations of roughly 30 degrees.
Also note that the current implementation has certain inefficiencies in the training stage, which make it slow, so please be patient with it. This is also to be addressed in future releases.
Thanks for the feedback on how to improve the results. We've tried a few objects and aren't getting any accurate matches, and we also run into this under-determined linear system issue a lot. I don't quite understand the criteria for points being included in the solver; could you explain that a bit?
From reading the SLAM++ paper, which uses this technique, it seems the object needs to dominate the scene for a match to be found, and that is definitely not the case for what we are trying: given the Kinect v2's ~0.5-1 m minimum accurate distance and wide viewing angle, the interestingly shaped objects I happen to have around me aren't large enough to fill half of the frame.
Oh, it is also worth noting that we aren't getting good results with the sample data included in the repository.
Andrew,
Thank you for the feedback. These will be considered.
As I mentioned, this is a preliminary implementation, and these issues have already been reported.
I am not familiar with your dataset, but yes, the object should dominate the scene. When that is not the case, you should choose your sampling strategy accordingly; it has a huge influence.
I have already informed OpenCV about all of these issues in a proposal, and I will try to address them if I find some time. Moreover, all of them are already improved in my own version; for an idea of how the upcoming versions perform, see this video, for example: https://www.youtube.com/watch?v=HxV9Ouy-fLM
The sample data should be alright though. We could double-check.
Cheers,
/tolga
Oh, I see now. When I first saw that video I thought it was produced with the implementation here, but I now see it is for a new paper and an improved methodology. My apologies for being dense on that count, and thanks for your patience.
On a first reading, the paper looks quite good! Thanks!
Ah, you're welcome. The results here were generated by the current implementation: https://www.youtube.com/watch?v=uFnqLFznuZU
Cheers,
/tolga
I have some new data with a much nicer tiki model in a relatively trivial scene for matching, available in tiki.zip (https://github.com/Itseez/opencv_contrib/files/59981/tiki.zip), but it still encounters the same problem.
Here is my setup:
[image: tiki_scene_close_setup] https://cloud.githubusercontent.com/assets/55744/11758625/6273fd00-a03b-11e5-8093-e7d688d91665.JPG
Here is a screenshot of my object mesh:
[image: tiki_mesh00] https://cloud.githubusercontent.com/assets/55744/11758629/7080c702-a03b-11e5-875b-782eb47b75aa.png
Here is a front view of the scene from the Kinect:
[image: tiki_scene_front_view] https://cloud.githubusercontent.com/assets/55744/11758660/f0e34b90-a03b-11e5-9b9e-8761fb9255f8.png
Side view:
[image: tiki_scene_close_side_view] https://cloud.githubusercontent.com/assets/55744/11758666/fef1bb72-a03b-11e5-8f57-6a299e34156f.png
As you can see, this object has a very distinctive surface, even a hole in the middle, and stands out clearly against an empty background. I've also verified that the mesh/point cloud looks quite good on all visible sides (I didn't capture the bottom). Nonetheless, the algorithm fails in the same way. I was hoping this would be a simple enough test case that the algorithm would work even as-is, without your refined version, but unfortunately that doesn't seem to be the case.
Andrew,
Unfortunately, this scene is not good at all. The normals in your scene are oriented in the opposite direction from those in your model. This was my first remark: "Check your normals." They matter. This is the most important thing.
Moreover, compared to the entire scene, this is quite a planar object with low geometric variation. It could still work if the normals are corrected (I haven't checked), but it is certainly not ideal. Try the Stanford bunny or something similar. Besides this, the scene has significant missing data and, at the same time, spurious data (this wouldn't be a problem if the surface contained geometric information, but it doesn't).
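If the normals really are flipped, the usual repair for sensor data is to orient each normal toward the viewpoint (for a Kinect, the sensor origin in the cloud's frame). A plain C++ sketch; the function name is illustrative:

```cpp
struct Vec3 { float x, y, z; };

// Orient a normal so it faces the sensor: if it points away from the
// viewpoint, negate it. Assumes the scene was captured from a single,
// known viewpoint (e.g. the Kinect at the origin of the cloud's frame).
inline Vec3 orientNormalTowardViewpoint(const Vec3& point, Vec3 normal,
                                        const Vec3& viewpoint)
{
    const float dx = viewpoint.x - point.x;
    const float dy = viewpoint.y - point.y;
    const float dz = viewpoint.z - point.z;
    if (normal.x * dx + normal.y * dy + normal.z * dz < 0.0f)
    {
        normal.x = -normal.x;
        normal.y = -normal.y;
        normal.z = -normal.z;
    }
    return normal;
}
```

For a CAD model (where there is no single viewpoint), the equivalent check is that the normals point outward from the surface, consistently across the mesh.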
I would recommend reading the paper to get an idea of the objects the algorithm is good for. If you have further concerns, you could also e-mail me personally (tbirdal@gmail.com or tolga.birdal@tum.de), since such comments might make this thread less informative and more misleading. We could then post a more informative summary so that everyone saves time.
All the best,
/tolga
Sounds reasonable, thanks!
Hello @tolgabirdal,
I came across this thread while trying to use this module with a HoloLens to find a furniture model in the scene scanned by the device. However, as I understood from your comments above, that would not be the best approach to take?
I have already tested it and got very unexpected results. Since I am doing my Master's, running out of time, and came across this thread very late, I just want to know whether I should proceed with investigating this or try to find another approach.
Any help or advice is appreciated! Thank you.
I'm testing surface_matching using ppf_load_match and the following model + scene:
cereal model
cereal scene
The relevant ply files are in cereal_scene.zip.
When I run this with the latest master of opencv and opencv_contrib as of about 2015-12-06 12:00, minimizePointToPlaneMetric() fails because cv::solve() is passed an under-determined linear system at the linked line. In this case A is 4x6 when it should be at least 6x6 to find a solution, which I believe means only 4 point pair correspondences are being found. Here is the relevant debug output when run on cereal_box_accurate_scale_lowpoly.ply and cerealscene2.ply:
I'm interested in help/feedback to fix this bug, but I'm planning to look into it myself as well. Thanks!