April Tags and depth image.

jmirabel commented 4 years ago

Dear ViSP developpers,

In our project, we use the AprilTag wrapper of ViSP (v3.2.0) to get the pose of some AprilTag with respect to the camera. From time to time, the pose estimation is stuck in a local minima due to an ambiguity when the tag is close to being orthogonal to the camera normal. I tried LAGRANGE_VIRTUAL_VS and DEMENTHON_VIRTUAL_VS.

Is there any solution to this issue in ViSP ? One way of removing the ambiguity is to use the depth information. Is it possible to provide depth information to ViSP ?

Can I expect the method HOMOGRAPHY_VIRTUAL_VS in ViSP 3.2.1 to behave better ?

fspindle commented 4 years ago

The best would be to attach and image with the tag and the corresponding camera parameters that produce the issue. I could then investigate...

jmirabel commented 4 years ago

Thank for your quick reply. You can find the image here: https://github.com/agimus/agimus-vision/issues/5

The camera parameters are read from a ROS topic and converted into ViSP camera parameters (https://github.com/agimus/agimus-vision/blob/122c40daf4a0df3c05fe113412bb82220cba1f65/src/tracker_object/node.cpp#L134). The topic contains:

height: 480
width: 640
distortion_model: "plumb_bob"
D: [0.1342682457423161, -0.185169495024767, 0.008633872454571442, -0.006708097638208048, 0.0]
K: [606.927873247237, 0.0, 317.0942192234208, 0.0, 608.7315077548897, 255.8098355816944, 0.0, 0.0, 1.0]
R: [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
P: [625.0511474609375, 0.0, 313.3595109513262, 0.0, 0.0, 626.72314453125, 259.4014303951117, 0.0, 0.0, 0.0, 1.0, 0.0]
binning_x: 0
binning_y: 0
roi: 
  x_offset: 0
  y_offset: 0
  height: 0
  width: 0
  do_rectify: False

jmirabel commented 4 years ago

Note that in the case of agimus/agimus-vision#5, the ambiguity is correctly resolved by replacing

LAGRANGE -> LAGRANGE_VIRTUAL_VS
DEMENTHON -> DEMENTHON_VIRTUAL_VS

But it does not always solve the problem. I will try to generate another image whenever I bump again into the issue although not always easy.

fspindle commented 4 years ago

There is tutorial-apriltag-detector binary that could be used to test.

When I create camera.xml that contains you intrinsic parameters:

$ more camera.xml 
<?xml version="1.0"?>
<root>
  <camera>
    <name>Camera</name>
    <image_width>640</image_width>
    <image_height>480</image_height>
    <model>
      <type>perspectiveProjWithoutDistortion</type>
      <px>606.927873247237,</px>
      <py>608.7315077548897</py>
      <u0>317.0942192234208</u0>
      <v0>255.8098355816944</v0>
    </model>
  </camera>
</root>

and launch the binary over the first image in agimus/agimus-vision#5 using:

$ ./tutorial-apriltag-detector --input /Users/fspindle/Desktop/img-tag.png --intrinsic /Users/fspindle/Desktop/camera.xml --camera_name Camera --tag_size 0.1

I got results that sounds ok to me

Isn't that right?

s-trinh commented 4 years ago

If you are experiencing z-axis flipping, it is inherent to the planar pose estimation ambiguity. When the tag is small in the image or when the corners locations are extracted poorly. See the following image:

It is also described in the Dementhon paper or in the Aruco doc.

There is a strange image artifact for the top tag that most likely led to the pose issue (the z-axis is not pointing outward the tag).

With latest version of ViSP, it is possible to get the two poses and the corresponding reprojection errors, see detect(). Temporal filtering could be used to try to determine the correct pose (something like choose the solution whose orientation is the closest to the previous one). Also, the estimated poses should be ambiguous if the reprojection errors are similar between the two solutions. This could help detecting these cases, maybe something like the Lowe's ratio test for feature matching?

Another possible solutions:

test if estimating the pose from multiple coplanar tags is better
estimate the pose from multiple non coplanar tags

I am wondering if it is possible to add a way to have the two poses in the Dementhon method in the coplanar case. Is it possible to ask François what he thinks about this general issue?

fspindle commented 4 years ago

@chaumett A nice question for you

jmirabel commented 4 years ago

Thanks for your answers. After looking at the code, I was able to understand the ambiguity described above. Thanks for the links, which I could find myself, and the drawings which help clarifying my intuition.

This could help detecting these cases, maybe something like the Lowe's ratio test for feature matching?

I will have a look at it.

With latest version of ViSP, it is possible to get the two poses and the corresponding re-projection errors, see detect() https://visp-doc.inria.fr/doxygen/visp-daily/classvpDetectorAprilTag.html#ac9e8558525f45f97080993f2cb1c2f05 .

I saw that. It is an interesting feature. Sadly, temporal coherence is not sufficient in my case. I usually get the issue when the tag enters the image. I will use the depth image to know which pose is the correct one. It would be a really nice feature if ViSP could also take the depth image and use it to remove this ambiguity.

Do you have any reference of someone using depth information to solve this issue ?

To use the depth information, here is what I have in mind:

fit a plane from the depth points inside the quad,
estimate the depth of the quad corners,
estimate the transform using a method similar to the one in the AprilTag paper,
use the non-linear optimization (like VIRTUAL_VS method) to minimize the re-projection error.

I am pretty sure it shouldn't be hard with ViSP. Is there something to achieve step 1 in ViSP ? Step 2 and 4 are easy. I guess I am on my own for step 3.

fspindle commented 4 years ago

We investigate a bit more and found that the 2 Dementhon poses are close. Getting the 2 poses will not solve this particular case.

On the small tag with id 10, there are 2 families of result:

homography, homography_ortho_iter, Lagrange that give the following poses that are very close:

Pose Lagrange (residual: 0.0004429317625):
-0.0086229909  -0.9741026249  -0.2259418515  0.1469200428
0.9857110623  -0.04629196088  0.1619591184  -0.08703614571
-0.1682240937  -0.2213168105  0.9605829083  0.3718285858
0  0  0  1
Pose homography (residual: 0.0004058392581):
0.03850871573  -0.9733793498  -0.2259418515  0.1472219363
0.98679641  0.001445396755  0.1619591184  -0.08714524239
-0.1573210858  -0.2291954456  0.9605829083  0.372411475
0  0  0  1
Pose homography_ortho_iter (residual: 8.930920357e-06):
-0.04544364432  -0.9978258365  0.04773337713  0.1544783041
0.9650512153  -0.03150559743  0.2601606218  -0.09197177191
-0.2580911215  0.05788780037  0.9643847653  0.3923255539
0  0  0  1

Dementhon that gives a slitly different pose (which is the right one) but doesn't have the best residual:

Pose Dementhon (residual: 9.119016293e-05):
0.280613298  -0.7885794762  0.5471732694  0.1537015906
0.7702732083  -0.1551052703  -0.618564095  -0.09072011684
0.5726564079  0.5950502204  0.5638970417  0.3926504709
0  0  0  1

Then based on the residual you can get the wrong solution (here homography_ortho_iter). VVS will give a solution close to results given by the linear pose estimation method

Pose VVS:
-0.9604132118  -0.2778166153  -0.02060074765  0.002781192013
0.2583075137  -0.8603937562  -0.4393174396  0.06602358738
0.1043249295  -0.4272476011  0.8980956499  0.167854617
0  0  0  1

To conclude, when the tag is small and the image is very noisy there is no easy solution:

you can try to increase the size of the tag
increase the quality of your images
use vpDetectorAprilTag::DEMENTHON_VIRTUAL_VS pose estimation method but maybe for some other images it will not give the right pose
maybe you can also try to keep only Lagrange and Dementhon and remove homography methods. This is easy to test if you built ViSP from source. To this end, simply modify vpDetectorAprilTag.cpp L452 to put:
```
double residual_homography = std::numeric_limits<double>::max(); // pose.computeResidual(cMo_homography);
double residual_homography_ortho_iter = std::numeric_limits<double>::max(); //= pose.computeResidual(cMo_homography_ortho_iter);
```
I would be interested to know if this is working better using vpDetectorAprilTag::BEST_RESIDUAL_VIRTUAL_VS pose estimation method

s-trinh commented 4 years ago

I did some research on this topic before and I found this paper:

@article{Jin2017SensorFF,
  title={Sensor fusion for fiducial tags: Highly robust pose estimation from single frame RGBD},
  author={Pengju Jin and Pyry Matikainen and Siddhartha S. Srinivasa},
  journal={2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2017},
  pages={5770-5776}
}

This is where I found the picture I posted before. Code for their paper is there.

I don't think using reprojection error to choose between Dementhon or Lagrange pose would work. The ambiguity coming from imprecision in the corners localization leads the incorrect pose to be the one with the minimum reprojection error.

There is the following method that retrieves at most 2 solutions for the planar pose estimation problem:

@article{Collins2014InfinitesimalPP,
  title={Infinitesimal Plane-Based Pose Estimation},
  author={Toby Collins and Adrien Bartoli},
  journal={International Journal of Computer Vision},
  year={2014},
  volume={109},
  pages={252-286}
}

I implemented a simple version of the method presented in Jin2017SensorFF. I will submit (after some code cleaning) a first PR draft so that there can be a discussion about which approach to use, if the code is correct or not. Feel free to submit a PR if you come up with something better than what I have implemented.

A quick demo video (bad quality comes from the Youtube compression):

There is a recent paper (not read) for multiple views and multiple markers:

Resolving Marker Pose Ambiguity by Robust Rotation Averaging with Clique Constraints

jmirabel commented 4 years ago

I will submit (after some code cleaning) a first PR draft

Could you ping me when you do so, please ? I could be beta tester if needed.

fspindle commented 4 years ago

Feature introduced in #671

s-trinh commented 4 years ago

Looking again at the picture:

Wondering if anti-aliasing could avoid this issue. Probably Gaussian blur is not a good idea since it would decrease the corners extraction accuracy.

lagadic / visp

April Tags and depth image. #668