swatbotics / apriltag

Extensions and tweaks to APRIL Robotics Laboratory apriltag C software
https://april.eecs.umich.edu/software/apriltag.html

Pose / Pose Matrix #10

Closed ginsi closed 6 years ago

ginsi commented 6 years ago

I need to compute a pose matrix, which, I believe, has the form M = Transl44 Rot44 (and I need to know the measurement units of Transl44). Your wrapper, afaik, gives a 3x3 homography, and I have been struggling (two days now) to reconstruct the pose matrix from it, without success.

I am assuming that the pose matrix (which can be easily decomposed if thought of as Transl44 Rot44) expresses the position of the relevant tag in a camera-fixed coordinate system, correct? And I have read that (a) when working with 4x4 matrices, Homog44 = CM44 PM44 (CM = camera matrix), and (b) when working with 3x3 matrices, Homog33 = CM34 ReducedPM43, where the reduced PM is obtained by deleting the third column (I am not sure how to invert the latter formula, or how to reconstruct CM44 from CM34, if that is possible at all). Can somebody help me, please?

The story: I am used to the results supplied by the apriltags_demo program built according to http://people.csail.mit.edu/kaess/apriltags/. Since I now need something I can easily use from a Python program, I successfully downloaded and ran your Python wrapper (of this same library, I suppose?). The apriltags_demo mentioned above gives results as x, y, z (in meters) and yaw, pitch, roll (in radians), which I should be able to convert to a pose matrix if it were clear which reference systems and units are used. I have tried to guess the reference system and units and have written a Matlab program that tries to implement the HM <=> PM/CM conversion, in order to compare the results of apriltags_demo with the results of the Python wrapper (using as input the main picture from https://april.eecs.umich.edu/software/apriltag/; the same six small tags are detected by both programs, but the big one is not(!)). Sorry for the long text, but I am trying to be as clear as I can, because I believe that clear answers can be given only to clear questions.
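(For reference, the standard relation behind this question for a planar target, written in the thread's own notation, is Homog33 ≃ CM33 [r1 r2 t]: the 3x3 camera matrix times a 3x3 matrix made of the first two rotation columns and the translation. The third rotation column drops out because the tag lies in its own z = 0 plane and can be recovered afterwards as r3 = r1 × r2; that [r1 r2 t] matrix is exactly the "ReducedPM43" mentioned above. Sign and scale conventions vary between libraries, so treat this only as the general shape of the relation.)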

mzucker commented 6 years ago

Is the Python wrapper for OpenCV available to you? If so, it should be possible to call cv2.solvePnP with the right arguments to get the 3D pose; otherwise, there is a C function homography_to_pose in the apriltag library that I can wrap in Python (see line 284 of homography.c in master).

Either way, I can add a pose reconstruction Python example to the repository in the next day or two, please stand by...
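(A minimal sketch of the solvePnP route suggested above, assuming the four detected tag corners, the physical tag size, and a calibrated camera matrix are available. The helper name and the corner ordering of `object_points` are assumptions here; the ordering must match whatever order the detector reports its corners in.)

```python
import numpy as np
import cv2

def pose_from_corners(corners, tag_size, K, dist_coeffs=None):
    """Estimate a 4x4 tag-to-camera transform from four detected tag corners.

    `corners` is a 4x2 array of pixel coordinates, `tag_size` the edge length
    of the black square in meters, `K` the 3x3 camera matrix, `dist_coeffs`
    the distortion coefficients (None if the image is already rectified).
    """
    s = tag_size / 2.0
    # Tag corners in the tag's own frame (the tag lies in the z = 0 plane).
    object_points = np.array([[-s, -s, 0.0],
                              [ s, -s, 0.0],
                              [ s,  s, 0.0],
                              [-s,  s, 0.0]])
    image_points = np.asarray(corners, dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)      # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tvec.ravel()
    return T
```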

ginsi commented 6 years ago

Yes, I have Python-OpenCV (cv2) and I am going to look at solvePnP (and report the results here); I can't use the C function because I am far from fluent in C; anyway, I had found homography_to_pose and tried (just to test my understanding) to translate it into Matlab (I have a very old version of the basic package). A problem is that I could not find any solid way to compare my findings with reference results; I am trying to reach results comparable with the output of the (C++) program apriltags_demo: it gives x, y, z, yaw, pitch, roll of each tag, but I could not find out what its reference system is (I find it strange that, apparently from some practical tests, the pitch axis seems to be vertical and the yaw axis horizontal!). I think that if you add that function to the wrapper, it will be very useful to me and probably to many others. In the meanwhile I thank you very much for your attention and advice. I will keep my eye here :-)

ginsi commented 6 years ago

I have had a look at solvePnP, but this led me to think I must have seriously misunderstood something basic: so please bear with me while I try to summarize:

mzucker commented 6 years ago

Hold on a day or two and I’ll make an end-to-end demo.

On Jul 8, 2018, at 1:37 PM, ginsi notifications@github.com wrote:

I have had a look at solvePnP, but this led me to think I must have seriously misunderstood something basic: so please bear with me while I try to summarize:

- My problem, end-to-end, is: I want software that takes a picture containing one or more April tags and tells me where each tag is, and how it is oriented, with respect to a camera-fixed reference frame. This is something that apriltag_demo gives me (if I give it one focal length in pixels and the tag side in meters), but it has two drawbacks: (a, most important) being a demo, it is not very suitable for direct integration into my software; (b, secondary) the orientation of the tags is expressed in Euler angles, while having a rototranslation matrix instead would really be better (I could directly use it to compute some tag-to-tag transforms that are very useful for me). In my understanding, that rototranslation matrix IS the pose matrix. Computing a rototranslation matrix from a translation and Euler angles is easy, if one knows which particular set of Euler angles has been selected.
- I found your Python wrapper of the same library (it is the same, isn't it?), which seemed perfect for me, but... I realized that your wrapper does not give a rototranslation/pose; it gives a homography (3x3).
- I read that the homography is related to the pose matrix via the camera matrix. If the homography were 4x4, the equation H = CM PM could easily be reversed by finding the (left) inverse of the CM: CMinv H = CMinv CM PM => PM = CMinv H. The homography from the wrapper being only 3x3, this probably means we should either work entirely in 3x3 (is that possible?) or go to rectangular CM and PM and use a CM left inverse (is that possible?).
- I was surprised that the solvePnP you suggested goes back to requiring "image points" and "object points" as calling parameters, which in my view belong to an already solved part of the problem; this in particular led me to believe I am probably misunderstanding some very basic things!
- Your other suggestion, which to me appears closer to what I need, is the homography_to_pose function. I am not able to understand all the whats and whys of the maths behind it, but I think that, if necessary, I may be able to rewrite it in Python (with numpy? OpenCV?) with a little help; in particular, I do not understand what line 335 does. As for line 332, I think it does an SVD decomposition, for which I could use cv2.SVDecomp(src) → w, u, vt, but what is (line 335) R = matd_op("MM'", svd.U, svd.V)?
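(For what it's worth, here is a rough numpy sketch of the kind of decomposition homography_to_pose performs. It is the standard planar-homography recipe rather than a line-by-line translation of the C code, so signs and scaling may not match the library exactly. The matd_op("MM'", svd.U, svd.V) line asked about should compute U @ V.T, i.e. it snaps the approximate rotation onto the nearest true rotation matrix after the SVD.)

```python
import numpy as np

def homography_to_pose_sketch(H, fx, fy, cx, cy):
    """Rough sketch of recovering a 4x4 pose from a 3x3 tag homography.

    Standard planar decomposition H ~ K [r1 r2 t]; not guaranteed to match
    the library's sign and scale conventions exactly.
    """
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    M = np.linalg.inv(K) @ H
    # The first two columns of M are scaled rotation columns; undo the scale.
    scale = np.sqrt(np.linalg.norm(M[:, 0]) * np.linalg.norm(M[:, 1]))
    M = M / scale
    r1, r2, t = M[:, 0], M[:, 1], M[:, 2]
    if t[2] < 0:                     # keep the tag in front of the camera
        r1, r2, t = -r1, -r2, -t
    r3 = np.cross(r1, r2)
    R_approx = np.column_stack([r1, r2, r3])
    # Project onto the nearest rotation matrix (this is the U @ V.T step).
    U, _, Vt = np.linalg.svd(R_approx)
    R = U @ Vt
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```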

ginsi commented 6 years ago

Wonderful!

mzucker commented 6 years ago

Hi, sorry to say that between getting back up to speed on the math and day job being a little more demanding than expected, I'm not as far along as I had hoped. Still expect to have something in the next few days, please stand by.

ginsi commented 6 years ago

No problem, of course... in the mean time I am trying to improve my knowledge on these things and the maths behind...

mzucker commented 6 years ago

Ok, commit 1d5d313 should have all the functionality you want. Once you get a detection, you can call Detector.detection_pose(), which returns a 4x4 rigid transformation matrix as well as some information about goodness of fit that you can discard if you don't care about it.

The apriltag.py demo code can demonstrate this; see README.md for details about how to enable pose detection from the command line. You'll need to know some basic parameters of your camera, as well as the physical dimensions of the tag.
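(A minimal usage sketch, assuming detection_pose takes the camera parameters as an (fx, fy, cx, cy) tuple and the tag size in meters, and returns the pose together with the fit errors; check apriltag.py and README.md for the authoritative argument names and order. The intrinsics and file name below are placeholders.)

```python
import cv2
import apriltag

fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0   # your calibrated intrinsics (placeholders)
tag_size = 0.127                               # physical edge length of the tag, meters

detector = apriltag.Detector()
gray = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)

for detection in detector.detect(gray):
    pose, init_error, final_error = detector.detection_pose(
        detection, (fx, fy, cx, cy), tag_size)
    print(detection.tag_id)
    print(pose)                                # 4x4 tag-to-camera transform
```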

One known problem with fiducial markers like apriltag is that detecting orientations from a single tag in an image can be really noisy if there are no strong perspective cues (translation is not as badly affected). This is the computer vision version of the Necker cube ambiguity. There's no magic to avoiding this, but combining information from three or more tags can help (essentially you throw away the orientations of each tag, and extract the orientation from the combined tag centers).
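(A sketch of that multi-tag idea, under the assumption that you know each tag center's position in some board/world frame and take the corresponding centers in the camera frame from the individual pose translations; the function name and layout are illustrative, not part of the library. With three or more non-collinear centers, a single rigid transform can be fit with the Kabsch/Procrustes method.)

```python
import numpy as np

def fit_rigid_transform(points_world, points_camera):
    """Fit R, t minimizing ||R p_world + t - p_camera|| over matched points."""
    pw = np.asarray(points_world, dtype=float)
    pc = np.asarray(points_camera, dtype=float)
    cw, cc = pw.mean(axis=0), pc.mean(axis=0)
    # Cross-covariance of the centered point sets, then SVD (Kabsch).
    H = (pw - cw).T @ (pc - cc)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against picking up a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cc - R @ cw
    return R, t                        # world -> camera rotation and translation
```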

Please let me know if the new demo code works for you!

ginsi commented 6 years ago

I'm pretty sure this is all I needed and much more (but also very useful). Unfortunately (for me) my laptop has just died and I need a couple of days to recover before being capable to test. I will report as soon as I can. I the mean time I thank you for all this effort and for the added explanations.

ginsi commented 6 years ago

Recovery of laptop took a bit more than expected... I have run the new version of apriltag.py as suggested at the end of the README.md and seen that now each Detection includes a Pose, which is exactly what I need. I think I succeded also in better understanding the involved units for the translation part (afaiu the first three elements in the 4th column of the pose are in the same unit used to express the side length of the black frames; saving this, the software does not really care what they are, can you confirm, please? (I guess that the example uses meters, so .127 m = 5")). I imagine this is not the right place to ask about the meaning of Goodness, Decision Margin, Init Error, Final Error, but may be you could redirect me somewhere, for this?). I thank you again for the assistance, and in general for making the wrapper available to the community. I am not sure if the "netiquette" of issues foresees that now I close mine, therefore I will not do it immediately, can you please suggest me if I should? Cheers...

mzucker commented 6 years ago

Not 100% accurate to say that a Detection includes a pose, but rather you can compute a pose from a Detection.

As I began to type this reply up, I realized I had a bug in 1d5d313 that made the scales in the translation vectors off by a factor of two, which I then fixed in aaa47b4 (tested by calibrating my new iPhone camera and photographing a tag using a meter stick to separate the two).

Now that that is fixed, yes, you are exactly correct about the units on the pose. I believe the tags used for the mapping example were 5" = .127 m, so now that I have fixed my bug, the units on the translation vector (the top three elements of the rightmost column of the pose matrix) should be correctly expressed in meters relative to the camera frame (with X right, Y down, and Z pointing out the lens, origin at the camera center).
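(In code terms, a tiny helper illustrating how to read that off a 4x4 pose matrix; the function name is illustrative only.)

```python
import numpy as np

def tag_position_in_camera_frame(pose):
    """Extract the translation part of a 4x4 tag-to-camera pose matrix.

    x, y, z are in the camera frame (X right, Y down, Z out the lens) and in
    the same units used for the tag size (meters in the example above).
    """
    x, y, z = pose[:3, 3]
    distance = float(np.linalg.norm(pose[:3, 3]))   # straight-line range to the tag
    return (x, y, z), distance
```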

Goodness and Decision Margin come from the part of the code that I didn't write, so I will refer you to the original apriltag paper for details, but in short I believe "goodness" refers to the pixel-wise intensity contrast around the perimeter of the quad, whereas "decision margin" refers to the contrast within the quad itself. It's not clear to me that goodness is used much in the current code version (i.e. it appears to be zero all the time). In general, higher decision margin is better (i.e. it means more contrast within the tag). Those two quantities have nothing to do with pose detection other than their correlation with the overall quality of the photo (better photos = better pose detection).

Init Error and Final Error refer to the reprojection error associated with a given tag. Basically, there is a closed-form linear-algebra solution that estimates pose from point correspondences alone, but it yields a biased estimate that is not the most accurate, especially if the locations of the quad corners are subject to lots of noise. You can then use an iterative optimization technique (I chose Levenberg-Marquardt) to refine this estimate. These reprojection errors are measured in pixels squared (i.e. they are the sum of squared distances, measured in pixel coordinates). In general we want these errors to be very low relative to the tag size (1-2 pixels for small tags in the image, tens of pixels for big tags). We always expect the final error to be lower than the initial one (i.e., the refinement process should do no harm).
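(Purely to illustrate what those numbers measure, a small sketch of a reprojection error of this kind: the sum of squared pixel distances between the detected corners and the corners predicted by a given pose. The corner layout mirrors the earlier solvePnP sketch and is an assumption, not the library's internal code.)

```python
import numpy as np
import cv2

def reprojection_error(pose, corners, tag_size, K):
    """Sum of squared pixel distances between detected and reprojected corners.

    `pose` is a 4x4 tag-to-camera transform, `corners` the detected 4x2 pixel
    corners, `tag_size` the tag edge length, `K` the 3x3 camera matrix
    (distortion assumed already removed).
    """
    s = tag_size / 2.0
    object_points = np.array([[-s, -s, 0.0],
                              [ s, -s, 0.0],
                              [ s,  s, 0.0],
                              [-s,  s, 0.0]])
    rvec, _ = cv2.Rodrigues(np.ascontiguousarray(pose[:3, :3]))
    tvec = pose[:3, 3].reshape(3, 1)
    projected, _ = cv2.projectPoints(object_points, rvec, tvec, K, np.zeros(5))
    diff = projected.reshape(-1, 2) - np.asarray(corners, dtype=np.float64)
    return float(np.sum(diff ** 2))    # pixels squared
```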

Anecdotally, it seems like my new quad detection algorithm (run demos using the -c option) generally has better (lower) reprojection error than the old one, which is nice to see.

I'm going to leave this issue open for a little while longer and if you have any questions or problems in the next few weeks, just keep replying here. If you let me know everything is working great, or if I don't hear from you for a few weeks, I'll close the issue at a later date.

Glad to help!

ginsi commented 6 years ago

Hi! After some fighting with the Xubuntu installation on my new laptop (apparently they broke my ability to use the DNS service provided by my router, which matters a lot on the LAN, but that is another story), I could eventually do some more testing of the apriltag library and wrapper. I am really impressed: the accuracy of distance measurements in the world seems to be better than 1 mm at 1 m (I am not able to make better measurements), and the inter-tag distances are also very good. I have not yet studied enough to know what accuracy figures can be expected, and how. I am working with a 5 Mpix Pi camera on a Raspberry Pi, after having calibrated it to get the camera matrix and the distortion parameters, so I feed the rectified image and the new matrix to the apriltag software. For the time being I have used apriltag.py as it is and have not yet called the wrapper from my own software, but a look at the apriltag.py code makes me very confident that everything will go like a charm. Of course you are right saying that a detection does not include a pose! My first look at the code had been too fast! I thank you so much also for the clarifications you have been so kind to give me (about the errors etc.). Cheers
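(A sketch of the rectify-then-detect flow described above; the intrinsics, distortion coefficients, and file name here are placeholders, and in practice K and dist would come from a cv2.calibrateCamera run.)

```python
import cv2
import numpy as np

# Placeholder calibration results; substitute the real camera matrix and
# distortion coefficients from your calibration.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0,   0.0,   1.0]])
dist = np.zeros(5)

frame = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)
h, w = frame.shape[:2]
new_K, _ = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), 0)
rectified = cv2.undistort(frame, K, dist, None, new_K)
# The rectified image plus (new_K[0, 0], new_K[1, 1], new_K[0, 2], new_K[1, 2])
# are then what get passed to the detector and pose estimation.
```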

mzucker commented 6 years ago

Great, seems like you know your business with calibration and rectification, glad you're getting good results.

I was planning on adding pose information some day; your feature request just prompted me to go ahead and finally do it.

I'm closing this issue now – best of luck using the code.