A new, better multi-calibration has fixed this. :)
Can you please share the calibration package that you used? I am using ImprovedOcamCalib but it won't initialize....
Hi, I also used ImprovedOcamCalib. I found that the extrinsic calibration is also VERY important. If that is slightly off, the system will have a hard time.
If you get no map points at all, and it is stuck on INITIALIZING, then it is likely the intrinsics. If you get map points in one camera, but tracking is lost very easily, it may be the extrinsics.
I am trying to calibrate a Gear 360 (only one 180 fisheye lens right now) and am using ImprovedOcamCalib as well. The calibration parameters are as follows:
%YAML:1.0
Camera.Iw: 640
Camera.Ih: 640
Camera.nrpol: 6
Camera.nrinvpol: 3
Camera.a0: -172.296104925459
Camera.a1: 0
Camera.a2: 0.00160105662555009
Camera.a3: -6.34303796841769e-06
Camera.a4: 6.56202451752210e-08
Camera.a5: -1.83867951781103e-10
Camera.pol0: 17.2540014422096
Camera.pol1: 80.6878056249852
Camera.pol2: 298.076583841793
Camera.pol3: 336.001425859136
Camera.c: 1.001231302713875
Camera.d: -4.098360811880249e-04
Camera.e: -3.266710888569786e-04
Camera.u0: 3.222578566035110e+02
Camera.v0: 3.181511158806330e+02
Camera.mirrorMask: 1
The forward polynomial is taken from "ss" and the backward (inverse) polynomial from "pol". I am using 15 grayscale images, downscaled to 640x640, for the calibration. The checkerboard is 7x5 (29 mm squares).
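For anyone checking their own numbers, this is roughly how I understand the forward ("ss") polynomial gets used, following the cam2world function in Scaramuzza's toolbox. This is only a sketch with made-up struct names, not the actual MultiCol-SLAM camera model, and the (row, col) vs. (u, v) ordering of the centre should be verified against your own setup:

```cpp
#include <cmath>
#include <vector>

// Hypothetical container for the ocam intrinsics from the YAML above.
struct OcamIntrinsics {
    std::vector<double> a;   // forward polynomial ("ss"): Camera.a0 .. Camera.aN
    double c, d, e;          // affine distortion matrix [c d; e 1]
    double u0, v0;           // distortion centre (Camera.u0, Camera.v0)
};

// Back-projection: pixel (u, v) -> unit-length viewing ray (x, y, z).
// Note: Scaramuzza's toolbox stores image points as (row, col), so check
// which axis your u0/v0 belong to.
void cam2world(const OcamIntrinsics& cam, double u, double v, double ray[3]) {
    const double invdet = 1.0 / (cam.c - cam.d * cam.e);        // inverse of [c d; e 1]
    const double xp = invdet * ((u - cam.u0) - cam.d * (v - cam.v0));
    const double yp = invdet * (-cam.e * (u - cam.u0) + cam.c * (v - cam.v0));
    const double rho = std::sqrt(xp * xp + yp * yp);             // radius from centre [px]

    double zp = 0.0, rho_i = 1.0;                                // z = a0 + a1*rho + a2*rho^2 + ...
    for (double ai : cam.a) { zp += ai * rho_i; rho_i *= rho; }

    const double invnorm = 1.0 / std::sqrt(xp * xp + yp * yp + zp * zp);
    ray[0] = invnorm * xp; ray[1] = invnorm * yp; ray[2] = invnorm * zp;
}
```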
The strange thing is that tracking can initialize with the sample calibration file that comes with the repository!!!
Any idea what I might be missing here? Just to emphasize: this is only for one of the lenses... I am trying to use a single camera for tracking... I guess there is no point in using two lenses that sit 180 degrees back to back (<-->) with no overlap...
@ardalanK -- I noticed that you have "Camera.nrinvpol: 3" but you have 4 numbers in your backward polynomial list. Could that be part of your problem?
@FreedomRings Thanks for your response, that helped me find the issue. You are right about this one, but more generally, the inverse polynomial numbers reported by the MATLAB code should be entered the other way around: if you have Camera.pol0 to Camera.pol10, they correspond to the MATLAB output values pol10 down to pol0.
I also have a problem with the external multi-camera calibration. I am trying to get MCPTAM to work, but I am not sure how to import images from a folder rather than getting the live feed from a camera.... Is there any chance that you might have a sample executable for that?
@antithing -- I am having the exact same problem as you originally posted, and I have a lot of "jitter" in the features it identifies and displays in the windows. I used ImprovedOcamCalib as well to get to this point. I do not really understand your distinction between intrinsics and extrinsics, how to get those numbers, or where to set them. I think I am missing something elemental. To be honest, all I have really done is take the example and exchange my camera frames for his video frames. I suspect I need to do some kind of system initialization for my room or change the vocabulary in some way.
I would really and truly appreciate any light you can shed on the subject or any examples you can steer me toward - I have been struggling with this project since November (I am just a toy maker and not a mathematician or computer vision expert - but I am a very quick study).
@ardalanK -- You are very welcome, sorry I did not see this sooner.
I used the MATLAB toolbox as well (wow, that is expensive) and put the numbers into the camera calibration YAML file from top to bottom (in the same order as MATLAB reported them in the run console). I am wondering if that is my problem!!
As far as MCPTAM is concerned, nope, I have not even attempted that. When I read his thesis it seemed to indicate that MultiCol automatically calibrated the multi-camera setup based on the features discovered in the overlap visible between cameras, but I could easily have that wrong. This also might be a place where I am in error; I keep getting confused because several sites seem to use different terminology and descriptions (and shortcut language).
@antithing In the MATLAB code (ImprovedOcamCalib) you go to ocam_model and then pol. Depending on the degree of your polynomial in the nonlinear optimization, you might have a different number of values in "pol".
Let's say there are 9 "pol" values, listed in columns 1 to 9. You would naturally think that Camera.pol0 corresponds to the first number in column 1, but that's not the case: Camera.pol0 corresponds to column 9. So the numbers in ocam_model --> pol and the Camera.pol? entries correspond in reverse order.
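In other words, a tiny sketch to make the ordering explicit (assuming the toolbox lists the inverse polynomial highest-degree-first, i.e. in MATLAB polyval order; the function names here are just for illustration):

```cpp
#include <algorithm>
#include <vector>

// If the toolbox lists the inverse polynomial highest-degree-first (MATLAB
// polyval order), reverse it so that index 0 is the constant term, which is
// what Camera.pol0 expects.
std::vector<double> matlabPolToYamlOrder(std::vector<double> matlab_pol) {
    std::reverse(matlab_pol.begin(), matlab_pol.end());
    return matlab_pol;   // pol[0] -> Camera.pol0, pol[1] -> Camera.pol1, ...
}

// rho(theta) with the coefficients in YAML order (constant term first).
double evalInversePoly(const std::vector<double>& pol, double theta) {
    double rho = 0.0, t_i = 1.0;
    for (double p : pol) { rho += p * t_i; t_i *= theta; }
    return rho;
}
```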
Take a look at this link for further information on other calibration info.... https://sites.google.com/site/scarabotix/ocamcalib-toolbox
Scroll down all the way to 16. Workspace Variables.
Cheers, Ardalan
@ardalanK -- WOW! What a great link! I will obviously be rerunning my calibration! Thank you!
I cannot wait to pore over this link. If you guys get this tracking problem solved and start getting accurate locations (20cm does not work for me either), please let me know. I am also eager to know if MCPTAM is going to be required. If it is and I create an executable for it, I will ping you.
Did you have to create your own vocabulary file? Or did you use his?
@FreedomRings You won't need to create your own vocabulary... The sample one should do the trick!
@ardalanK - I recalibrated with the order of the "pol" numbers reversed, and I went from having lots of little blue squares (primarily in one camera or the other, but not both) and "Tracking" (which would drop if I moved at more than a snail's pace, like in the original post) to drawing no squares at all but a lot of seemingly random green lines that looked like a children's game of "Pick Up Sticks". I tried reversing the other polynomial too (Camera.a0 through Camera.a4), since it was column-based as well, and it got even worse.
¯\_(ツ)_/¯
Looking at @antithing's post above, I don't know if I fall into his "If you get no map points at all, and it is stuck on INITIALIZING, then it is likely the intrinsics" statement with the newly reversed order - OR - his comment "If you get map points in one camera, but tracking is lost very easily, it may be the extrinsics", which describes what I was getting before I reversed the pol numbers.
As far as that goes, I am not sure what to do about either one, as I am unclear on what an intrinsic is versus an extrinsic, and which one to modify to make an adjustment that works.
I am so discouraged that I nearly chucked this whole project after working on it for almost 6 months without a good track yet. It appears that the only one of us who has made this work well is @urbste Steffen, and he has apparently moved on: he is no longer with the university and is not replying here when we have problems.
This just seems like such a great solution and a wonderful idea, I hate to abandon it. I suspect that the problem is the multi-camera calibration and in his instructions he has it listed as a Lafida TODO.
ANY suggestions are appreciated.
I am starting to think that this is not doable. Looking at the link http://www.ipf.kit.edu/lafida.php, I think I am seeing that he calibrates using an outside-in laser system and then uses that to evaluate where the 3-camera system finds itself inside that laser-scanned world. Am I wrong?
Hey guys. Sorry for my late reply, but I did not have access to a computer in the last couple of months. All this is research code and is supposed to help others get started with their own work/project on multi-camera SLAM. It was never intended to be used as an out-of-the-box solution!! To make it really work with your own camera rig, you will need a deeper understanding of the underlying theoretical aspects (like extrinsic/intrinsic calibration) and, more importantly, of what the code really does and where you will have to modify it in order to get it to work with your own rig.
@FreedomRings I am really sorry that you are struggling with your project! To check if all the extrinsics and intrinsics are correct, you might want to create a little C++ project. There you isolate all the functions from MultiCol-SLAM that import your calibration files and do the world2cam and cam2world projections. Then you could take one of your checkerboard images, estimate the exterior orientation (R, t, using a PnP algorithm from OpenGV and the camera rays v) w.r.t. the checkerboard, and reproject the checkerboard points X using m = world2cam(R*X+t). Similarly, you can check the extrinsics by taking images of an object whose 3D coordinates you know and that is observable from both cameras, and you can do the reprojection trick with both cameras.
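Just to illustrate the idea, the intrinsics check could look roughly like this. This is only a sketch: it re-implements the ocam world2cam projection from Scaramuzza's toolbox instead of calling the MultiCol-SLAM camera class, it assumes R and t have already been estimated (e.g. with a PnP solver from OpenGV on the cam2world rays), and the (row, col) vs. (u, v) ordering and sign conventions should be checked against your model:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>
#include <opencv2/core.hpp>

// Projection ray -> pixel with the ocam inverse polynomial (theta -> rho).
// pol[0] is the constant term (Camera.pol0); c, d, e, u0, v0 as in the YAML.
cv::Point2d world2cam(const cv::Vec3d& P, const std::vector<double>& pol,
                      double c, double d, double e, double u0, double v0) {
    const double norm = std::sqrt(P[0] * P[0] + P[1] * P[1]);
    if (norm < 1e-12) return cv::Point2d(u0, v0);
    const double theta = std::atan2(P[2], norm);       // angle w.r.t. the image plane
    double rho = 0.0, t_i = 1.0;                        // rho = pol0 + pol1*theta + ...
    for (double p : pol) { rho += p * t_i; t_i *= theta; }
    const double x = P[0] / norm * rho, y = P[1] / norm * rho;
    return cv::Point2d(x * c + y * d + u0, x * e + y + v0);
}

// Reprojection check: X are checkerboard corners in board coordinates, m_obs
// their detected pixel positions, (R, t) the pose estimated by your PnP solver.
void checkIntrinsics(const std::vector<cv::Vec3d>& X,
                     const std::vector<cv::Point2d>& m_obs,
                     const cv::Matx33d& R, const cv::Vec3d& t,
                     const std::vector<double>& pol,
                     double c, double d, double e, double u0, double v0) {
    for (size_t i = 0; i < X.size(); ++i) {
        const cv::Point2d m = world2cam(R * X[i] + t, pol, c, d, e, u0, v0);
        const double dx = m.x - m_obs[i].x, dy = m.y - m_obs[i].y;
        // Errors far above a few pixels usually mean wrong intrinsics or a
        // wrong coefficient ordering.
        std::printf("corner %zu: reprojection error %.2f px\n", i, std::sqrt(dx * dx + dy * dy));
    }
}
```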
Concerning the calibration: We did not use an outside-in laser system to calibrate anything. We used a Motion-Capture (MoCap) camera system to estimate the system's trajectory. We calibrated the rig using the black circles on the wall (http://www.ipf.kit.edu/img/ProjektCV2015/MCS_cal_reprojected.PNG). Since we know their 3D locations in a world coordinate frame (measured with the MoCap system), we were able to estimate each camera's pose in the room (using PnP) for each timestep. Then we took something like 15 image sets (15*3 images), and for each timestep t you get an estimate of the relative orientation between the cameras. Finally, we ran bundle adjustment to minimize the reprojection errors for all timesteps in all cameras, thus optimizing the extrinsics. This is quite a custom calibration which is not easily reproducible, so I can't provide you with any code. The MCPTAM calibration is completely automatic, that's why I pointed you to that toolbox. The difference is that its rotation is parametrized as a Rodrigues vector, whereas I chose the Cayley parametrization, so you have to take care of that. In addition, I don't know whether the author saves the backward or the forward transformation.
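The per-timestep relative orientation is just the composition of two absolute poses. A small sketch, assuming the poses map world points into the respective camera frames:

```cpp
#include <opencv2/core.hpp>

// Absolute pose of one camera at one timestep: x_cam = R * x_world + t.
struct Pose { cv::Matx33d R; cv::Vec3d t; };

// Relative pose taking points from camera 1's frame into camera 2's frame,
// obtained by chaining camera 1 -> world -> camera 2.
Pose relativePose(const Pose& T_c1_w, const Pose& T_c2_w) {
    Pose T_c2_c1;
    T_c2_c1.R = T_c2_w.R * T_c1_w.R.t();            // R2 * R1^T
    T_c2_c1.t = T_c2_w.t - T_c2_c1.R * T_c1_w.t;    // t2 - R2 * R1^T * t1
    return T_c2_c1;
}
```

You get one such estimate per timestep and then refine them all jointly in the bundle adjustment.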
So you see, there is a lot to check and test (step by step, verifying that it works) before you can simply press the launch button ;-)
Concerning the vocabulary: I created a smaller vocabulary in order to speed up the start of the SLAM. The ORB-SLAM vocabulary is quite large and takes some time to load. To get good loop closing, however, you will need a bigger vocabulary (either train your own or use the big one from ORB-SLAM)!
Concerning the Gear 360: looks neat. The baseline between the cameras, however, seems to be very small, so this could be pretty counterproductive for reconstruction and tracking stability.
Cheers Steffen
@urbste Thanks so much for your response!
@urbste Thank you so much for your reply! And for focusing on my specific problems! There is a lot to digest here. I love having a new path to follow, it is very frustrating hitting a wall and not knowing what to try next.
If I may, can I ask your input on two things:
When I run your ImprovedOcamCalib calibration in MATLAB with my 36" x 48" checkerboard on my kitchen floor, it finds all of the corners, reprojects onto the images perfectly, and Show Extrinsics displays each of the frames in the orientations and depths at which the images were taken, which has me encouraged. Your logic in your papers (to the extent I understand it, LOL) and in execution strikes me as the very highest in quality, but I am concerned that I might be trying to use the wrong toolset for my knowledge level. For instance, I have no idea what it means to convert a Rodrigues vector to the Cayley parametrization, and it adds yet another failure point to my project. I am a proficient programmer, but I am already making guesses on converting your pose matrix to the pose matrix I need for the software MultiCol feeds, and neither seems documented in a way I actually understand. As a non-mathematician and non-computer-vision expert, do I have a shot at doing this without hiring it out?
Is training your own vocabulary the same as the training I see when the point cloud is created and the keyframes are saved? What exactly is the difference (if any)? I see people using this technique in automobiles driving down the highway, and I do not understand how we can create a large enough vocabulary to cover all possibilities in the great outdoors (beyond the confines of a courtyard) if the vocabulary must be created from what the cameras see. See what I mean about missing a few basic understandings? Unfortunately Google only helps if I understand the cryptic mathematics.
Perhaps I need to start a SLAM support group. LOL
Thank you for putting up with a newbie.
@FreedomRings: It might be helpful to get an idea of the theoretical aspects that are involved in a SLAM system in order to get the research code I published here running. ;-) To be honest, you won't have much fun using this code if you don't. Some resources that might be helpful:
The techniques that are part of state-of-the-art SLAM systems are the result of decades of research and are pretty evolved.
About the rotations: a rotation in 3D can be parametrized in different ways, each with advantages and disadvantages. The 3x3 rotation matrix is a good way to get an idea of the rotation, but a bad way if you want to use it in an optimization. A rotation R in 3D has 3 degrees of freedom (DoF), but a 3x3 matrix has 9 elements, so it is "over-parametrized". In addition, a rotation matrix has to fulfill some properties, such as being orthogonal (e.g. transpose(R) = inv(R)).

Let's say you stack the 9 elements of R into a vector x. If you now start to optimize the rotation w.r.t. some reprojection error (residuals), you will get small updates dx that you add to your initial guess: x(t+1) = x(t) + dx. The thing is, if you reshape the vector x back into a rotation matrix R, those properties might not be fulfilled anymore, so you have to enforce them somehow during your optimization, which is difficult.

To avoid these issues, it is better to find a minimal representation of your rotation matrix, i.e. a representation that matches the degrees of freedom: 3 values instead of 9. Such minimal representations are Rodrigues, Cayley, ..., and there are mappings (equations) between R and these minimal representations in both directions.
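To give you an idea, the Cayley mapping can be written like this. This is a sketch of one common sign convention (it cannot represent 180 degree rotations); check it against the conversion helpers in the code before relying on it:

```cpp
#include <opencv2/core.hpp>

// Skew-symmetric matrix [c]_x such that [c]_x * v = c x v.
cv::Matx33d skew(const cv::Vec3d& c) {
    return cv::Matx33d( 0.0, -c[2],  c[1],
                        c[2],  0.0, -c[0],
                       -c[1],  c[0],  0.0);
}

// Cayley parameters (3 values) -> rotation matrix: R = (I + [c]_x)(I - [c]_x)^{-1}.
cv::Matx33d cayley2rot(const cv::Vec3d& c) {
    const cv::Matx33d I = cv::Matx33d::eye();
    const cv::Matx33d C = skew(c);
    return (I + C) * (I - C).inv();
}

// Rotation matrix -> Cayley parameters: [c]_x = (R - I)(R + I)^{-1}.
cv::Vec3d rot2cayley(const cv::Matx33d& R) {
    const cv::Matx33d I = cv::Matx33d::eye();
    const cv::Matx33d C = (R - I) * (R + I).inv();
    return cv::Vec3d(C(2, 1), C(0, 2), C(1, 0));
}
```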
Training the vocabulary and running the SLAM are two different things. Creating a vocabulary means extracting a large number of keypoints from a diverse set of images. Then a codebook (vocabulary) is learned by clustering the keypoints into "words", like in a real vocabulary (like a book ;-)). When you then get a new image and extract features, you can run these features down your vocabulary and create a histogram of "word occurrences" in your image. This histogram can then be used, e.g., to find loop closures and so on. If your vocabulary is big enough, it will work in a variety of different scenes.
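Schematically, training looks like this. The sketch uses the stock DBoW2 ORB vocabulary interface with OpenCV ORB features; the vocabulary shipped with this repository was built for a different binary descriptor, so take the class names and parameters only as an illustration:

```cpp
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include "DBoW2/DBoW2.h"   // provides OrbVocabulary (TemplatedVocabulary for ORB)

// Extract ORB descriptors from a set of training images and cluster them into
// a hierarchical vocabulary with k branches and L levels (up to k^L words).
OrbVocabulary trainVocabulary(const std::vector<cv::Mat>& images) {
    cv::Ptr<cv::ORB> orb = cv::ORB::create(2000);
    std::vector<std::vector<cv::Mat>> features;       // one descriptor list per image
    for (const cv::Mat& im : images) {
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;
        orb->detectAndCompute(im, cv::noArray(), keypoints, descriptors);
        std::vector<cv::Mat> perImage;
        for (int r = 0; r < descriptors.rows; ++r)    // DBoW2 wants one cv::Mat per descriptor
            perImage.push_back(descriptors.row(r));
        features.push_back(perImage);
    }
    const int k = 10, L = 5;                          // branching factor and tree depth
    OrbVocabulary voc(k, L, DBoW2::TF_IDF, DBoW2::L1_NORM);
    voc.create(features);                             // hierarchical k-means into "words"
    return voc;
}
```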
Cheers Steffen
@urbste Thanks for all the effort and explanation, and your paper is well written. One question I still have is about the order of the cameras: does the order of their Cayley parameters in the file "MultiCamSys_Calibration.yaml" matter? I have a setup with 4 cameras mounted in a horizontal row, each with a different rotation angle, but the viewer shows the device pointing in the opposite direction (see the attached screenshot). Might that be the reason that points show up in one camera only?
My current hunch is that the coordinate system (x/y/z) in my setup is different. I rotated my previous rotation matrices by 180 degrees about the X-axis so that the cameras look outward rather than inward.
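In case it helps anyone else, the flip I tried is simply a 180 degree rotation about X applied in the camera frame. A sketch (whether it has to be pre- or post-multiplied depends on whether your matrices map body-to-camera or camera-to-body):

```cpp
#include <opencv2/core.hpp>

// Flip a camera's orientation by a 180 degree rotation about its own X-axis,
// i.e. (x, y, z) -> (x, -y, -z) in the camera frame. R_old is assumed to map
// points from the rig/body frame into the camera frame.
cv::Matx33d flipCamera(const cv::Matx33d& R_old) {
    const cv::Matx33d Rx180(1.0,  0.0,  0.0,
                            0.0, -1.0,  0.0,
                            0.0,  0.0, -1.0);
    return Rx180 * R_old;   // pre-multiplying applies the flip in the camera frame
}
```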
@antithing Hey man, could you please tell me which multi-camera calibration method you use now? I used 4 wide-angle (FOV 160) cameras to do SLAM, but it loses track easily. I used https://sites.google.com/site/prclibo/toolbox/doc to calibrate my system, as you did before. It really troubles me!!! Thanks for any reply!!!
Hi, sorry to bother you yet again, I am just having one last issue.
When I start the system, it moves to TRACKING, but I only see keypoints in one frame (and map points in only one section of the point cloud/map, in front of one camera). If I very slowly rotate the cameras, more points are added and the map is built, but I must be very slow and careful or tracking is lost. In the examples, I see that all frames are filled with roughly an equal number of keypoints, and the map is built from every camera as soon as initialisation occurs. If you have a minute, do you have any thoughts on why this might not be happening for me?
(Once the 360 degree map is built, tracking is great, but getting to that point is tricky!)
Thank you once again for your awesome work and code!