Adding Equirectangular Camera Support; correct approach?

Joeppie commented 7 years ago

Hello, for a project involving images that are stored in the equirectangular format, a colleague and I are attempting to run BundleAdjustment on a set of 'landmarks' and 'control points'.

We did get it running,but got incorrect residuals. Investigation proved that the 'Spherical' Camera in OpenMVG is in fact not equirectangular (Instead, spherical might be using rectilinear?)

We've come the conclusion that we need to add support for a "Spherical_Equirectangular" Camera, as the projection formula is different.

Is it correct that we can achieve this by implementing the right functions in a new camera class under openmvg/cameras, adding a correct functor etc. to sfm_data_BA_ceres.cpp , sfm_data_BA_ceres_camera_functor.hpp and the EINTRINSIC in camera_common.hpp?

(I am looking at this changeset as a reference example that added spherical cameras to openMVG.

If I am missing anything important, please let me know. Also, when the code works for the equirectangular, I think it might be interesting to see if it can be integrated into OpenMVG; which then obviates our need for using a Fork. I'd be happy to receive any suggestions.

In any case, the fork, which I am working in is: https://github.com/Joeppie/openMVG

We've already tested the camera's project method, and are getting the correct results there, but I have still to perform the conversion of the code to the ceres functor implementation.

By the way, thanks for developing OpenMVG; it is a pleasure to use.

pmoulon commented 7 years ago

You are right:

In order to add a new camera model you need to add:

your new camera model in src/openMVG/cameras. This class must inherit from IntrinsicBase
add an enum there https://github.com/openMVG/openMVG/blob/master/src/openMVG/cameras/Camera_Common.hpp#L48
Since the cameras library is not linked to ceres functor (in order to be used as a header only library), the functor for the bundleadjustment have to be implemented in https://github.com/openMVG/openMVG/blob/master/src/openMVG/sfm/sfm_data_BA_ceres_camera_functor.hpp

Happy to hear that you liked OpenMVG so far and for sure we would be happy to help on any contributions.

The actual spherical camera is using Spherical coordinates. Equirectangular and Spherical camera coordinates look similar, only the temporary convention used (lat, lon) vs. (theta, phi) is different.

Did the existing Spherical camera model was not working only due to a M_PI shift?

As you see here the spherical camera is not using any rectilinear parametrization.

yuyou commented 7 years ago

Thanks for the work with spherical camera models. I tried both of your models and worked. My question left to solve is to evaluate the quality of the pose estimation. We created a set of synthetic data from Unity3D (I shared three samples here.

Following is the outcome of the robust_essential_spherical. Do you guys see if it is correct?

Left image SIFT count: 5828
Right image SIFT count: 4706
  nfa=-50.6565 inliers=259/266 precisionNormalized=0.820239 precision=0.90567 (iter=0 ,sample=246,239,7,102,41,201,22,238,)
  nfa=-58.0561 inliers=260/266 precisionNormalized=0.670362 precision=0.818757 (iter=2 ,sample=31,68,209,105,175,134,79,196,)
  nfa=-72.0924 inliers=182/266 precisionNormalized=0.00360027 precision=0.0600022 (iter=3 ,sample=260,24,154,167,231,216,265,85,)
  nfa=-75.3695 inliers=168/266 precisionNormalized=0.0011566 precision=0.0340088 (iter=7 ,sample=173,224,79,263,154,96,125,85,)
  nfa=-101.981 inliers=151/266 precisionNormalized=5.66317e-05 precision=0.00752541 (iter=8 ,sample=168,35,123,191,11,71,234,85,)
  nfa=-102.84 inliers=139/266 precisionNormalized=1.60533e-05 precision=0.00400665 (iter=17 ,sample=220,178,54,96,133,147,60,190,)
  nfa=-106.368 inliers=153/266 precisionNormalized=5.1606e-05 precision=0.00718373 (iter=32 ,sample=141,61,100,101,223,168,58,131,)
  nfa=-112.781 inliers=157/266 precisionNormalized=5.02285e-05 precision=0.00708721 (iter=54 ,sample=95,209,140,151,157,13,124,186,)
  nfa=-114.67 inliers=168/266 precisionNormalized=0.000120308 precision=0.0109685 (iter=60 ,sample=123,181,103,222,223,37,157,128,)
  nfa=-117.147 inliers=164/266 precisionNormalized=7.32521e-05 precision=0.00855874 (iter=93 ,sample=221,91,160,79,203,166,12,48,)

 Angular threshold found: 0.49038(Degree)

 #Putatives/#inliers : 266/164

Decompose the essential matrix and keep the best solution (if any)

Bundle Adjustment statistics (approximated RMSE):
 #views: 2
 #poses: 2
 #intrinsics: 1
 #tracks: 157
 #residuals: 628
 Initial RMSE: 76.1868
 Final RMSE: 0.528911
 Time (s): 13.6141

Residual statistics (pixels):

     min: 13.984
     mean: 735.041
     median: 642.575
     max: 2277.98

There is another question that may sound silly. For a spherical camera model (or a omnidirectional camera), how to determine the correct direction of the camera device from the result (i.e. the extrinsics), given the center line of the panorama image would be the direction of the camera device?

pmoulon commented 7 years ago

Hi @yuyou,

The BA is doing its job.
Seems like the Residual statistics (pixels) are too large, I don't know why yet.

how to determine the correct direction of the camera device from the result

Since your camera are on the same plane the camera motion must be near to 0 on the Z axis.

PS: Having such a synthetic rendered dataset would be great for OpenMVG. (Would you be agree to generate more camera position with some GT data?

images + camera center position and orientation (rotation = Identity is ok).

Joeppie commented 7 years ago

Did the existing Spherical camera model was not working only due to a M_PI shift?

Hello Pierre, I'd have to dive into the specifics of the projection method; probably it is a good idea to rewrite the methods to be as similar- in form- as possible, to yours. But we did see a different formula in the bearings vector calculation.

Also, now that I implemented the ceres part, we are getting better bundle adjusment results.

We're still working on getting our (private) project where we use the BundleAdjustment to produce the expected outcome however. After I get good results there, I will hopefully have some time to fix the quality issues of the code in the fork.

yuyou commented 7 years ago

@pmoulon I will check the license terms of the Unity3D resources from my colleagues and share with you more positions for localisation and reconstruction benchmark. The Unity3D resource is named "ArchVizPRO Interior Vol.3" from ArchViz

One question to the sample code "sfm_robust_essential_spherical". The code produced output from "openMVG::sfm::estimate_Rt_fromE", when I gave two test images in one order (e.g. -a 9.png and -b 4.png (see the the samples stored in the dropbox in my previous post). But the reversed order "-a 4.png -b 9.png" returns false. Do you see if it is normal or expected? Also it would be very helpful if the reasons of large residuals could be found out.

pmoulon commented 7 years ago

I will check the license terms of the Unity3D resources from my colleagues and share with you more positions for localisation and reconstruction benchmark.

we can also think about using a free to use dataset.

But the reversed order "-a 4.png -b 9.png" returns false.

I need to find time in order to run some experiment on your images. Certainly due to the fact that the matching is not symmetric and so since the matches and ransac are subject to variation the output is not the same.

pmoulon commented 6 years ago

@yuyou Any new feedback on this?

yuyou commented 6 years ago

@pmoulon Not yet. I am travelling now for 2 weeks. I will spend more time to check the residual errors.

Regarding the Unity3D test data, I have 10 camera ground truth positions of the synthetic living room, rendered in the equirectangular spherical format. I will upload them together with the correct camera IDs to the Google drive, e.g. after next week.

yuyou commented 6 years ago

@pmoulon You can find a small test dataset (8 locations in a living room) with stereo pair and depth per camera location. The left/right eye/camera distance is 0.082 m. The data can be found at Google Drive.

pmoulon commented 6 years ago

@yuyou

I made a fast experiments with your data (and the actual develop_intrinsic branch):

I'm able to find the position of 14 images out of the 16 with SIFT (in NORMAL, HIGH and ULTRA I obtain always 14 images).

SIFT NORMAL:
-------------------------------
-- Structure from Motion (statistics):
-- #Camera calibrated: 14 from 16 input images.
-- #Tracks, #3D points: 3396
-------------------------------
SfM Scene RMSE: 0.840546

I'm able to find all the positions with AKAZE.

AKAZE HIGH:
-------------------------------
-- Structure from Motion (statistics):
-- #Camera calibrated: 16 from 16 input images.
-- #Tracks, #3D points: 37721
-------------------------------
SequentialSfMReconstructionEngine::ComputeResidualsMSE.
    -- #Tracks: 37721
    -- Residual min:    2.28366e-05
    -- Residual median: 0.534876
    -- Residual max:     3.98851
    -- Residual mean:    0.728306

Top View (you guess the green plant on the top left):

cloud_and_poses_AKAZE_HIGH.ply.zip

Here the found camera poses (top view with a different orientation)

Does the camera motion seems coherent with your dataset?

Can you elaborate the file format of the files?

I see that each file contains 6 values per file -> Can you explain me how I could compute the camera position and orientation from your file {rotation convention and camera center or translation?

Here the result with AKAZE_ULTRA colorized_AKAZE_ULTRA.ply.zip

AKAZE ULTRA:
-------------------------------
-- Structure from Motion (statistics):
-- #Camera calibrated: 16 from 16 input images.
-- #Tracks, #3D points: 86969
-------------------------------
SequentialSfMReconstructionEngine::ComputeResidualsMSE.
    -- #Tracks: 86969
    -- Residual min:    7.76301e-06
    -- Residual median: 0.52894
    -- Residual max:     3.97939
    -- Residual mean:    0.709799

yuyou commented 6 years ago

@pmoulon Very good work. I got errors when running the incremental SfM from the "develop_intrinsic" branch. It says "There is no defined intrinsic data in order to compute an essential matrix for the initial pair", when using the new camera mode "7". The "sfm_data.json" contains zero intrinsic entry after the call to openMVG_main_SfMInit_ImageListing.

Back to your question, the first three columns defines the centers of the cameras in metre unit. The last ones are supposed to be the rotations but those values were wrong (as they are delta property values defined in the Unity3D camera settings). So we can not validate the quality of the orientation from those values. But we could benchmark the center positions by calculating the meter-per-pixel (MPP) value, given the fact that there is a constant distance (0.082m) per left/right camera pairs. With the MPP, we can compare the real camera distances in meter against the estimated camera distances in meter (i.e. Length_in_pixel * MPP) .

yuyou commented 6 years ago

@pmoulon can you share with me the sfm_data.bin? So I could make a Python script to calculate the relative errors.

pmoulon commented 6 years ago

@yuyou Here the camera poses info extracted with ConvertSfM_DataFormat sfm_data_poses.json.zip

pmoulon commented 6 years ago

@yuyou Any accuracy statistics that you can share?

yuyou commented 6 years ago

Given the living room size, the error is OK (not perfect yet).

I used one camera as the reference and compared the distance deltas from other cameras. The absolute errors shows:

absolute_translational_error.rmse 0.239814 
absolute_translational_error.mean 0.167325 
absolute_translational_error.median 0.113695 
absolute_translational_error.std 0.171795 
absolute_translational_error.min 0.047216 
absolute_translational_error.max 0.580076

And the errors per meter (distance_delta/dinstance):

absolute_translational_error.rmse 0.101174 
absolute_translational_error.mean 0.060088 
absolute_translational_error.median 0.026111 
absolute_translational_error.std 0.081397 
absolute_translational_error.min 0.025107 
absolute_translational_error.max 0.259418

But I found the pupillary distance (PD) in the benchmark data is not a constant (0.082M). So the estimated meter-per-pixel (MPP) may also have introduced some noises here.

pmoulon commented 6 years ago

Thank you for your feedback. Appreciated.

It's normal that the PD is varying since we did not use any constraint on it ;-)
Did you compare the quality in other softwares (Photoscan, Capturing Reality)?

Given the living room size, the error is OK (not perfect yet).

Notice that the baseline is a bit large for some camera motion. I'm pretty sure that if you double the camera count you will have a smaller registration error.

At least it shows that the implementation is correct and ready to be used! I made tests with hundreds of images (Spherical videos) and got very nice point cloud out of it.

yuyou commented 6 years ago

Good point, I can run PhotoScan and check the output.

Anyway, the added spherical camera model is really useful.

pmoulon commented 6 years ago

@yuyou Any feedback about the comparison with the output from Photoscan?

pmoulon commented 6 years ago

Since, I'm closing the issue since the answer about the step to complete to add a new camera model has been answered.

BTW, @yuyou Fell free to continue the discussion about bench-marking the quality.

openMVG / openMVG

Adding Equirectangular Camera Support; correct approach? #1045