kyamagu / mexopencv

Collection and development kit of MATLAB MEX functions for the OpenCV library
http://kyamagu.github.io/mexopencv

Error in Essential Matrix #398

Closed saktx closed 6 years ago

saktx commented 6 years ago

Hello. I am using the following mexopencv-based code to estimate the relative pose of the camera (image 2 with respect to image 1). The estimated relative orientation of the camera is close to the ground-truth orientation, but its relative location differs from the ground-truth location. On the other hand, pose estimation using only MATLAB functions gives quite accurate results. Why is that? Please help me identify the possible problem.

amroamroamro commented 6 years ago

can you share a complete reproducible example, with the input images, the ground-truth pose, and the comparison code using MATLAB's functions?

saktx commented 6 years ago

image1 and image2 are attached, and the ground-truth pose values are given below:

GroundTruth_Orientation = [0.999976882854229 -0.000040439012458 0.006799420697770; 0.000000523586671 0.999982769552496 0.005870314969860; -0.006799540930449 -0.005870175704847 0.999959652826218];

GroundTruth_Location = [-0.000043000000000 0.000008000000000 0.217041000000000];

amroamroamro commented 6 years ago

what about the MATLAB code that gives more accurate results?

saktx commented 6 years ago

I have noticed that the relative location estimated by both versions of the code contains error. Why is that? The results should be close to the ground-truth pose.

Following is the MATLAB code:

clear all
clc

K = [615 0 320; 0 615 240; 0 0 1];
cameraParams = cameraParameters('IntrinsicMatrix', K');  % MATLAB expects the transposed intrinsic matrix

% Read first image.
Irgb = imread('image1.jpg');
% Convert to gray scale and undistort.
prevI = undistortImage(rgb2gray(Irgb), cameraParams);
prevPoints = detectSURFFeatures(prevI, 'MetricThreshold', 500);         % Detect features.

% Select a subset of features, uniformly distributed throughout the image.
numPoints = 150;
prevPoints = selectUniform(prevPoints, numPoints, size(prevI));

% Extract features. Using 'Upright' features improves matching quality if
% the camera motion involves little or no in-plane rotation.
prevFeatures = extractFeatures(prevI, prevPoints, 'Upright', true);

% Read image 2.
Irgb = imread('image2.jpg');
% Convert to gray scale and undistort.
I = undistortImage(rgb2gray(Irgb), cameraParams);

% Match features between the previous and the current image.
[currPoints, currFeatures, indexPairs] = helperDetectAndMatchFeatures(...
    prevFeatures, I);

% Estimate the pose of the current view relative to the previous view.
[relative_orient, relative_location, inlierIdx] = helperEstimateRelativePose(...
    prevPoints(indexPairs(:,1)), currPoints(indexPairs(:,2)), cameraParams);

display(relative_orient);
display(relative_location);
amroamroamro commented 6 years ago

Did you check the meaning of the ground-truth variables? cv.recoverPose returns the relative pose (image 2 relative to image 1). Are you sure the ground truth you have uses the same convention? Perhaps it's relative to some other reference frame.

I don't see any obvious mistakes in the code. Still, this is an estimation, so you are bound to get some error. The estimation also depends on having good matches. You can debug by visualizing the matched keypoints (e.g. cv.drawMatches).
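For example, a quick way to eyeball the matches (a minimal sketch, assuming grayscale images im1 and im2 are already loaded):

% detect and match features, then overlay the matches for visual inspection
detector = cv.SURF('Upright',true);
[k1,d1] = detector.detectAndCompute(im1);
[k2,d2] = detector.detectAndCompute(im2);
matcher = cv.DescriptorMatcher('BruteForce');
matches = matcher.match(d1, d2);    % queryIdx -> k1, trainIdx -> k2 (0-based)
out = cv.drawMatches(im1, k1, im2, k2, matches);
imshow(out)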

amroamroamro commented 6 years ago

I think I know what you missed; notice what the docs say:

which means that the recovered translation vector is only up to a scale factor:

>> norm(relativeLoc)
ans =
   1.000000000000000

fwiw, the same applies to the equivalent MATLAB function relativeCameraPose:

The function computes the camera location up to scale and returns relativeLocation as a unit vector.

saktx commented 6 years ago

So it means that the relative translation between image 2 and image 1 will be taken as 1 (i.e. scale_factor = 1), and further relative translations (say, image 3 w.r.t. image 2) will be scaled according to that scale factor?

amroamroamro commented 6 years ago

It means the scale (magnitude) of the translation vector is arbitrary, and they just chose to return a unit vector with norm=1.

In other words, the camera location is a*relativeLoc for some appropriate scale factor a (free variable).

So if you want to compare the answer against the ground-truth location, scale it by the magnitude of the ground-truth location, i.e:

relativeLoc = relativeLoc * norm(GroundTruth_Location);
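Alternatively, since only the direction of the translation is meaningful, you can compare the direction directly, e.g. via the angle between the two unit-normalized vectors (a small sketch, assuming relativeLoc and GroundTruth_Location are row vectors):

% angle (in degrees) between the estimated and ground-truth translation directions
t_est = relativeLoc / norm(relativeLoc);
t_gt  = GroundTruth_Location / norm(GroundTruth_Location);
angErr = acosd(max(-1, min(1, dot(t_est, t_gt))))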

For reference, here is my test code for both versions (one using OpenCV functions, the other MATLAB functions). I'm getting similar results in both.

example_pose_opencv.m

% ground-truth pose
RR = [0.999976882854229 -0.000040439012458 0.006799420697770; 0.000000523586671 0.999982769552496 0.005870314969860; -0.006799540930449 -0.005870175704847 0.999959652826218];
TT = [-0.000043000000000 0.000008000000000 0.217041000000000];

% camera matrix
K = [615 0 320; 0 615 240; 0 0 1];

% images
dname = fullfile(toolboxdir('vision'), 'visiondata', 'NewTsukuba');
im1 = cv.imread(fullfile(dname, 'image0001.jpg'), 'Grayscale',true);
im2 = cv.imread(fullfile(dname, 'image0002.jpg'), 'Grayscale',true);

im1 = cv.undistort(im1, K, zeros(1,4));
im2 = cv.undistort(im2, K, zeros(1,4));

% detect features
detector = cv.SURF('Upright',true);
[k1,d1] = detector.detectAndCompute(im1);
[k2,d2] = detector.detectAndCompute(im2);

% match keypoints
matcher = cv.DescriptorMatcher('BruteForce');
matches = matcher.knnMatch(d2, d1, 2);
idx = cellfun(@(m) (numel(m) == 2) && (m(1).distance < 0.7 * m(2).distance), matches);
matches = cellfun(@(m) m(1), matches(idx));

pts1 = cat(1, k1([matches.trainIdx] + 1).pt);
pts2 = cat(1, k2([matches.queryIdx] + 1).pt);

% estimate E
[E, mask] = cv.findEssentialMat(pts1, pts2, 'CameraMatrix',K, 'Method','Ransac');
mask = logical(mask);
%assert(nnz(mask) / numel(mask) > 0.3, 'not enough inliers');

% recover pose
[R, T, ~, mask2] = cv.recoverPose(E, pts1, pts2, 'CameraMatrix',K, 'Mask',mask);
mask2 = logical(mask2);
%assert(nnz(mask2) / numel(mask2) > 0.8, 'E is likely incorrect');

% compare against ground-truth
T = T' * norm(TT);  % T is only up-to-scale
R, RR
T, TT

% plot matches (similar to showMatchedFeatures)
p1 = pts1(mask,:);
p2 = pts2(mask,:);
out = cat(3, im1, im2, im2); % red-cyan falsecolor
lineX = [p1(:,1), p2(:,1)]'; lineX(3,:) = NaN;
lineY = [p1(:,2), p2(:,2)]'; lineY(3,:) = NaN;
imshow(out), hold on
plot(p1(:,1), p1(:,2), 'ro', ...
    p2(:,1), p2(:,2), 'g+', ...
    lineX(:), lineY(:), 'y-')
hold off

output:

R =
   0.999969674461406   0.001410531114951   0.007659018182794
  -0.001365529322959   0.999981793735479  -0.005877714457321
  -0.007667169439810   0.005867077598551   0.999953394870598
RR =
   0.999976882854229  -0.000040439012458   0.006799420697770
   0.000000523586671   0.999982769552496   0.005870314969860
  -0.006799540930449  -0.005870175704847   0.999959652826218
T =
  -0.052418529691960  -0.013343348955330  -0.210192888503844
TT =
  -0.000043000000000   0.000008000000000   0.217041000000000

example_pose_matlab.m

% ground-truth pose
RR = [0.999976882854229 -0.000040439012458 0.006799420697770; 0.000000523586671 0.999982769552496 0.005870314969860; -0.006799540930449 -0.005870175704847 0.999959652826218];
TT = [-0.000043000000000 0.000008000000000 0.217041000000000];

% camera matrix
K = [615 0 320; 0 615 240; 0 0 1];

% images
dname = fullfile(toolboxdir('vision'), 'visiondata', 'NewTsukuba');
im1 = rgb2gray(imread(fullfile(dname, 'image0001.jpg')));
im2 = rgb2gray(imread(fullfile(dname, 'image0002.jpg')));

camParams = cameraParameters('IntrinsicMatrix', K');
im1 = undistortImage(im1, camParams);
im2 = undistortImage(im2, camParams);

% detect features
k1 = detectSURFFeatures(im1, 'MetricThreshold',500);
k2 = detectSURFFeatures(im2, 'MetricThreshold',500);
%k1 = selectUniform(k1, 150, size(im1));
%k2 = selectUniform(k2, 150, size(im2));
d1 = extractFeatures(im1, k1, 'Upright',true);
d2 = extractFeatures(im2, k2, 'Upright',true);

% match keypoints
matches = matchFeatures(d1, d2, 'Unique',true);

pts1 = k1(matches(:,1)).Location;
pts2 = k2(matches(:,2)).Location;

% estimate E
[E, mask] = estimateEssentialMatrix(pts1, pts2, camParams);
%assert(nnz(mask) / numel(mask) > 0.3, 'not enough inliers');

% recover pose
[R, T, frac] = relativeCameraPose(E, camParams, pts1(mask,:), pts2(mask,:));
%assert(frac > 0.8, 'E is likely incorrect');

% compare against ground-truth
T = T * norm(TT);  % T is only up-to-scale
R, RR
T, TT

% plot matches
p1 = pts1(mask,:);
p2 = pts2(mask,:);
showMatchedFeatures(im1, im2, p1, p2);

output:

R =
   0.999974645912623  -0.000351104852467   0.007112261054472
   0.000396139969458   0.999979876624346  -0.006331620604692
  -0.007109894869052   0.006334277523106   0.999954662133845
RR =
   0.999976882854229  -0.000040439012458   0.006799420697770
   0.000000523586671   0.999982769552496   0.005870314969860
  -0.006799540930449  -0.005870175704847   0.999959652826218
T =
   0.029195774529640  -0.074651919124305   0.201696542644322
TT =
  -0.000043000000000   0.000008000000000   0.217041000000000
saktx commented 6 years ago

Thank you, now I understand... So for visual odometry, will we have to normalize the relative location for every image pair, or just for the first image pair? Bundle adjustment can be applied later on...

saktx commented 6 years ago

What is the difference between "mask" and "mask2" in the outputs of the following functions?

[E, mask] = cv.findEssentialMat();
[R, T, ~, mask2] = cv.recoverPose();
amroamroamro commented 6 years ago

Seeing that your code is based on this MATLAB example

https://www.mathworks.com/help/vision/examples/monocular-visual-odometry.html

did you look at how they do it?

There's this part which is of interest:

The location of the second view relative to the first view can only be recovered up to an unknown scale factor. Compute the scale factor from the ground truth using helperNormalizeViewSet, simulating an external sensor, which would be used in a typical monocular visual odometry system.

vSet = helperNormalizeViewSet(vSet, groundTruthPoses);

Looking at the helperNormalizeViewSet helper function, you can see this:

% Move the first camera to the origin.
locations = cat(1, camPoses.Location{:});
locations = locations - locations(1, :);

locationsGT  = cat(1, groundTruth.Location{1:height(camPoses)});
magnitudes   = sqrt(sum(locations.^2, 2));
magnitudesGT = sqrt(sum(locationsGT.^2, 2));
scaleFactor = median(magnitudesGT(2:end) ./ magnitudes(2:end));

% Scale the locations
locations = locations .* scaleFactor;

Looks like they're taking the median of the ratios between the ground-truth and estimated location magnitudes (up to the current frame), and using that as the scaling factor.
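So rather than re-normalizing every image pair separately, a single scale factor is recovered for the whole trajectory and applied to all estimated locations. A minimal sketch of that idea (with hypothetical N-by-3 arrays estLocations for the up-to-scale estimates and gtLocations for the ground truth):

% move the first camera to the origin in both trajectories
estLocations = estLocations - estLocations(1,:);
gtLocations  = gtLocations  - gtLocations(1,:);

% single scale factor: median ratio of ground-truth to estimated magnitudes
magnitudes   = sqrt(sum(estLocations.^2, 2));
magnitudesGT = sqrt(sum(gtLocations.^2, 2));
scaleFactor  = median(magnitudesGT(2:end) ./ magnitudes(2:end));

% rescale the whole estimated trajectory
estLocations = estLocations * scaleFactor;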

amroamroamro commented 6 years ago

Refer to the docs:

[E, mask] = cv.findEssentialMat();

mask marks the inlier points found by the RANSAC method when estimating the E matrix.

[R, T, ~, mask2] = cv.recoverPose();

mask2 here marks the inlier points which additionally pass the cheirality check when recovering the pose.
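For example, to keep only the correspondences that survive both steps (a small sketch using the variables from the example_pose_opencv.m code above):

% mask  : RANSAC inliers from cv.findEssentialMat
% mask2 : points that additionally pass the cheirality check in cv.recoverPose
inliers1 = pts1(logical(mask2), :);
inliers2 = pts2(logical(mask2), :);
fprintf('%d RANSAC inliers, %d pass the cheirality check\n', nnz(mask), nnz(mask2));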


Again, the (commented) asserts involving those masks in my code above came from that same MATLAB example. See the code of the helper function helperEstimateRelativePose.m.

    % Estimate the essential matrix.    
    [E, inlierIdx] = estimateEssentialMatrix(matchedPoints1, matchedPoints2,...
        cameraParams);

    % Make sure we get enough inliers
    if sum(inlierIdx) / numel(inlierIdx) < .3
        continue;
    end
    [orientation, location, validPointFraction] = ...
        relativeCameraPose(E, cameraParams, inlierPoints1(1:2:end, :),...
        inlierPoints2(1:2:end, :));

    % validPointFraction is the fraction of inlier points that project in
    % front of both cameras. If this fraction is too small, then the
    % fundamental matrix is likely to be incorrect.
    if validPointFraction > .8
       return;
    end
amroamroamro commented 6 years ago

Let's keep the discussion on-topic, about reporting possible mexopencv issues and bugs.

Usage questions about OpenCV are better asked on the OpenCV Q&A forum or Stack Overflow.

alina33 commented 6 years ago

I checked your code but cannot understand what is wrong with it. I guess you have not correctly developed equivalent mexopencv code for these two MATLAB functions.

[worldPoints, imagePoints] = helperFind3Dto2DCorrespondences(vSet,...
    cameraParams, indexPairs, currPoints);

 and

[orient, loc] = estimateWorldCameraPose(imagePoints, worldPoints, ...
    cameraParams, 'Confidence', 99.99, 'MaxReprojectionError', 0.8);

Visual odometry is a critical topic. Can somebody please explain what the problem with this MEX code is?

saktx commented 6 years ago

I have posted questions related to mexopencv on OpenCV's forum, but people there are asking me to put mexopencv-related questions on this GitHub forum, because mexopencv is a wrapper of OpenCV... What should I do now? I am surprised that there is nobody who could help positively...