mapillary / OpenSfM

Open source Structure-from-Motion pipeline
https://www.opensfm.org/
BSD 2-Clause "Simplified" License

Calculating the transform between two images using reconstruction pipeline #100

Closed EderSantana closed 7 years ago

EderSantana commented 7 years ago

Ok, this is pretty n00b, but I don't understand how to calculate the perspective transform between two images using only the reconstruction parameters. I'd really appreciate your help.

First of all, I'm sorry if there is already a built-in way to do this. If not, let's assume that we ran the pipeline on the Lund dataset. We can calculate the camera matrix as follows (see https://github.com/mapillary/OpenSfM/issues/84):

import cv2
import numpy as np
from opensfm import dataset

# load opensfm dataset
data = dataset.DataSet('/Users/eder/python/OpenSFM/data/lund')
C = list(data.load_camera_models().values())[0]

# calculate camera matrix
w = max(C.width, C.height)
K = [[C.focal * w, 0, C.width/2],
     [0, C.focal * w, C.height/2],
     [0, 0, 1]]
K = np.asarray(K)
invK = np.linalg.inv(K)

And for this example, let's play with the following two images:

# load image
rec = data.load_reconstruction()[0]
shot1 = rec.shots['026.jpg']
img1 = data.image_as_array(shot1.id)
shot2 = rec.shots['002.jpg']
img2 = data.image_as_array(shot2.id)
cv2.imshow('shot', img1[:, :, ::-1])
cv2.imshow('shot2', img2[:, :, ::-1])

I'm trying to estimate the transform from shot1 to shot2. So let's invert transform 1 and apply transform 2:

# calculate the transform (this could be simplified); I'm fairly sure about
# the rotation matrices, but not about the translation part
R1 = shot1.pose.get_rotation_matrix()
invR1 = np.linalg.inv(R1)
t1 = shot1.pose.translation
T1 = -invR1.dot(t1)
Hr1 = K.dot(invR1.dot(invK))
Ht1 = np.eye(3)
Ht1[:, 2] = -K.dot(T1)/T1[2]
invert_1 = Ht1.dot(Hr1)

R2 = shot2.pose.get_rotation_matrix()
invR2 = np.linalg.inv(R2)
t2 = shot2.pose.translation
T2 = invR2.dot(t2)
Hr2 = K.dot(R2.dot(invK))
Ht2 = np.eye(3)
Ht2[:, 2] = K.dot(T2)/T2[2]
transform_2 = Ht2.dot(Hr2)

H = transform_2.dot(invert_1)

Finally we warp the first image:

# warp image
wimg = cv2.warpPerspective(img1[:, :, ::-1], H, img1.shape[:2][::-1])
cv2.imshow('warp', wimg)
cv2.waitKey(0)

But unfortunately the result doesn't look anywhere close to what I expected. The equations above were inspired by this discussion: http://stackoverflow.com/questions/23275877/opencv-get-perspective-matrix-from-translation-rotation

Thanks to anybody who has time to check this out.

paulinus commented 7 years ago

If I understand correctly, you are trying to compute a homography that would map one image onto another. Unfortunately, in general, there is no such homography. The motion between the two images is different for points at different distances from the camera and cannot be expressed as a single homography.

There are two special cases where the motion is a homography: when the scene is planar, or when there is no translation between the shots. Neither is the case for the Lund dataset. In fact, as of now, those are failure cases where OpenSfM is not able to compute a reconstruction.

On the Stack Overflow thread that you mention, they are dealing with the case of a planar scene: the scene is a planar chessboard at z = 0. The equations there are for warping the images assuming that the scene lies on the plane z = 0.

In the Lund dataset, z = 0 corresponds approximately to the ground. If you warp the images using those equations, the ground should be the only thing that is warped approximately right. The rest should appear stretched, as if it had been projected onto the ground.
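
For reference, here is a minimal sketch of the homography those two special cases induce, assuming OpenSfM's world-to-camera pose convention (p_cam = R * p_world + t, which matches the shot.pose usage above); the plane normal n and distance d are hypothetical inputs expressed in shot 1's camera frame:

import numpy as np

def relative_pose(shot1, shot2):
    # Relative motion from camera 1 to camera 2, assuming each shot stores
    # the world-to-camera transform p_cam = R * p_world + t.
    R1 = shot1.pose.get_rotation_matrix()
    R2 = shot2.pose.get_rotation_matrix()
    t1 = np.asarray(shot1.pose.translation)
    t2 = np.asarray(shot2.pose.translation)
    R = R2.dot(R1.T)
    t = t2 - R.dot(t1)
    return R, t

def plane_induced_homography(K, R, t, n, d):
    # Homography mapping image-1 pixels to image-2 pixels for points on the
    # plane n . X = d (n and d given in camera-1 coordinates).  With t = 0
    # this reduces to the pure-rotation homography K R K^-1.
    H = K.dot(R + np.outer(t, n) / d).dot(np.linalg.inv(K))
    return H / H[2, 2]

Only points near the chosen plane map correctly through such an H, which is why everything off the ground looks stretched in the warp above.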

To properly warp images from one to another, you need to first compute a depth map. That will enable you to warp each pixel by the right amount instead of applying the same transform to all pixels. Not sure if that is what you need, but note that we are currently working on dense depth estimation on the dense branch.
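
As a rough sketch of that per-pixel warp, assuming a hypothetical z-depth map depth2 for the target image (for instance from the dense branch) and K, R, t computed as in the sketch above:

import cv2
import numpy as np

def warp_with_depth(img1, depth2, K, R, t):
    # R, t: relative pose from camera 1 to camera 2 (x2 = R * x1 + t).
    # depth2: hypothetical z-depth of every pixel of image 2 (camera-2 frame).
    h, w = depth2.shape
    y, x = np.indices((h, w))
    pixels = np.stack([x.ravel(), y.ravel(), np.ones(h * w)])  # homogeneous pixels of image 2
    rays = np.linalg.inv(K).dot(pixels)                        # bearing vectors with z = 1
    X2 = rays * depth2.ravel()                                 # 3D points in camera-2 frame
    X1 = R.T.dot(X2 - t[:, np.newaxis])                        # same points in camera-1 frame
    p1 = K.dot(X1)
    u = (p1[0] / p1[2]).reshape(h, w).astype(np.float32)
    v = (p1[1] / p1[2]).reshape(h, w).astype(np.float32)
    # For each pixel of the target view, sample image 1 at its reprojection.
    return cv2.remap(img1, u, v, cv2.INTER_LINEAR)

This only covers the reprojection step; occlusions and pixels that reproject outside image 1 would still need handling.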

EderSantana commented 7 years ago

Ah, that is interesting! Maybe I was understanding this all wrong. I thought that the planar-only transform was covered by warpAffine, and that the warpPerspective matrix could be calculated from the camera matrix and the rotation and translation matrices.

So let me pose it this way. Is there a way, even if I end up with distortions and dark pixel regions, to virtually move the camera from pose 1 to pose 2 and apply the result to shot 1? Are you saying that this is not possible without depth?

paulinus commented 7 years ago

Exactly. In the general case, to warp one image onto another you need the depth of the pixels or, equivalently, the geometry of the scene.

If you need the warp to be precise, then you need a precise depth map or geometry. If you just want an approximate warp to validate the SfM results, then you could use the mesh as we do in the viewer: you project image 1 onto its 3D mesh, then render the textured mesh from camera 2's point of view.

Does this make sense? What is your use case?

EderSantana commented 7 years ago

Yes, it makes perfect sense now! Where do you do this:

If you just want an approximate warp to validate the SfM results, then you could use the mesh as we do on the viewer: You project image 1 onto its 3D mesh, then render the textured mesh from camera 2's point of view.

I'm a deep learning PhD student, but I want to learn more about geometry to make deep learning great again! 😄 I'm planning to use OpenSfM in a project (I'll make sure to cite you guys); your help is greatly appreciated! RE:

What is your use case?

paulinus commented 7 years ago

The mesh projection and reprojection are done using Three.js in the viewer.

We did a small experiment in Python that might be what you are looking for. It shows a source image warped into a target image using a mesh. The warping is quite coarse, but it will give you an idea of what you can get with a better mesh or a depth map.

#!/usr/bin/env python

import argparse
import logging

import numpy as np
import pylab
import scipy.interpolate

import opensfm.features
from mapillary_sfm import dataset

logger = logging.getLogger(__name__)

def project(shot, point):
    p = shot.project(point.coordinates)
    return opensfm.features.denormalized_image_coordinates(
        p.reshape(1, 2), shot.camera.width, shot.camera.height)[0]

def warp_image(reconstruction, src_id, dst_id):
    src_shot = reconstruction.shots[src_id]
    dst_shot = reconstruction.shots[dst_id]

    # Project every reconstructed 3D point into both images to get sparse
    # 2D-2D correspondences between the source and destination views.
    points = reconstruction.points.values()
    src_points = np.array([project(src_shot, p) for p in points])
    dst_points = np.array([project(dst_shot, p) for p in points])

    src = data.image_as_array(src_id)  # 'data' is the global DataSet from __main__
    dst = data.image_as_array(dst_id)
    dst_y, dst_x = np.indices(dst.shape[:2])
    dst_grid = np.column_stack([dst_x.ravel(), dst_y.ravel()])

    # Interpolate the sparse correspondences to get, for every destination
    # pixel, the source pixel it should sample from.
    x = scipy.interpolate.griddata(dst_points, src_points[:, 0],
                                   dst_grid, method='linear')
    y = scipy.interpolate.griddata(dst_points, src_points[:, 1],
                                   dst_grid, method='linear')
    x = np.where(np.isnan(x), 0, x).reshape(dst.shape[:2])
    y = np.where(np.isnan(y), 0, y).reshape(dst.shape[:2])
    x = np.clip(x, 0, src.shape[1] - 1)
    y = np.clip(y, 0, src.shape[0] - 1)

    warped = src[y.astype(int), x.astype(int)]

    pylab.figure()
    pylab.subplot(2, 2, 1)
    pylab.title('source')
    pylab.imshow(src)
    pylab.subplot(2, 2, 2)
    pylab.title('target')
    pylab.imshow(dst)
    pylab.subplot(2, 2, 3)
    pylab.title('source warped to target')
    pylab.imshow(warped)
    pylab.subplot(2, 2, 4)
    pylab.title('comparison')
    pylab.imshow(warped / 2 + dst / 2)

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)

    parser = argparse.ArgumentParser(
        description="Propagate labels using SfM data")
    parser.add_argument(
        'dataset',
        help='path to the dataset to be processed')
    parser.add_argument(
        '--image',
        help='main image')
    args = parser.parse_args()

    data = dataset.DataSet(args.dataset)
    image = args.image or data.images()[0]
    ar = data.load_atomic_reconstruction(image)

    for shot in ar.shots.values():
        warp_image(ar, image, shot.id)
    pylab.show()

EderSantana commented 7 years ago

This is awesome! Thanks, Pau!

I'll see if I can adapt the script to work without mapillary_sfm. In any case, this whole thread was really useful!

EderSantana commented 7 years ago

Here is the updated code for future reference: https://gist.github.com/EderSantana/5a5c3a6ad327f303cb915198aec8a90d

Results: [two screenshots of the warped output, 2016-09-27]

edgarriba commented 7 years ago

@EderSantana nice snippet!