oscarmcnulty / gta-3d-dataset

A dataset of 2D imagery, 3D point cloud data, and 3D vehicle bounding box labels all generated using the Grand Theft Auto 5 game engine.

3D projective geometry warping using ground truth depth and pose is not working #6

Open NagabhushanSN95 opened 3 years ago

NagabhushanSN95 commented 3 years ago

Hi,

Thanks for sharing your dataset. I'm trying to use it for a view-synthesis task: given two frames, I reconstruct one from the other using 3D projective geometry based warping from here.

For depth, I'm using your code to get the disparity map, then computing the depth map as 1/disparity and clipping the result to the range [0, 1000]:

depth1 = numpy.clip(1 / disparity1, a_min=0, a_max=1000)  # disparity1: disparity map from your code
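That is, my reading of the conversion looks like this (my own helper, not from the dataset; the 1000 cap just bounds 1/disparity where disparity approaches zero):

```python
import numpy

def disparity_to_depth(disparity, max_depth=1000.0):
    """Invert a disparity map to depth, capping values where disparity -> 0."""
    with numpy.errstate(divide='ignore'):
        return numpy.clip(1.0 / disparity, a_min=0, a_max=max_depth)
```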

For camera poses, I'm using the viewMatrix you've provided in the json files.

I'm constructing the camera intrinsic matrix using the FOV you've mentioned (50°):

import math
import numpy

def camera_intrinsic_transform(vfov=50, hfov=50, capture_width=1680, capture_height=1050, pixel_width=1680,
                               pixel_height=1050):
    # Focal lengths in pixels from the FOV; principal point at the image centre
    camera_intrinsics = numpy.eye(3)
    camera_intrinsics[0, 0] = (capture_width / 2.0) / math.tan(math.radians(hfov / 2.0))
    camera_intrinsics[0, 2] = pixel_width / 2.0
    camera_intrinsics[1, 1] = (capture_height / 2.0) / math.tan(math.radians(vfov / 2.0))
    camera_intrinsics[1, 2] = pixel_height / 2.0
    return camera_intrinsics
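As a sanity check on my numbers: using the same 50° for both axes on a non-square 1680x1050 image gives different focal lengths in x and y, so I'd also like to know whether the game's 50° is the vertical FOV only (with hfov to be derived from the aspect ratio) — this is just my assumption:

```python
import math

# Focal lengths implied by a 50-degree FOV on each axis of a 1680x1050 image
fx = (1680 / 2.0) / math.tan(math.radians(50 / 2.0))
fy = (1050 / 2.0) / math.tan(math.radians(50 / 2.0))
```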

However, the reconstructed image is not matching with the rendered image.
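For reference, the warp I'm attempting can be sketched like this (my own code, not from the dataset; it assumes the view matrices are world-to-camera transforms and that depth is measured along the camera z-axis):

```python
import numpy

def warp_pixel_coordinates(depth1, intrinsic, view1, view2):
    """Map every pixel of frame1 to its projected location in frame2's image plane."""
    h, w = depth1.shape
    # Pixel grid in homogeneous coordinates, one column per pixel: (u, v, 1)
    u, v = numpy.meshgrid(numpy.arange(w), numpy.arange(h))
    pixels = numpy.stack([u, v, numpy.ones_like(u)]).reshape(3, -1).astype(float)
    # Back-project through the inverse intrinsics and scale rays by depth
    cam1 = numpy.linalg.inv(intrinsic) @ pixels * depth1.reshape(1, -1)
    cam1_h = numpy.vstack([cam1, numpy.ones((1, cam1.shape[1]))])
    # camera 1 -> world -> camera 2 (assumes world-to-camera view matrices)
    cam2_h = view2 @ numpy.linalg.inv(view1) @ cam1_h
    # Project into frame2's image plane and dehomogenise
    projected = intrinsic @ cam2_h[:3]
    return projected[:2] / projected[2:3]
```

With identical poses for both views, every pixel should map back to itself, which is how I check the plumbing.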

NagabhushanSN95 commented 3 years ago

For example, I'm considering frames 1 and 2 of 67b90283-627b-45cf-9ff2-63dcb95bfc67

The corresponding camera poses are below

transformation1 = numpy.array(
    [0.3948856089590146, 0.9186789480807126, 0.009707391790442067, 133.64471308856065,
     -0.06591960053103078, 0.017792904372381235, 0.9976662115822422, -135.3130767061915,
     0.9163622882946337, -0.3946039834633436, 0.06758511385934628, 917.6464175831973,
     0.0, 0.0, 0.0, 1.0]
).reshape(4, 4)
transformation2 = numpy.array(
    [0.3538838093846013, 0.9352637448678055, 0.0069155367658538455, 92.4380718493328,
     -0.004427438570940968, -0.005718767304706697, 0.999973886901392, -74.00623990858556,
     0.9352788591735129, -0.35390518568112983, 0.0021170437260363707, 949.783528073028,
     0.0, 0.0, 0.0, 1.0]
).reshape(4, 4)
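From these I compute the relative transform like this (my own convention, assuming each viewMatrix maps world coordinates to camera coordinates — which is exactly the assumption I'd like confirmed):

```python
import numpy

def relative_transform(view1, view2):
    """Relative pose mapping camera-1 coordinates into camera-2 coordinates,
    assuming each view matrix is a world-to-camera transform."""
    return view2 @ numpy.linalg.inv(view1)
```

i.e. `frame1_to_frame2 = relative_transform(transformation1, transformation2)`.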

I've attached frame1, frame2, and frame2_warped (frame1 warped to the view of frame2). Ideally, frame2_warped should match frame2 (except for object motion).

frame1

frame2

frame2_warped

Am I missing something?

oscarmcnulty commented 3 years ago

It looks like the polarity of the transform you are doing is somehow reversed? Shouldn't frame2_warped be more zoomed in than the original frame1?

You could try using the visualization script at https://github.com/oscarmcnulty/gta-3d-dataset/blob/master/test_vis.py to debug?

NagabhushanSN95 commented 3 years ago

Thanks for the lightning-quick reply. That was my thought too, so I tried flipping the sign of the z-translation after computing the relative transformation. Although the camera did zoom into the scene, the amount of zoom was much smaller than in frame2, so I suspect something else is wrong.
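Concretely, the experiment was just this (a diagnostic hack with my own naming, not a proposed fix):

```python
import numpy

def flip_z_translation(transform):
    """Return a copy of a 4x4 transform with the sign of its z-translation
    negated (the sign-flip experiment described above)."""
    flipped = numpy.array(transform, dtype=float, copy=True)
    flipped[2, 3] *= -1
    return flipped
```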

Can you please confirm if my computation of depth, camera intrinsic, and camera poses is correct? I'll also try to debug using the test_vis.py code.