phuang17 / DeepMVS

DeepMVS: Learning Multi-View Stereopsis
https://phuang17.github.io/DeepMVS/index.html
BSD 2-Clause "Simplified" License

Depth&Pose Consistency in MVS-Synth Dataset #13

Closed WANG-KX closed 4 years ago

WANG-KX commented 5 years ago

Hi,

Thanks for your work and for open-sourcing the MVS-Synth dataset. However, I found that the consistency in the dataset is not good, i.e. pixels cannot be projected to the correct positions using the provided ground-truth depths and poses. I wrote a simple Python script to demonstrate the pixel mismatch.

A simple example (see the attached image):

From left to right: image 1, image 2, rendered image 2, and image 2 overlapped with rendered image 2. Clearly (best viewed at full resolution), the person on the street is not projected to the correct position. The lane marker and the wall, highlighted by the red circle, are also inconsistent. This means the depths and poses are not consistent.

You can run more samples yourself with the attached script: check_consist.py.tar.gz. Just change line 7 according to your environment; lines 8 and 9 select the left and right images.
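
In essence, the script does the following (a rough sketch only, with a made-up helper name; the attached file is the actual code): for every pixel of image 1, back-project with the ground-truth depth, transform with the relative pose, and splat the color into view 2 with a z-buffer.

```python
import numpy as np
from numpy.linalg import inv

def render_view2(img1, depth1, K, T1, T2):
    # T1, T2: the (already inverted) extrinsics as loaded by the script.
    H, W = depth1.shape
    T_1to2 = inv(T2) @ T1                    # relative pose: view 1 -> view 2
    R, t = T_1to2[:3, :3], T_1to2[:3, 3]
    rendered2 = np.zeros_like(img1)
    zbuf = np.full((H, W), np.inf)           # keep only the nearest surface
    for v in range(H):
        for u in range(W):
            # Back-project pixel (u, v) of view 1 using its ground-truth depth.
            p1 = depth1[v, u] * (inv(K) @ np.array([u, v, 1.0]))
            p2 = K @ (R @ p1 + t)            # transform and project into view 2
            if p2[2] <= 0:
                continue                     # skip points behind the camera
            u2 = int(round(p2[0] / p2[2]))
            v2 = int(round(p2[1] / p2[2]))
            if 0 <= u2 < W and 0 <= v2 < H and p2[2] < zbuf[v2, u2]:
                zbuf[v2, u2] = p2[2]
                rendered2[v2, u2] = img1[v, u]
    return rendered2                         # overlay on image 2 to compare
```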

Is there any misunderstanding in my code? I hope to hear from you.

Regards, Kaixuan

phuang17 commented 5 years ago

Hi Kaixuan, I looked through your code but did not find anything wrong with the transformation. I will run more experiments to make sure, but could you try the other datasets (GTAV_720 and GTAV_1080) and see if they have the same problem? Thanks.

WANG-KX commented 5 years ago

Hi, thanks for your reply. I will check the other two datasets and report the results in this issue.

WANG-KX commented 5 years ago

Hi, I only tested the GTAV_720 dataset, because GTAV_1080 is too big and I did not download it. I found that the depths and poses are consistent in GTAV_720. Maybe there is a bug in how the dataset was scaled down to lower resolutions.

phuang17 commented 5 years ago

Hi Kaixuan, thanks a lot for verifying that GTAV_720 is ok. I will work on updating GTAV_540, but I don't have an ETA for that yet. For now, could you resize the images from GTAV_720 if you would like to work with lower resolutions?

WANG-KX commented 5 years ago

I am glad that I can help :)

mihaidusmanu commented 5 years ago

It seems that the problem with GTAV_540 is f_x: the images do not respect the original aspect ratio. The image resolution is 810x540 instead of 960x540. To get coherent correspondences, you need to scale f_x for GTAV_540 accordingly: f_x * 2 * 810 / 1920.
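
In code, the fix when loading a GTAV_540 camera would be something along these lines (a sketch; the variable name follows the JSON field used elsewhere in this thread):

```python
def fix_gtav540_fx(f_x):
    # The stored f_x corresponds to a 960-pixel-wide image (1920 / 2),
    # but the actual GTAV_540 images are only 810 pixels wide.
    return f_x * 2 * 810 / 1920   # equivalently: f_x * 810 / 960
```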

phuang17 commented 5 years ago

@mihaidusmanu You are totally correct. Not sure why I did such simple math wrong....

phongnhhn92 commented 4 years ago

@WANG-KX Thanks for sharing the code to warp the source image to the target image using the depth map of the target view. As I understand it, you find the new pixel position for each pixel in the target image and round it to an integer. I think this is sub-optimal; my suggestion is to use bilinear sampling instead. Another thing is that you use two nested for loops, which is time-consuming. Is there any way to achieve the same result with pure matrix multiplication?

WANG-KX commented 4 years ago

Hi, this is just proof-of-concept code that does not aim for any optimization. And yes, it can be optimized using torch.nn.functional.grid_sample.
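
For reference, grid_sample expects a sampling grid with x and y normalized to [-1, 1]; a minimal toy example (an identity warp, just to show the call convention):

```python
import torch
import torch.nn.functional as F

N, C, H, W = 1, 3, 240, 320
img = torch.rand(N, C, H, W)

# Identity grid in normalized coordinates: both axes span [-1, 1].
xs = torch.linspace(-1, 1, W).view(1, -1).expand(H, W)
ys = torch.linspace(-1, 1, H).view(-1, 1).expand(H, W)
grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)   # (N, H, W, 2), (x, y) order

out = F.grid_sample(img, grid, align_corners=True)  # bilinear by default
assert torch.allclose(out, img, atol=1e-5)          # identity warp recovers img
```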

phongnhhn92 commented 4 years ago

> Hi, this is just proof-of-concept code that does not aim for any optimization. And yes, it can be optimized using torch.nn.functional.grid_sample.

Hi, thanks for your reply! I have read about the grid_sample function, and it requires the grid values to be in the range (-1, 1). In your code, you loop over range(W) and range(H). How can I convert those to the range (-1, 1)?

phongnhhn92 commented 4 years ago

@WANG-KX Here is the code I wrote to optimize your code using torch.nn.functional.grid_sample, but it seems like I am making some mistakes :(

```python
import json

import imageio
import numpy as np
import torch
import torch.nn.functional as F
from numpy.linalg import inv
from PIL import Image
from torchvision.transforms import ToTensor
from torchvision.utils import save_image

path = "./data/MVSSynth/0000/"
image1_idx = 0
image2_idx = 2

def read_img_depth_pose(i):
    # Load the RGB image, EXR depth map, and camera JSON for frame i.
    img_path = path + "%04d.png" % i
    depth_path = path + "%04d.exr" % i
    pose_path = path + "%04d.json" % i

    img = np.array(Image.open(img_path))
    raw_depth = np.asarray(imageio.imread(depth_path))
    raw_depth = np.clip(raw_depth, 0.1, 1000.0)
    with open(pose_path) as f:
        r_info = json.load(f)
        c_x = r_info["c_x"]
        c_y = r_info["c_y"]
        f_x = r_info["f_x"]
        f_y = r_info["f_y"]
        # f_x = f_x * 2 * 810 / 1920  # GTAV_540 aspect-ratio fix (see above)
        extrinsic = np.array(r_info["extrinsic"])
        extrinsic = inv(extrinsic)

    K = np.array([[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]])
    return img, raw_depth, K, extrinsic

img1, depth1, K, T1 = read_img_depth_pose(image1_idx)
img2, depth2, K, T2 = read_img_depth_pose(image2_idx)

# Relative transform from view 1 to view 2.
left2right = np.dot(inv(T2), T1)
left2right_r = left2right[:3, :3]
left2right_t = left2right[:3, 3].reshape(3, 1)

H, W, _ = img1.shape

# Build a normalized pixel grid in (-1, 1) for both axes.
xs, ys = np.meshgrid(np.linspace(-1, 1, W), np.linspace(-1, 1, H))
xs = xs.reshape(1, H, W)
ys = ys.reshape(1, H, W)
depth_gt = depth1.reshape(1, H, W)

# Back-project to camera 1, transform to camera 2, and re-project.
uv_d1 = np.vstack((xs * depth_gt, ys * depth_gt, depth_gt)).reshape(3, -1)
uv_c1 = np.dot(inv(K), uv_d1)
uv_c2 = np.dot(left2right_r, uv_c1) + left2right_t
uv_d2 = np.dot(K, uv_c2)
uv2 = uv_d2[0:2] / uv_d2[2]
uv2 = uv2.reshape(2, H, W).astype(np.float32)

# Create image tensors and the sampling grid, shaped (N, H, W, 2).
img1_tensor = ToTensor()(img1).unsqueeze(0)
img2_tensor = ToTensor()(img2).unsqueeze(0)
sampler = torch.from_numpy(uv2).permute(1, 2, 0).unsqueeze(0)

# Sample pixels from the reference image into the query image.
img_warp = F.grid_sample(img2_tensor, sampler, align_corners=True)

# Save images.
save_image(img1_tensor[0], './1.png')
save_image(img2_tensor[0], './2.png')
save_image(img_warp[0], './warped.png')

print('Done')
```

WANG-KX commented 4 years ago

The x-coordinate should be normalized as x = x / (W - 1) * 2.0 - 1.0 so that it is between -1 and 1, and the y-coordinate as y = y / (H - 1) * 2.0 - 1.0 so that it is between -1 and 1.
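
In NumPy this looks like (a sketch, with an example image size):

```python
import numpy as np

H, W = 540, 810  # example image size
u = np.arange(W, dtype=np.float32)   # pixel x-coordinates 0 .. W-1
v = np.arange(H, dtype=np.float32)   # pixel y-coordinates 0 .. H-1
x = u / (W - 1) * 2.0 - 1.0          # normalized x in [-1, 1]
y = v / (H - 1) * 2.0 - 1.0          # normalized y in [-1, 1]
```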

phongnhhn92 commented 4 years ago

@WANG-KX I think I have already done that with the np.meshgrid function.

```python
xs, ys = np.meshgrid(np.linspace(-1, 1, W), np.linspace(-1, 1, H))
xs = xs.reshape(1, H, W)
ys = ys.reshape(1, H, W)
depth_gt = depth1.reshape(1, H, W)
uv_d1 = np.vstack((xs * depth_gt, ys * depth_gt, depth_gt)).reshape(3, -1)
uv_c1 = np.dot(inv(K), uv_d1)
```

I am just trying to replace the nested for loops with pure matrix multiplication, but I am not sure it is correct, because the returned uv2 is not in the range (-1, 1).

WANG-KX commented 4 years ago

OK, then you need to scale f_x, f_y and shift c_x, c_y, since the coordinates are normalized: f_x = f_x / W * 2.0, f_y = f_y / H * 2.0, c_x = c_x / W * 2.0 - 1.0, c_y = c_y / H * 2.0 - 1.0. I am not sure the above is correct; you have to verify it yourself.
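
That is, something like the following (an unverified sketch, per the caveat above; use it in place of K for both the back-projection and the projection so that uv2 comes out directly in grid_sample's range):

```python
import numpy as np

def normalize_intrinsics(f_x, f_y, c_x, c_y, W, H):
    # Unverified sketch: rescale pixel-space intrinsics so projections land
    # in normalized [-1, 1] coordinates instead of 0..W-1 / 0..H-1 pixels.
    return np.array([[f_x / W * 2.0, 0.0,           c_x / W * 2.0 - 1.0],
                     [0.0,           f_y / H * 2.0, c_y / H * 2.0 - 1.0],
                     [0.0,           0.0,           1.0]])
```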

phongnhhn92 commented 4 years ago

@WANG-KX Thanks a lot! It works :dancers: