Closed: WANG-KX closed this issue 4 years ago
Hi Kaixuan, I looked through your code, but did not find anything wrong in the transformation. I will do more experiments to make sure, but could you try the other datasets (GTAV_720 and GTAV_1080) and see if they have the same problem? Thanks.
Dear, Thanks for your reply. I will check the other two datasets and report the results in this issue.
Dear, I only tested on the GTAV_720 dataset, because GTAV_1080 is too big and I did not download it. I found that depth and pose are consistent in the GTAV_720 dataset. Maybe there is a bug in how you scale the dataset to lower resolutions.
Hi Kaixuan, thanks a lot for verifying that GTAV_720 is ok. I will work on updating GTAV_540, but I don't have an ETA for that yet. For now, could you resize the images from GTAV_720 if you would like to work with lower resolutions?
I am glad that I can help:)
It seems that the problem for GTAV_540 is f_x: the images don't respect the initial aspect ratio. The resolution of the images is 810x540 instead of 960x540. To get coherent correspondences you need to scale f_x for GTAV_540 accordingly: f_x * 2 * 810 / 1920.
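The correction described above can be sketched in Python as follows. The helper name and the example focal length are hypothetical; only the factor 2 * 810 / 1920 comes from the comment.

```python
def correct_fx_gtav540(f_x_stored):
    """Rescale the f_x stored in a GTAV_540 pose file (hypothetical helper).

    The factor 2 * 810 / 1920 is the one given in the comment above:
    the images are 810 pixels wide rather than 960.
    """
    return f_x_stored * 2 * 810 / 1920


# Illustrative value only, not taken from the dataset.
f_x_example = 1000.0
print(correct_fx_gtav540(f_x_example))
```

This is the same scaling as the commented-out line `# f_x = f_x * 2 * 810 / 1920` in the script posted later in this thread.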
@mihaidusmanu You are totally correct. Not sure why I did such simple math wrong....
@WANG-KX Thanks for sharing the code to warp the source image to the target image using the depth map of the target view. As I understand it, you find the new position of each pixel in the target image by rounding it to an integer. I think this is sub-optimal; my suggestion is to use bilinear sampling. Also, the two nested for loops are slow. Is there a way to achieve the same result with pure matrix multiplication?
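As a rough illustration of the loop-free approach asked about here, the sketch below reprojects every pixel with matrix products in NumPy. It assumes a pinhole camera with intrinsics shared by both views; the function name and argument layout are hypothetical and are not taken from check_consist.py.

```python
import numpy as np


def reproject_pixels(depth, K, R, t):
    """Map every pixel of view 1 into view 2 without Python loops.

    depth: (H, W) depth of view 1.
    K:     (3, 3) intrinsics, assumed shared by both views.
    R, t:  (3, 3) rotation and (3,) translation taking view-1 camera
           coordinates to view-2 camera coordinates.
    Returns (2, H, W) sub-pixel (x, y) positions in view 2.
    """
    H, W = depth.shape
    xs, ys = np.meshgrid(np.arange(W, dtype=np.float64),
                         np.arange(H, dtype=np.float64))
    # Homogeneous pixel coordinates scaled by depth, shape (3, H*W).
    pix = np.stack([xs * depth, ys * depth, depth]).reshape(3, -1)
    cam1 = np.linalg.inv(K) @ pix        # back-project to 3D points
    cam2 = R @ cam1 + t.reshape(3, 1)    # move into the second camera frame
    proj = K @ cam2                      # project into the second image
    uv2 = proj[:2] / proj[2:3]           # perspective division
    return uv2.reshape(2, H, W)
```

The result is in pixel units and is not rounded, so it can feed a bilinear sampler once it is converted to whatever coordinate convention that sampler expects.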
Hi, this is just proof-of-concept code that does not aim for any optimization. And yes, it can be optimized using torch.nn.functional.grid_sample.
Hi, Thanks for your reply! I have read about the grid_sample function, and it requires the grid values to be in the range -1 to 1. Your code loops over range(W) and range(H). How can I convert those to the range (-1, 1)?
@WANG-KX Here is the code I wrote to optimize your code using torch.nn.functional.grid_sample, but it seems I am making some mistakes :(
```python
import numpy as np
from numpy.linalg import inv
from PIL import Image
import imageio
import json, torch
from torchvision.transforms import *
from torchvision.utils import *
import torch.nn.functional as F

path = "./data/MVSSynth/0000/"
image1_idx = 0
image2_idx = 2


def read_img_depth_pose(i):
    img_path = path + "%04d.png" % i
    depth_path = path + "%04d.exr" % i
    pose_path = path + "%04d.json" % i
    img = np.array(Image.open(img_path))
    raw_depth = np.asarray(imageio.imread(depth_path)[:])
    raw_depth = np.clip(raw_depth, 0.1, 1000.0)
    with open(pose_path) as f:
        r_info = json.load(f)
    c_x = r_info["c_x"]
    c_y = r_info["c_y"]
    f_x = r_info["f_x"]
    f_y = r_info["f_y"]
    # f_x = f_x * 2 * 810 / 1920
    extrinsic = np.array(r_info["extrinsic"])
    extrinsic = inv(extrinsic)
    K = np.array([[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]])
    return img, raw_depth, K, extrinsic


img1, depth1, K, T1 = read_img_depth_pose(image1_idx)
img2, depth2, K, T2 = read_img_depth_pose(image2_idx)
left2right = np.dot(inv(T2), T1)
left2right_r = left2right[:3, :3]
left2right_t = left2right[:3, 3]
left2right_t = left2right_t.reshape(3, 1)
H, W, _ = img1.shape
recon = np.zeros(img1.shape)
recon_d = np.ones(depth1.shape) * 1000.0
xs, ys = np.meshgrid(np.linspace(-1, 1, W), np.linspace(-1, 1, H))
xs = xs.reshape(1, H, W)
ys = ys.reshape(1, H, W)
depth_gt = depth1.reshape(1, H, W)
uv_d1 = np.vstack((xs * depth_gt, ys * depth_gt, depth_gt))
uv_d1 = uv_d1.reshape(3, -1)
uv_c1 = np.dot(inv(K), uv_d1)
uv_c2 = np.dot(left2right_r, uv_c1) + left2right_t
uv_d2 = np.dot(K, uv_c2)
uv2 = uv_d2[0:2] / uv_d2[2]
uv2 = uv2.reshape(2, H, W).astype(np.float32)
# Create img tensor
img1_tensor = ToTensor()(img1).unsqueeze(0)
img2_tensor = ToTensor()(img2).unsqueeze(0)
sampler = torch.from_numpy(uv2).permute(1, 2, 0).unsqueeze(0)
# Sample pixels from the reference image to the query image
img_warp = F.grid_sample(img2_tensor, sampler, align_corners=True)
# Save Image
save_image(img1_tensor[0], './1.png')
save_image(img2_tensor[0], './2.png')
save_image(img_warp[0], './warped.png')
print('Done')
```
The x-coordinate should be normalized as "x = x / (W-1) * 2.0 - 1.0" so that it lies between -1 and 1. The y-coordinate should be normalized as "y = y / (H-1) * 2.0 - 1.0" so that it lies between -1 and 1.
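A minimal sketch of that normalization, assuming the projected coordinates are stored as a (2, H, W) array of pixel positions and that the sampler is called with align_corners=True, as in the script above. The function name is hypothetical.

```python
import numpy as np


def normalize_grid(uv, W, H):
    """Convert (2, H, W) pixel coordinates to an (H, W, 2) grid in [-1, 1].

    Uses x / (W - 1) * 2 - 1 and y / (H - 1) * 2 - 1, so pixel 0 maps to -1
    and the last pixel maps to +1 (the align_corners=True convention).
    """
    x = uv[0] / (W - 1) * 2.0 - 1.0
    y = uv[1] / (H - 1) * 2.0 - 1.0
    # grid_sample expects the last dimension to hold (x, y).
    return np.stack([x, y], axis=-1)
```

The returned array still needs a leading batch dimension, for example `torch.from_numpy(grid.astype(np.float32)).unsqueeze(0)`, before it is passed to `F.grid_sample`.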
@WANG-KX I think I have already done that with the np.meshgrid function.
```python
xs, ys = np.meshgrid(np.linspace(-1, 1, W), np.linspace(-1, 1, H))
xs = xs.reshape(1, H, W)
ys = ys.reshape(1, H, W)
depth_gt = depth1.reshape(1, H, W)
uv_d1 = np.vstack((xs * depth_gt, ys * depth_gt, depth_gt))
uv_d1 = uv_d1.reshape(3, -1)
uv_c1 = np.dot(inv(K), uv_d1)
```
I am just trying to replace the nested for loop with pure matrix multiplication, but I am not sure it is correct because the returned uv2 is not in the range (-1, 1).
OK, then you need to scale fx, fy and shift cx, cy, since the coordinates are normalized.
fx = fx / W * 2.0
fy = fy / H * 2.0
cx = cx / W * 2.0 - 1.0
cy = cy / H * 2.0 - 1.0
Not sure the above is correct. You will have to work it out yourself.
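One way to check this kind of rescaling is to require that projecting with the normalized intrinsics gives the same result as projecting with the pixel intrinsics and then normalizing the pixel. The sketch below does that under an assumption that differs slightly from the reply: it divides by W - 1 and H - 1 so that it matches np.linspace(-1, 1, W) and align_corners=True, whereas the reply divides by W and H. The function name is hypothetical.

```python
import numpy as np


def normalize_intrinsics(K, W, H):
    """Return intrinsics that project straight into [-1, 1] coordinates.

    Assumes the convention x_n = x / (W - 1) * 2 - 1 (align_corners=True).
    This gives fx' = 2 fx / (W - 1) and cx' = 2 cx / (W - 1) - 1, and
    likewise for fy and cy with H.
    """
    S = np.array([[2.0 / (W - 1), 0.0, -1.0],
                  [0.0, 2.0 / (H - 1), -1.0],
                  [0.0, 0.0, 1.0]])
    return S @ K


# Consistency check on one 3D point (all values are illustrative).
K = np.array([[500.0, 0.0, 404.5], [0.0, 500.0, 269.5], [0.0, 0.0, 1.0]])
W, H = 810, 540
point = np.array([0.3, -0.2, 4.0])

pixel = K @ point
pixel = pixel[:2] / pixel[2]
expected = np.array([pixel[0] / (W - 1) * 2.0 - 1.0,
                     pixel[1] / (H - 1) * 2.0 - 1.0])

normalized = normalize_intrinsics(K, W, H) @ point
normalized = normalized[:2] / normalized[2]
assert np.allclose(normalized, expected)
```

With intrinsics normalized this way, inv(K) and K can be applied directly to the linspace(-1, 1, ...) grid, and the projected coordinates come out in the range that grid_sample expects.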
@WANG-KX Thanks a lot ! It works :dancers:
Dear,
Thanks for your work and for open-sourcing the MVS-Synth dataset. However, I found that the consistency of the dataset is not good, i.e. pixels cannot be projected to the right position using the provided "ground truth depth and poses". I wrote a short Python script to demonstrate the pixel mismatch.
A simple example:
From left to right: image1, image2, rendered image2, and the overlay of image2 and rendered image2. Clearly (best viewed at full resolution), the person on the street is not projected to the right position. The lane marker and wall are also inconsistent, as highlighted by the red circle. This means that the depth and poses are not consistent.
You can run more samples yourself with check_consist.py.tar.gz. Just change line 7 according to your environment. Lines 8 and 9 select the left and right images.
Is there any misunderstanding in my code? Hope I can hear from you.
Regards, Kaixuan