zju3dv / NeuralRecon

Code for "NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video", CVPR 2021 oral
https://zju3dv.github.io/neuralrecon/
Apache License 2.0

adding texturing to example demo by using mvs-texturing #96

Open · ht719 opened this issue 2 years ago

ht719 commented 2 years ago

How should I add texturing to the example demo with mvs-texturing? I can't find the files mvs-texturing needs, except the .ply generated by NeuralRecon in the example demo folder.

marcomiglionico94 commented 2 years ago

Hey man, I am trying to texture my mesh too. You need to create a .cam file for each frame you have. I am able to run mvs-texturing, but the textures are not applied correctly for me. I am using ARKit data, what about you?

ShiqiMa commented 2 years ago

Yeah, I have the same confusion. I don't know what to do next after getting the .ply model. Could someone who knows the process for adding texture help us?

marcomiglionico94 commented 2 years ago

So I know how to create the .cam files. Basically, you need to create a file for each frame. On the first line you put `tx ty tz R00 R01 R02 R10 R11 R12 R20 R21 R22`, which are the extrinsics: the translation vector and the rotation matrix (the transform from world to camera). On the second line you put `f d0 d1 paspect ppx ppy`, which are the intrinsics: focal length, distortion coefficients, pixel aspect ratio, and principal point. Are you using data from ARKit?
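For illustration, a hypothetical `frame_0001.cam` with an identity rotation and made-up translation and intrinsics would contain just two plain-text lines:

```
0.1 -0.25 1.5 1 0 0 0 1 0 0 0 1
0.903 0 0 1.0 0.5 0.5
```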

PolarisYxh commented 2 years ago

You can use the code in the blog post below to create .cam files in the `images` directory of data captured with ARKit: https://blog.csdn.net/yxh505613923/article/details/124848887

Cli98 commented 2 years ago

> You can use the code in the blog post below to create .cam files in the `images` directory of data captured with ARKit: https://blog.csdn.net/yxh505613923/article/details/124848887

Hi @PolarisYxh,

I wonder if the same process applies to the ScanNet dataset. Do we need to modify the extrinsics or intrinsics accordingly? Could you briefly describe how to do this?

Thank you,

gyes00205 commented 2 years ago

> You can use the code in the blog post below to create .cam files in the `images` directory of data captured with ARKit: https://blog.csdn.net/yxh505613923/article/details/124848887

Hi @PolarisYxh, I followed the blog to generate the .cam files for the demo dataset, but I get a strange result, as shown in the figure below. Did you encounter this problem? Thank you.

gyes00205 commented 2 years ago

Hi all,

I found where the problem is. We need to convert the original transformation matrix (camera to world) into a world-to-camera matrix. Also, we need to add 1.5 meters to the z-axis translation of the original matrix before the conversion.
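In numpy terms, the fix is just this (a minimal sketch; the pose file name is hypothetical, and the full script appears later in this thread):

```python
import numpy as np

# Load a 4x4 camera-to-world pose (hypothetical file name, demo-style layout).
P = np.loadtxt('poses/000000.txt')
P[2, 3] += 1.5            # add 1.5 m to the z translation first
P_inv = np.linalg.inv(P)  # world-to-camera, the matrix mvs-texturing wants
```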

berkcetinsaya commented 2 years ago

> Hi all,
>
> I found where the problem is. We need to convert the original transformation matrix (camera to world) into a world-to-camera matrix. Also, we need to add 1.5 meters to the z-axis translation of the original matrix before the conversion.
>
> **Result**
> ![correct_textured](https://user-images.githubusercontent.com/34483104/178479037-c61ea14f-7cb8-4b48-b43e-0b06340bd867.png)

Hi @gyes00205, thank you for sharing your findings. I had the same issue before. I have read your comments but could not quite understand where the changes should be made. Could you please be more specific about the changes you made and in which files?

gyes00205 commented 2 years ago

Hi @berkcetinsaya, I added a new file `tools/generate_cam.py` to generate .cam files. The following is my implementation.

```python
import cv2
import os
import numpy as np
import argparse
from tqdm import tqdm
from tools.kp_reproject import *

def args_parse():
    parser = argparse.ArgumentParser()
    parser.add_argument('--datapath', help='datapath')
    args = parser.parse_args()
    return args

def set_intrinsics(K, width, height):
    # Normalize the intrinsics by the image dimensions, as mvs-texturing expects.
    fx = K[0, 0]
    fy = K[1, 1]
    paspect = fy / fx
    dim_aspect = width / height
    img_aspect = dim_aspect * paspect
    if img_aspect < 1.0:
        flen = fy / height
    else:
        flen = fx / width
    ppx = K[0, 2] / width
    ppy = K[1, 2] / height
    # f d0 d1 paspect ppx ppy (zero distortion assumed)
    return [flen, 0, 0, paspect, ppx, ppy]

if __name__ == '__main__':
    args = args_parse()
    intrinsic_path = os.path.join(args.datapath, 'intrinsics')
    pose_path = os.path.join(args.datapath, 'poses')

    intrinsic_list = sorted(os.listdir(intrinsic_path))
    for fname in tqdm(intrinsic_list, desc='Processing camera file...'):
        frame_id = fname[:-4]
        K = np.loadtxt(os.path.join(intrinsic_path, fname), delimiter=' ')
        P = np.loadtxt(os.path.join(pose_path, fname), delimiter=' ')

        # Move the X-Y plane down in the ARKit coordinate system to match the
        # training settings on ScanNet (the ScanNet coordinate system is z-up).
        P[2, 3] += 1.5
        # P is the camera-to-world transformation; mvs-texturing needs
        # world-to-camera, so invert it.
        P_inv = np.linalg.inv(P)
        img = cv2.imread(os.path.join(args.datapath, 'images', f'{frame_id}.jpg'))
        cam_path = os.path.join(args.datapath, 'images', f'{frame_id}.cam')
        intrinsics = set_intrinsics(K, img.shape[1], img.shape[0])
        with open(cam_path, 'w') as f:
            rot, trans = '', ''
            for r in range(3):
                for c in range(3):
                    rot += str(P_inv[r][c]) + ' '
                trans += str(P_inv[r][3]) + ' '
            # First line: tx ty tz R00 ... R22; second line: f d0 d1 paspect ppx ppy
            f.write(trans + rot[:-1] + '\n')
            f.write(f'{intrinsics[0]} 0 0 {intrinsics[3]} {intrinsics[4]} {intrinsics[5]}\n')
```

**Execution**

`python tools/generate_cam.py --datapath neucon_demodata_b5f1/neucon_demodata_b5f1/`

And then, we can see the corresponding .cam files in the `neucon_demodata_b5f1/neucon_demodata_b5f1/images` folder.

JyotiLuxolis commented 2 years ago

Hi @gyes00205, do you know why there are white regions in the middle of the textured model near the tables? Do you know what might be some potential solution?

Also, thank you so much for figuring out what was wrong with the code. Can you also briefly describe how you debugged that the problem was with the +1.5m z offset and matrix inversion?

gyes00205 commented 2 years ago

Hi @JyotiLuxolis, I guess the pose matrix may not be perfect, so the texture of the white table may be projected onto nearby regions. I don't have a solution yet.

Cli98 commented 2 years ago

> Hi @JyotiLuxolis, I guess the pose matrix may not be perfect, so the texture of the white table may be projected onto nearby regions. I don't have a solution yet.

I'm not sure why adding 1.5 is required. From my experiments, values between 1 and 2 all seem to work.

berkcetinsaya commented 2 years ago

Hey @JyotiLuxolis,

I created my own model by recording the environment with the ios_logger app on my iPhone, and there is no sign of a white area in the middle. I agree with @gyes00205 about the pose matrix issue.

In addition, @gyes00205, thank you for sharing the code. I use Ubuntu 18.04 and the Python version required by NeuralRecon. However,

```python
from tqdm import tqdm
from tools.kp_reproject import *
```

gave errors. So if you get the same "module not found" errors, try these fixes: 1) install tqdm (`sudo apt install python3-tqdm`, or whatever works for you), and 2) add the following code to the beginning of your generate_cam.py file.

```python
import sys
sys.path.insert(0, "YOUR_NEURALRECON_PATH")
```
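Another option (an assumption about the repo layout, untested here): run the script as a module from the NeuralRecon root, e.g. `python -m tools.generate_cam --datapath <datapath>`, which puts the repository root on `sys.path` so `tools.kp_reproject` resolves without editing the file.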

Eventually, everything works great.

gyes00205 commented 2 years ago

> Hi @gyes00205, do you know why there are white regions in the middle of the textured model near the tables? Do you know what might be some potential solution?
>
> Also, thank you so much for figuring out what was wrong with the code. Can you also briefly describe how you debugged that the problem was with the +1.5m z offset and matrix inversion?

Hi @JyotiLuxolis, the reason for the +1.5 m z offset is that the distance between the camera and the ground is 1.5 m; see issue #76 for more details. As for the matrix inversion: as the comments in mvs-texturing note, it needs the world-to-camera matrix, but the original pose matrix is camera-to-world, so we need to invert the pose matrix. See the following formulation:

Let $P_w$ be a point in world coordinates, $P_c$ the same point in camera coordinates, and $P$ the pose matrix (camera to world). The pose transforms camera coordinates to world coordinates: $P_w = P P_c$. To transform world coordinates back to camera coordinates, just invert $P$: $P^{-1} P_w = P_c$. So $P^{-1}$ is the world-to-camera matrix.
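As a quick numerical sanity check (with a made-up pose, not from the demo data):

```python
import numpy as np

# A made-up camera-to-world pose: 90-degree rotation about z plus a translation.
P = np.eye(4)
P[:3, :3] = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
P[:3, 3] = [2.0, 0.0, 1.5]

p_c = np.array([1.0, 0.0, 0.0, 1.0])  # a point in camera coordinates (homogeneous)
p_w = P @ p_c                          # -> [2.0, 1.0, 1.5, 1.0] in world coordinates
assert np.allclose(np.linalg.inv(P) @ p_w, p_c)  # P^-1 maps it back to camera coords
```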

JyotiLuxolis commented 2 years ago

@gyes00205 Thanks a lot for writing this detailed explanation; it helps a lot. Also, while generating the textures with mvs_texturing, I see a lot of colour breaks or "lines", as shown in the picture below:

*(screenshot: colour breaks visible on the textured mesh)*

I think it's because images with different exposures are being attached to the mesh. Do you think that's the case? Can you think of some potential solutions, or do you have any resources I can refer to?

Thanks in advance

huliang2016 commented 2 years ago

> Hi @berkcetinsaya, I added a new file `tools/generate_cam.py` to generate .cam files. […] *(see the full code and execution command in gyes00205's comment above)*

@gyes00205 hi, thanks for your code.

I was curious why we need to do this calculation for the camera intrinsic matrix.

And if I have a scaled image, what should the width/height be? E.g., the ScanNet image size is 1296×968; to keep the aspect ratio we pad 2 px on the top and bottom, then scale the image from 1296×972 to 640×480. In this scenario, how should I calculate the camera intrinsic matrix?

Thanks in advance.

macTracyHuang commented 1 year ago

> I was curious why we need to do this calculation for the camera intrinsic matrix.

I think that's because mvs-texturing takes a normalized intrinsic, which was mentioned here.
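To make that concrete, here is a hedged sketch: the intrinsics are made-up ScanNet-like numbers, `normalize_intrinsics` mirrors the `set_intrinsics` logic above, and the padding/scaling adjustment is the standard pinhole-camera bookkeeping rather than anything confirmed in this thread.

```python
import numpy as np

def normalize_intrinsics(K, width, height):
    # Mirrors set_intrinsics above: mvs-texturing wants the focal length
    # and principal point expressed relative to the image dimensions.
    fx, fy = K[0, 0], K[1, 1]
    paspect = fy / fx
    flen = fy / height if (width / height) * paspect < 1.0 else fx / width
    return [flen, 0, 0, paspect, K[0, 2] / width, K[1, 2] / height]

# Made-up ScanNet-like intrinsics for a 1296x968 color image.
K = np.array([[1170.2, 0.0, 647.75],
              [0.0, 1170.2, 483.75],
              [0.0, 0.0, 1.0]])

# Pad 2 px on top and bottom (968 -> 972): the principal point moves down 2 px.
K_pad = K.copy()
K_pad[1, 2] += 2.0

# Scale 1296x972 -> 640x480: scale the first row by 640/1296, the second by 480/972.
K_scaled = K_pad.copy()
K_scaled[0, :] *= 640 / 1296
K_scaled[1, :] *= 480 / 972

print(normalize_intrinsics(K, 1296, 968))        # original image
print(normalize_intrinsics(K_scaled, 640, 480))  # padded + scaled image
```

Both prints come out essentially identical (flen ≈ 0.903, paspect = 1.0, ppx ≈ 0.5, ppy ≈ 0.5), which is presumably the point of the normalization: once the intrinsics are dimension-relative, the padded-and-scaled 640×480 image and the original 1296×968 image describe the same camera.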