zju3dv / manhattan_sdf

Code for "Neural 3D Scene Reconstruction with the Manhattan-world Assumption" CVPR 2022 Oral
https://zju3dv.github.io/manhattan_sdf/

Problem on data preparation #5

Closed: rainfall1998 closed this issue 2 years ago

rainfall1998 commented 2 years ago

Hi, thanks for your wonderful work! I would like to train on newly recorded sequences, so I wonder when the 'Data preparation' section will be published? Looking forward to your early reply.

ghy0324 commented 2 years ago

Hi! Thanks for your interest!

Since the procedure of data preparation is a little complicated, we still need some time to release the final document. We provide a draft here; hope it is helpful for you. Your feedback is welcome and will help us improve the document!

Before preparing your own data, you should check out the dataset module carefully.

Overall, you need to place your data with the following structure and create a corresponding config file.

manhattan_sdf
├───data
|   ├───$scene_name
|   |   ├───intrinsic.txt
|   |   ├───images
|   |   |   ├───0.png
|   |   |   ├───1.png
|   |   |   └───...
|   |   ├───pose
|   |   |   ├───0.txt
|   |   |   ├───1.txt
|   |   |   └───...
|   |   ├───depth_colmap
|   |   |   ├───0.npy
|   |   |   ├───1.npy
|   |   |   └───...
|   |   └───semantic_deeplab
|   |       ├───0.png
|   |       ├───1.png
|   |       └───...
|   └───...
├───configs
|   ├───$scene_name.yaml
|   └───...
└───...

Images

You should place RGB images in the data/$scene_name/images folder. The filenames can be arbitrary, but you need to make sure that they are consistent with the files in the other folders under data/$scene_name; a quick consistency check is sketched below.
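As a minimal sketch (the scene path is illustrative), you could verify the filename consistency like this:

```python
import os

scene = "data/scene_name"  # illustrative path

# Every image stem must have a matching pose, depth, and semantic file.
stems = sorted(os.path.splitext(f)[0] for f in os.listdir(os.path.join(scene, "images")))
for stem in stems:
    for folder, ext in (("pose", ".txt"), ("depth_colmap", ".npy"), ("semantic_deeplab", ".png")):
        path = os.path.join(scene, folder, stem + ext)
        assert os.path.isfile(path), f"missing {path}"
print(f"{len(stems)} frames verified")
```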

Intrinsic parameters

Save the 4x4 intrinsic matrix in data/$scene_name/intrinsic.txt.
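For example, assuming the loader reads the file with something like np.loadtxt (the fx/fy/cx/cy values below are placeholders, not calibration from any real camera):

```python
import numpy as np

# Placeholder calibration values -- replace with your own.
fx, fy, cx, cy = 577.87, 577.87, 319.5, 239.5

K = np.eye(4)
K[0, 0], K[1, 1] = fx, fy
K[0, 2], K[1, 2] = cx, cy

# Plain whitespace-separated text, readable back with np.loadtxt.
np.savetxt("data/scene_name/intrinsic.txt", K)
```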

Camera poses

You can solve camera poses with COLMAP or other tools of your choice. Then you need to normalize the camera poses and modify some configs accordingly.

Save the normalized poses as 4x4 matrices in txt format under the data/$scene_name/pose folder. Remember to save the scale and offset you used for normalization so that you can transform back to the original coordinates if you want to extract a mesh and compare it with the ground-truth mesh; a minimal sketch follows.
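The sketch below centers the camera positions at the origin and scales them into a unit-radius region; that normalization target is an assumption on our part, so check the dataset module and your config for the exact region the model expects. All paths are illustrative:

```python
import glob
import os

import numpy as np

raw_dir = "poses_raw"            # hypothetical: un-normalized camera-to-world poses
out_dir = "data/scene_name/pose"
os.makedirs(out_dir, exist_ok=True)

poses = {os.path.basename(p): np.loadtxt(p) for p in glob.glob(os.path.join(raw_dir, "*.txt"))}

# Center the camera positions at the origin and scale them into a
# unit-radius region (assumed target -- verify against the config).
centers = np.stack([T[:3, 3] for T in poses.values()])
offset = centers.mean(axis=0)
scale = np.linalg.norm(centers - offset, axis=1).max()

for name, T in poses.items():
    T_norm = T.copy()
    T_norm[:3, 3] = (T[:3, 3] - offset) / scale
    np.savetxt(os.path.join(out_dir, name), T_norm)

# Save offset and scale so meshes can be mapped back to the original frame.
np.savetxt("data/scene_name/offset_scale.txt", np.append(offset, scale))
```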

COLMAP depth maps

You need to run the sparse and dense reconstruction stages of COLMAP first. Please refer to this instruction if you want to use known camera poses.

After dense reconstruction, you obtain a depth prediction for each view. However, the depth predictions can be noisy, so we recommend running fusion first to filter out most of the noise. Since the original COLMAP does not produce a fusion mask for each view, you need to compile and run this customized version, which is used in NerfingMVS.
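A rough conversion sketch is below. It uses read_array from COLMAP's scripts/python/read_write_dense.py; the fusion-mask path and format are assumptions, since they depend on the customized build's output, and the depth rescaling mirrors the pose normalization above:

```python
import cv2
import numpy as np

# read_array ships with COLMAP in scripts/python/read_write_dense.py.
from read_write_dense import read_array

name = "0"  # frame stem; loop over all views in practice

# Geometric depth estimated by COLMAP dense stereo.
depth = read_array(f"dense/stereo/depth_maps/{name}.png.geometric.bin")

# Hypothetical location/format of the per-view fusion mask written by the
# customized COLMAP build -- check that fork's actual output.
mask = cv2.imread(f"dense/fusion_masks/{name}.png", cv2.IMREAD_GRAYSCALE) > 0

depth[~mask] = 0.0  # zero out pixels rejected by fusion

# Depths live in the un-normalized coordinates, so divide by the same
# scale used to normalize the poses.
scale = np.loadtxt("data/scene_name/offset_scale.txt")[-1]
np.save(f"data/scene_name/depth_colmap/{name}.npy", depth / scale)
```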

Semantic predictions

You need to run 2D semantic segmentation on the images to generate semantic predictions. We will upload our trained model and inference code soon.
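Until then, here is a rough sketch of the expected output format: a single-channel PNG of integer class IDs per pixel. The predict function is a placeholder for your own network, and the ID convention for floor/wall/other is an assumption you should verify against the dataset module:

```python
import cv2
import numpy as np

# Hypothetical class ids -- verify against the dataset module.
OTHER, FLOOR, WALL = 0, 1, 2

def predict(image):
    """Placeholder for your segmentation network's forward pass."""
    # Should return an HxW integer label map; all zeros (= OTHER) here.
    return np.zeros(image.shape[:2], dtype=np.uint8)

image = cv2.imread("data/scene_name/images/0.png")
labels = predict(image)

# A single-channel PNG stores the ids losslessly.
cv2.imwrite("data/scene_name/semantic_deeplab/0.png", labels)
```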

rainfall1998 commented 2 years ago

Thanks for your reply. This is the first time I have used the ScanNet dataset, and I can find little information about calibration on the ScanNet GitHub. Could you share some experience with image and depth calibration for ScanNet? Do I need to undistort the RGB and depth images, or just extract them from the .sens files? And is the scaling interpolation method you use nearest neighbor or bilinear? Thank you very much!

ghy0324 commented 2 years ago


We just extract them from the .sens files using the official code. We use bilinear interpolation to rescale the RGB images.
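For reference, a minimal sketch of such rescaling with OpenCV; the target resolution is an arbitrary example, and using nearest neighbor for depth is our suggestion rather than something stated above:

```python
import cv2

rgb = cv2.imread("color/0.jpg")  # as extracted from the .sens file
rgb_small = cv2.resize(rgb, (640, 480), interpolation=cv2.INTER_LINEAR)  # bilinear

# Nearest neighbor for depth so values are not blended across object
# boundaries (our suggestion, not stated in the thread).
depth = cv2.imread("depth/0.png", cv2.IMREAD_UNCHANGED)  # 16-bit depth
depth_small = cv2.resize(depth, (640, 480), interpolation=cv2.INTER_NEAREST)

# If you rescale, remember to scale fx, cx by the width ratio and
# fy, cy by the height ratio in the intrinsics.
```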

rainfall1998 commented 2 years ago

Thank you very much!