Closed — rainfall1998 closed this issue 2 years ago
Hi! Thanks for your interest!
Since the data preparation procedure is a little complicated, we still need some time to release the final document. We provide a draft here and hope it is helpful for you. Your feedback is welcome and will help us improve the document!
Before preparing your own data, you should check out the dataset module carefully.
Overall, you need to place your data in the following structure and create a corresponding config file.
manhattan_sdf
├───data
| ├───$scene_name
| | ├───intrinsic.txt
| | ├───images
| | | ├───0.png
| | | ├───1.png
| | | └───...
| | ├───pose
| | | ├───0.txt
| | | ├───1.txt
| | | └───...
| | ├───depth_colmap
| | | ├───0.npy
| | | ├───1.npy
| | | └───...
| | └───semantic_deeplab
| | ├───0.png
| | ├───1.png
| | └───...
| └───...
├───configs
| ├───$scene_name.yaml
| └───...
└───...
You should place RGB images in the data/$scene_name/images folder. Note that the filenames can be arbitrary, but you need to make sure they are consistent with the files in the other folders under data/$scene_name.
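As a sanity check, something like the following can verify that the per-view filenames line up across folders. This is a sketch assuming the folder names and extensions shown in the tree above; `check_consistency` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

def check_consistency(scene_dir):
    """Return the folders whose file stems do not match data/$scene_name/images.

    Folder and extension names follow the directory tree above; an empty
    return value means the per-view files are consistent.
    """
    scene = Path(scene_dir)
    folders = {
        "images": ".png",
        "pose": ".txt",
        "depth_colmap": ".npy",
        "semantic_deeplab": ".png",
    }
    stems = {}
    for folder, ext in folders.items():
        stems[folder] = sorted(p.stem for p in (scene / folder).glob(f"*{ext}"))
    reference = stems["images"]
    return [f for f, s in stems.items() if s != reference]
```

Running it on a scene folder before training can catch a missing pose or depth file early.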
Save the 4x4 intrinsic matrix in data/$scene_name/intrinsic.txt.
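For example, with numpy the matrix can be written and read back in plain-text format. The fx/fy/cx/cy values below are placeholders for illustration, not an actual calibration:

```python
import numpy as np

# Placeholder intrinsics -- substitute your camera's calibration values.
K = np.array([
    [577.87,   0.0, 319.5, 0.0],
    [  0.0, 577.87, 239.5, 0.0],
    [  0.0,   0.0,    1.0, 0.0],
    [  0.0,   0.0,    0.0, 1.0],
])
np.savetxt("intrinsic.txt", K)          # whitespace-separated text
K_loaded = np.loadtxt("intrinsic.txt")  # round-trips as a (4, 4) array
```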
You can solve camera poses with COLMAP or other tools you like. Then you need to normalize the camera poses and modify some configs accordingly:
Save the normalized poses as 4x4 matrices in txt format under the data/$scene_name/pose folder. Remember to save the scale and offset you used for normalization here, so that you can transform back to the original coordinate system if you want to extract a mesh and compare it with the ground-truth mesh.
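One common way to do this normalization (a sketch, not necessarily the exact convention this repository uses) is to center the camera positions and scale them into a unit sphere, keeping the offset and scale for the inverse transform:

```python
import numpy as np

def normalize_poses(poses, radius=1.0):
    """Translate and scale camera-to-world poses so all camera centers
    fit inside a sphere of the given radius.

    poses: (N, 4, 4) camera-to-world matrices.
    Returns (normalized_poses, offset, scale); a point is mapped back to
    the original coordinate system as point * scale + offset.
    """
    centers = poses[:, :3, 3]
    offset = centers.mean(axis=0)
    scale = np.linalg.norm(centers - offset, axis=1).max() / radius
    normalized = poses.copy()
    normalized[:, :3, 3] = (centers - offset) / scale
    return normalized, offset, scale
```

Saving `offset` and `scale` alongside the poses is what lets you undo the normalization later when comparing an extracted mesh against ground truth.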
You need to run the sparse and dense reconstruction of COLMAP first. Please refer to this instruction if you want to use known camera poses.
After dense reconstruction, you can obtain a depth prediction for each view. However, the depth predictions can be noisy, so we recommend running fusion first to filter out most of the noise. Since the original COLMAP does not produce a fusion mask for each view, you need to compile and run this customized version, which is used in NerfingMVS.
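COLMAP's dense reconstruction stores per-view depth maps as `.bin` files. A sketch of converting one to the `.npy` files expected above, following the binary layout parsed by COLMAP's official `read_write_dense.py` (the file paths in the comments are illustrative):

```python
import numpy as np

def read_colmap_depth(path):
    """Read a COLMAP dense depth/normal map (.bin).

    The file starts with a text header "width&height&channels&",
    followed by float32 data in column-major order -- the same layout
    COLMAP's read_write_dense.py parses.
    """
    with open(path, "rb") as f:
        header = b""
        while header.count(b"&") < 3:   # read "W&H&C&" one byte at a time
            header += f.read(1)
        width, height, channels = map(int, header.split(b"&")[:3])
        data = np.fromfile(f, np.float32)
    array = data.reshape((width, height, channels), order="F")
    return np.transpose(array, (1, 0, 2)).squeeze()

# Illustrative usage (paths depend on your COLMAP workspace):
# depth = read_colmap_depth("dense/stereo/depth_maps/0.png.geometric.bin")
# np.save("data/$scene_name/depth_colmap/0.npy", depth)
```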
You need to run 2D semantic segmentation to generate semantic predictions from the images. We will upload our trained model and inference code soon.
Thanks for your reply. This is the first time I have used the ScanNet dataset, and I could find little information about calibration on the ScanNet GitHub. Could you share some experience with image and depth calibration for the ScanNet dataset? Do I need to undistort the RGB and depth images, or just extract them from .sens? And is the scaling interpolation method you use nearest-neighbor or bilinear? Thank you very much!
We just extract them from the .sens file using the official code. We use bilinear interpolation to rescale the RGB images.
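The distinction matters because bilinear interpolation blends neighboring pixels: fine for RGB, but it produces invalid values at depth discontinuities and mixes class IDs in label maps, which is why depth and semantics are usually resized with nearest-neighbor instead. A minimal nearest-neighbor resize in plain numpy (a sketch; for RGB you would typically use a library bilinear resize, e.g. OpenCV's `cv2.resize` with `cv2.INTER_LINEAR`):

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbor resize in plain numpy -- appropriate for depth
    and label maps, where blended (interpolated) values are invalid."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    return img[rows[:, None], cols]
```

Every output pixel is copied from exactly one input pixel, so a resized label map can only contain class IDs that already existed in the original.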
Thank you very much!
Hi, thanks for your wonderful work! I would like to train on newly recorded sequences, so I wonder when the data preparation document will be published? Looking forward to your early reply.