sxyu / svox2

Plenoxels: Radiance Fields without Neural Networks
BSD 2-Clause "Simplified" License

Custom dataset with poses and intrinsics included -> NSVF Format #10

Open phelps-matthew opened 2 years ago

phelps-matthew commented 2 years ago

I have a large dataset comprising renders of a single object taken over a fairly dense sampling of poses (rotations and translations). I also have the camera intrinsics and distortion coefficients (though it looks like these are usually not incorporated in most radiance field work?).

I was hoping you might be able to lend some guidance on how I can use this supplemental information to form a dataset that is compatible with svox2. Specifically, do you have any tips on how I might leverage colmap and colmap2nsvf.py? When running proc_colmap.sh on a directory of raw images, I see it produces its own pose.txt estimates, database.db, and points.npy and appears to sample only a subset of the given images. Are there any modifications I should be making that are immediately evident to you?

Any help is greatly appreciated!

sxyu commented 2 years ago

Hi, thanks for the question.

If you want to use your own camera poses, you will have to process them into our NSVF-based format, which is fairly simple (see below). Other than proc_colmap there is also proc_record3d.py, which processes captures from the iPhone app Record3D into our format; this might be a helpful example.

Currently svox2 itself only supports the pinhole model fx/fy/cx/cy. The run_colmap.py script (called by proc_colmap.sh) actually estimates radial distortion parameters by default with COLMAP but will undistort the images. For simplicity, you can also use OpenCV to undistort your own images.

Format reference:

intrinsics.txt: 4x4 matrix,

fx 0 0 cx 
0 fy 0 cy
0 0 1 0
0 0 0 1

images/ or rgb/: images (*.png or *.jpg)
pose/: 4x4 c2w pose matrix for each image (*.txt), OpenCV convention
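For anyone assembling this layout by hand, a small helper for the per-image pose files might look like this (`write_pose` is just an illustrative name; the one-text-file-per-image layout follows the reference above):

```python
import numpy as np
from pathlib import Path

def write_pose(path, c2w):
    """Write a 4x4 camera-to-world pose as whitespace-separated text,
    one file per image (e.g. pose/0_000001.txt matching images/0_000001.png)."""
    c2w = np.asarray(c2w, dtype=float)
    assert c2w.shape == (4, 4), "expected a 4x4 matrix"
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    np.savetxt(path, c2w)
```

Usage would be `write_pose("pose/0_000001.txt", c2w)` for each view.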

phelps-matthew commented 2 years ago

Thank you kindly! I may try undistorting all my images, though the distortion coefficients are very small here, so I'm going to ignore them for the moment.

I was able to get the NSVF dataset loader working after formatting my images and poses to the following convention (I had to add a conversion from grayscale to RGB):

<dataset_name>
|-- bbox.txt         # bounding-box file
|-- intrinsics.txt   # 4x4 camera intrinsics
|-- images
    |-- 0_000001.png        # target image for each view
    ...
    |-- 1_000001.png
    ...
|-- pose
    |-- 0_000001.txt        # camera pose for each view (4x4 matrices)
    ...
    |-- 1_000001.txt
    ...
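The grayscale-to-RGB conversion mentioned above can be done by replicating the single channel; a minimal NumPy sketch (PIL or OpenCV would work just as well):

```python
import numpy as np

def gray_to_rgb(gray):
    """Replicate a single-channel (H, W) image into a 3-channel (H, W, 3) array."""
    gray = np.asarray(gray)
    if gray.ndim == 2:
        return np.stack([gray] * 3, axis=-1)
    return gray  # already has a channel dimension
```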

I'll continue training and testing; granted, there are quite a number of hyperparameters to adjust here, but I'm hoping I can start to see the rough formation of my imaged object.

Do you know what convention the rotation matrices should follow for NSVF? I'm having a difficult time determining whether my axes are aligned with its standard. For example, here is my distribution of camera poses:

[image: camera pose distribution]

phelps-matthew commented 2 years ago

In case someone else finds this helpful: I believe COLMAP stores poses as projection matrices transforming 3D world coordinates to camera coordinates. Hence, to go from the pose distribution shown above (formed from the W -> C transformation) to the corrected one, [image: corrected pose distribution]

try the following:

# Given so3, a 3x3 W -> C rotation (SO(3)) matrix, and r, the translation vector,
# form the correct 4x4 camera-to-world transformation matrix.
# X_c = R X_w + t        (world to cam, what I had before)
# X_w = R^T X_c - R^T t  (cam to world, what was needed)
Rt = np.matmul(so3.transpose(), r)  # R^T t
trans = np.vstack((np.hstack((so3.transpose(), -Rt.reshape(-1, 1))), [0, 0, 0, 1]))

You can then view using python view_data.py <data_root>. All one needs is images, poses, and intrinsics that follow the above format (no bbox.txt or other files strictly needed).
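A quick sanity check of that inversion (variable names here are illustrative, not from the repo): applying the constructed camera-to-world matrix to a camera-space point should recover the original world point.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random proper rotation via QR (forcing det = +1) and a random translation (W -> C).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))
t = rng.normal(size=3)

X_w = rng.normal(size=3)
X_c = R @ X_w + t  # world -> camera

# Build the 4x4 camera-to-world matrix as in the snippet above.
c2w = np.vstack((np.hstack((R.T, (-R.T @ t).reshape(-1, 1))), [0, 0, 0, 1]))
X_w_back = (c2w @ np.append(X_c, 1.0))[:3]
assert np.allclose(X_w, X_w_back)
```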

phelps-matthew commented 2 years ago

> images/ or rgb/: images (*.png or *.jpg); pose/: 4x4 c2w pose matrix for each image (*.txt), OpenCV convention

Apologies, I totally missed this remark! Would have saved myself a headache 😂

qhdqhd commented 2 years ago

How can I use views with different intrinsics (images captured by multiple cameras)?

LinGeLin commented 2 years ago

what does mean?

povolann commented 1 year ago

I am a little bit confused about the intrinsics matrix; shouldn't it be like this?

fx 0 cx 0
0 fy cy 0
0 0 1 0
0 0 0 1
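For what it's worth, the standard pinhole projection does place cx and cy in the third column, as in this comment; whether svox2's loader expects that layout or parses fx/fy/cx/cy from fixed positions is best confirmed in the dataset code. A quick NumPy check of the standard convention (focal lengths and principal point here are arbitrary):

```python
import numpy as np

fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
K = np.array([[fx, 0, cx, 0],
              [0, fy, cy, 0],
              [0,  0,  1, 0],
              [0,  0,  0, 1]])

X_c = np.array([0.2, -0.1, 2.0, 1.0])  # homogeneous camera-space point
u, v, w, _ = K @ X_c
u, v = u / w, v / w  # divide by depth to get pixel coordinates
# u = fx * X/Z + cx, v = fy * Y/Z + cy
assert np.isclose(u, fx * 0.2 / 2.0 + cx)
assert np.isclose(v, fy * -0.1 / 2.0 + cy)
```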