phelps-matthew opened this issue 2 years ago
Hi, thanks for the question.
If you want to use your own camera poses, you will have to convert them to our NSVF-based format, which is fairly simple anyway (see below). Besides proc_colmap.sh there is also proc_record3d.py, which converts captures from the iPhone app Record3D to our format; that might be a helpful example.
Currently svox2 itself only supports the pinhole model fx/fy/cx/cy.
The run_colmap.py script (called by proc_colmap.sh) actually estimates radial distortion parameters by default with COLMAP, but then undistorts the images. For simplicity, you can also use OpenCV to undistort your own images.
`intrinsics.txt`: 4x4 matrix,

```
fx 0 0 cx
0 fy 0 cy
0 0 1 0
0 0 0 1
```

`images/` or `rgb/`: images (`*.png` or `*.jpg`)
`pose/`: 4x4 c2w pose matrix for each image (`*.txt`), OpenCV convention
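Writing the intrinsics file in the layout quoted above can be sketched like this (the calibration numbers are placeholders, and the layout follows this comment, with cx/cy in the last column):

```python
import numpy as np

fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0  # placeholder calibration

# 4x4 intrinsics in the layout described above
intrinsics = np.array([[ fx, 0.0, 0.0,  cx],
                       [0.0,  fy, 0.0,  cy],
                       [0.0, 0.0, 1.0, 0.0],
                       [0.0, 0.0, 0.0, 1.0]])
np.savetxt("intrinsics.txt", intrinsics)
```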
Thank you kindly! I may try undistorting all my images, though the distortion coefficients are very small here, so I'm going to ignore them for the moment.
I was able to get the NSVF dataset loader working after formatting my images and poses to the following convention (I had to add a conversion from grayscale to RGB):
```
<dataset_name>
|-- bbox.txt          # bounding-box file
|-- intrinsics.txt    # 4x4 camera intrinsics
|-- images
    |-- 0_000001.png  # target image for each view
    ...
    |-- 1_000001.png
    ...
|-- pose
    |-- 0_000001.txt  # camera pose for each view (4x4 matrices)
    ...
    |-- 1_000001.txt
    ...
```
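Since the loader pairs images and poses by file name, a quick layout check can catch mismatches early. A minimal sketch (the helper `check_nsvf_layout` and its assumptions, e.g. `.png` images only, are mine, not part of the repo):

```python
from pathlib import Path

def check_nsvf_layout(root):
    """Verify the NSVF-style tree above: intrinsics.txt present,
    and a pose/*.txt for every images/*.png (matched by stem)."""
    root = Path(root)
    assert (root / "intrinsics.txt").is_file(), "missing intrinsics.txt"
    images = sorted(p.stem for p in (root / "images").glob("*.png"))
    poses = sorted(p.stem for p in (root / "pose").glob("*.txt"))
    assert images == poses, "images/ and pose/ file names do not match"
    return len(images)
```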
I'll continue training and testing; granted, there are quite a number of hyperparameters to adjust here, but I'm hoping to start seeing the rough formation of my imaged object.
Do you know what convention the rotation matrices should follow for NSVF? I'm having a difficult time determining whether my axes are aligned with its standard. For example, here is my distribution of camera poses:
In case someone else finds this helpful: I believe COLMAP stores the projection as a world-to-camera transformation (it maps 3D world coordinates to camera coordinates). Hence, to go from the above image, formed from the W -> C transformation, to this image, try the following:
```python
import numpy as np

# Given so3, the 3x3 W -> C SO(3) matrix, and r, the translation vector,
# form the correct 4x4 cam-to-world transformation matrix:
#   X_w = R^T X_c - R^T t  (cam to world, what was needed)
#   X_c = R X_w + t        (world to cam, what I had before)
Rt = np.matmul(so3.transpose(), r)
trans = np.vstack((np.hstack((so3.transpose(), -Rt.reshape(-1, 1))), [0, 0, 0, 1]))
```
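A quick sanity check on that conversion (self-contained sketch; the rotation and translation below are arbitrary example values): composing the resulting cam-to-world matrix with the original world-to-cam matrix should give the identity.

```python
import numpy as np

# Example W -> C pose: 90-degree rotation about z, plus a translation
theta = np.pi / 2
so3 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
r = np.array([1.0, 2.0, 3.0])

# 4x4 world-to-cam matrix, as COLMAP provides it
w2c = np.vstack((np.hstack((so3, r.reshape(-1, 1))), [0, 0, 0, 1]))

# cam-to-world, using the same construction as the snippet above
Rt = np.matmul(so3.transpose(), r)
c2w = np.vstack((np.hstack((so3.transpose(), -Rt.reshape(-1, 1))), [0, 0, 0, 1]))

# c2w must invert w2c exactly
round_trip = c2w @ w2c
```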
You can then view the result using python view_data.py <data_root>. All one needs is images, poses, and intrinsics following the above format (no bbox.txt or other files are strictly needed).
> `images/` or `rgb/`: images (`*.png` or `*.jpg`)
> `pose/`: 4x4 c2w pose matrix for each image (`*.txt`), OpenCV convention
Apologies, I totally missed this remark! Would have saved myself a headache 😂
How can I use views with different intrinsics (images captured by multiple cameras)?
I am a little bit confused about the intrinsics matrix; shouldn't it be like this?

```
fx 0 cx 0
0 fy cy 0
0 0 1 0
0 0 0 1
```
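For what it's worth, here is a sketch of why the principal point sits in the third column of a standard pinhole projection matrix (whether the NSVF loader reads the values by position or applies the full matrix is a separate question; the numbers below are arbitrary):

```python
import numpy as np

fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0  # arbitrary example values

# Standard pinhole projection: cx, cy in the third column,
# so they are scaled by Z before the perspective divide
K = np.array([[ fx, 0.0,  cx, 0.0],
              [0.0,  fy,  cy, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

# A 3D point in camera coordinates (homogeneous)
X = np.array([0.2, -0.1, 2.0, 1.0])

u, v, w = (K @ X)[:3]
u, v = u / w, v / w  # perspective divide

# This reproduces the familiar pinhole equations:
#   u = fx * X/Z + cx,  v = fy * Y/Z + cy
```

With cx/cy in the fourth column instead, the principal point would not be scaled by Z, and the divide would yield u = fx*X/Z + cx/Z, which is not the pinhole model.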
I have a large dataset comprising renders of a single object taken over a fairly dense sampling of poses (rotations and translations). I also have the camera intrinsics and distortion coefficients (though it looks like these are usually not incorporated in most radiance field work?).

I was hoping you might be able to lend some guidance on how I can use this supplemental information to form a dataset that is compatible with svox2. Specifically, do you have any tips on how I might leverage colmap and colmap2nsvf.py? When running proc_colmap.sh on a directory of raw images, I see it produces its own pose.txt estimates, database.db, and points.npy, and appears to sample only a subset of the given images. Are there any modifications I should be making that are immediately evident to you? Any help is greatly appreciated!