Open sghong977 opened 1 year ago
Steps of image-based 3D reconstruction
Automatic Reconstruction tool
Structure-from-Motion (SfM)
Multi-View Stereo (MVS)
Terminology
camera: is associated with the physical object of a camera using the same zoom-factor and lens. A camera defines the intrinsic projection model in COLMAP. A single camera can take multiple images with the same resolution, intrinsic parameters, and distortion characteristics.
image: is associated with a bitmap file, e.g., a JPEG or PNG file on disk.
COLMAP detects keypoints in each image whose appearance is described by numerical descriptors.
Pure appearance-based correspondences between keypoints/descriptors are defined by matches, while inlier matches are geometrically verified and used for the reconstruction procedure.
Step 1. Feature extraction (using SIFT).
Step 2. Feature matching (exhaustive matching, sequential ... ).
Step 3. Sparse Reconstruction.
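The output is a sparse representation of the scene and the camera poses.
The sparse folder contains cameras.txt, images.txt, and points3D.txt.
cameras.txt: intrinsics, one line of data per camera: CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]
images.txt: two lines per image: IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME, then POINTS2D[] as (X, Y, POINT3D_ID)
points3D.txt: 3D point list with one line of data per point: POINT3D_ID, X, Y, Z, R, G, B, ERROR, TRACK[] as (IMAGE_ID, POINT2D_IDX)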
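To make the text layout concrete, here is a minimal parsing sketch for cameras.txt and images.txt. It assumes the column order documented on the COLMAP format page (CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[] for cameras; a pose line followed by a line of X, Y, POINT3D_ID triples for images). The class and function names (`Camera`, `Image`, `parse_cameras`, `parse_images`) are made up for illustration and are not part of COLMAP.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Camera:
    camera_id: int
    model: str
    width: int
    height: int
    params: List[float]  # meaning depends on the model, e.g. f, cx, cy, k


@dataclass
class Image:
    image_id: int
    qvec: Tuple[float, ...]  # QW, QX, QY, QZ
    tvec: Tuple[float, ...]  # TX, TY, TZ
    camera_id: int
    name: str
    points2d: List[Tuple[float, float, int]]  # X, Y, POINT3D_ID (-1: no 3D point)


def parse_cameras(text: str) -> Dict[int, Camera]:
    """Parse the contents of cameras.txt: one data line per camera."""
    cameras = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        cameras[int(f[0])] = Camera(
            int(f[0]), f[1], int(f[2]), int(f[3]), [float(v) for v in f[4:]]
        )
    return cameras


def parse_images(text: str) -> Dict[int, Image]:
    """Parse the contents of images.txt: two data lines per image."""
    # Keep blank lines: the 2D-point line of an image may be empty.
    lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    images = {}
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1
            continue
        head = lines[i].split(maxsplit=9)
        pts = lines[i + 1].split() if i + 1 < len(lines) else []
        points2d = [
            (float(x), float(y), int(pid))
            for x, y, pid in zip(pts[0::3], pts[1::3], pts[2::3])
        ]
        images[int(head[0])] = Image(
            int(head[0]),
            tuple(float(v) for v in head[1:5]),
            tuple(float(v) for v in head[5:8]),
            int(head[8]),
            head[9],
            points2d,
        )
        i += 2
    return images
```

Usage would be something like `parse_cameras(open("sparse/cameras.txt").read())`, where the path is only an example.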
Step 4. Dense Reconstruction.
The output formats of the sparse and dense reconstructions are explained at https://colmap.github.io/format.html.
Database
In the preprocessing step of NeuMan, there are 6 commands that call COLMAP. My goal is just to understand these commands:
colmap feature_extractor --database_path ./recon/db.db --image_path ./raw_720p --ImageReader.mask_path ./raw_masks --SiftExtraction.estimate_affine_shape=true --SiftExtraction.domain_size_pooling=true --ImageReader.camera_model SIMPLE_RADIAL --ImageReader.single_camera 1
colmap exhaustive_matcher --database_path ./recon/db.db --SiftMatching.guided_matching=true
mkdir -p ./recon/sparse
colmap mapper --database_path ./recon/db.db --image_path ./raw_720p --output_path ./recon/sparse
if [ -d "./recon/sparse/1" ]; then echo "Bad reconstruction"; exit 1; else echo "Ok"; fi
mkdir -p ./recon/dense
colmap image_undistorter --image_path raw_720p --input_path ./recon/sparse/0/ --output_path ./recon/dense
colmap patch_match_stereo --workspace_path ./recon/dense
colmap model_converter --input_path ./recon/dense/sparse/ --output_path ./recon/dense/sparse --output_type=TXT
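The "Bad reconstruction" check in the commands above relies on the mapper writing each reconstructed model into its own numbered sub-folder (0, 1, ...), so the presence of ./recon/sparse/1 means the images did not all register into a single model. A small Python version of that check might look like the sketch below. The helper names are hypothetical, and unlike the shell one-liner it also reports failure when no model folder exists at all.

```python
from pathlib import Path


def count_models(sparse_dir: str) -> int:
    """Count numbered sub-folders (0, 1, ...) inside the mapper output folder."""
    root = Path(sparse_dir)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.iterdir() if p.is_dir() and p.name.isdigit())


def reconstruction_ok(sparse_dir: str) -> bool:
    """True only if exactly one model was reconstructed.

    Assumption: stricter than the shell check, which only looks for sparse/1.
    """
    return count_models(sparse_dir) == 1
```

For the paths used above this would be called as `reconstruction_ok("./recon/sparse")` right after the mapper step.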
I'm gonna replace COLMAP's feature extraction & matching with a more modern matching algorithm, LoFTR.
project page: https://zju3dv.github.io/loftr/
What does LoFTR do? It produces dense matches at a coarse level, then refines them at a fine level, instead of the traditional steps of feature detection -> extraction -> matching.
How does the model work?
LoFTR is a transformer-based model, and the output is feature descriptors.
It uses self- and cross-attention layers, so the descriptors are conditioned on both images.
There are 2 feature maps for each image (coarse and fine).
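Calculate a confidence matrix over feature pairs (how strongly each location in one image's feature map corresponds to each location in the other's; this is computed at 1/8 of the image size).
The fine level works similarly. It is described as a correlation-based match: for each query point i there is a heatmap showing which location corresponds most strongly, and the point seems to be matched where the softmax value is high (in the paper, the final position is the expectation over that softmax distribution, which gives sub-pixel accuracy).
Is it supervised learning? How can the model generate fine descriptors? -> Yes. There are losses at both the coarse and fine levels; the coarse loss seems to use information such as depth maps, and the fine loss seems to use data with annotated keypoints. Datasets such as MegaDepth also provide COLMAP sparse reconstructions.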
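As a rough illustration of the coarse confidence matrix, the sketch below applies a dual-softmax (softmax over rows times softmax over columns) to a small score matrix and keeps mutual nearest neighbours above a threshold, which is how the LoFTR paper describes coarse match selection. It is a toy version in plain Python, not LoFTR's code: the input `scores` stands in for the (temperature-scaled) feature similarities, and the function names and the 0.2 threshold are assumptions chosen for the example.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax(scores: List[List[float]]) -> List[List[float]]:
    """conf[i][j] = softmax over row i at j, times softmax over column j at i."""
    rows = [softmax(r) for r in scores]
    cols = [softmax(list(c)) for c in zip(*scores)]
    return [
        [rows[i][j] * cols[j][i] for j in range(len(scores[0]))]
        for i in range(len(scores))
    ]


def mutual_matches(
    conf: List[List[float]], threshold: float = 0.2
) -> List[Tuple[int, int, float]]:
    """Keep (i, j) pairs that are each other's best match and exceed the threshold."""
    matches = []
    for i, row in enumerate(conf):
        j = max(range(len(row)), key=row.__getitem__)
        best_i = max(range(len(conf)), key=lambda r: conf[r][j])
        if best_i == i and row[j] > threshold:
            matches.append((i, j, row[j]))
    return matches
```

Here row index i is a location in the first image's 1/8-resolution feature map and column index j is a location in the second image's, so each returned triple is one coarse match with its confidence.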
Then how can I use this model? Does it provide a pretrained model?
Let's make a NeuMan docker container and execute the following commands:
pip install kornia
pip install kornia_moons
pip install opencv-python --upgrade
wget https://github.com/kornia/data/raw/main/matching/kn_church-2.jpg
wget https://github.com/kornia/data/raw/main/matching/kn_church-8.jpg
See the feature_custom.py I wrote.
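feature_custom.py itself is not shown here. The following is a separate minimal sketch of matching the two downloaded images with kornia's LoFTR wrapper, which ships pretrained weights. It assumes the `kornia.feature.LoFTR` interface (a dict with `image0`/`image1` grayscale tensors in, `keypoints0`/`keypoints1`/`confidence` out). `match_pair` and `confident_matches` are hypothetical helper names, and the 0.5 confidence cut-off is an arbitrary choice.

```python
def match_pair(path0, path1, pretrained="outdoor"):
    """Run kornia's LoFTR on two image files; returns (kpts0, kpts1, confidence)."""
    # Heavy dependencies are imported lazily so confident_matches() works without them.
    import cv2
    import torch
    import kornia as K
    import kornia.feature as KF

    def load_gray(path):
        bgr = cv2.imread(path)
        if bgr is None:
            raise FileNotFoundError(path)
        # HxWx3 uint8 -> 1x3xHxW float in [0, 1], then BGR -> RGB -> grayscale.
        img = K.image_to_tensor(bgr, keepdim=False).float() / 255.0
        return K.color.rgb_to_grayscale(K.color.bgr_to_rgb(img))

    matcher = KF.LoFTR(pretrained=pretrained).eval()
    with torch.no_grad():
        out = matcher({"image0": load_gray(path0), "image1": load_gray(path1)})
    return (
        out["keypoints0"].cpu().numpy(),
        out["keypoints1"].cpu().numpy(),
        out["confidence"].cpu().numpy(),
    )


def confident_matches(kpts0, kpts1, conf, min_conf=0.5):
    """Keep only point pairs whose confidence is at least min_conf."""
    return [
        (tuple(map(float, p0)), tuple(map(float, p1)))
        for p0, p1, c in zip(kpts0, kpts1, conf)
        if float(c) >= min_conf
    ]
```

A call such as `confident_matches(*match_pair("kn_church-2.jpg", "kn_church-8.jpg"))` would then give pixel-coordinate pairs; large images may need to be resized first to fit in memory.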
TO-DO