Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee
FreeSplat is a generalizable 3DGS method for indoor scene reconstruction that uses low-cost 2D backbones for feature extraction and cost volumes for multi-view aggregation. It also introduces a Pixel-wise Triplet Fusion (PTF) module that merges multi-view 3D Gaussians, removing redundant ones and providing point-level latent fusion and regularization of Gaussian localization. FreeSplat shows consistent quality and efficiency improvements, especially when given large numbers of input views.
To get started, create a virtual environment using Python 3.10+:
git clone https://github.com/wangys16/FreeSplat.git
cd FreeSplat
conda create -n freesplat python=3.10
conda activate freesplat
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
If your system does not use CUDA 12.1 by default, see the troubleshooting tips below from pixelSplat.
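After installation, a quick check can confirm that the interpreter and PyTorch build match the versions pinned above (Python 3.10+, torch 2.1.2, CUDA 12.1). The following is a minimal sketch rather than part of the FreeSplat repository; the function name `describe_environment` is hypothetical, and the script only reports what it finds without enforcing anything.

```python
import importlib.util
import sys


def describe_environment():
    """Collect the Python, torch, and CUDA versions visible to this interpreter."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    # Only query torch when it is installed, so the script still reports Python.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        # CUDA version the wheel was built against; None for CPU-only builds.
        info["cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_build` does not read 12.1 or `cuda_available` is false, the installed wheel probably does not match the system's CUDA setup.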
FreeSplat is trained on about 100 scenes from ScanNet, following NeRFusion and SurfelNeRF, and evaluated on the ScanNet and Replica datasets.
You can download our preprocessed datasets here. The downloaded datasets under datasets/ should look like:
datasets
├─ scannet
│ ├─ train
│ ├ ├─ scene0005_00
│ ├ ├ ├─ color (RGB images)
│ ├ ├ ├─ depth (depth images)
│ ├ ├ ├─ intrinsic (intrinsics)
│ ├ ├ └─ extrinsics.npy (camera extrinsics)
│ ├ ├─ scene0020_00
│ ├ ...
│ ├─ test
│ ├ ├─
│ ├ ...
│ ├─ train_idx.txt (training scenes list)
│ └─ test_idx.txt (testing scenes list)
├─ replica
│ ├─ test
│ └─ test_idx.txt (testing scenes list)
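Before training, it can help to verify that every scene folder matches the layout above. The following is a minimal sketch, not part of the FreeSplat code base: the helper names `missing_entries` and `check_split` are hypothetical, and the check only confirms that the four per-scene entries shown for ScanNet exist. Whether Replica scenes use the same per-scene entries is an assumption, since the tree above does not list them.

```python
from pathlib import Path

# Entries each scene folder contains according to the tree above.
EXPECTED_ENTRIES = ("color", "depth", "intrinsic", "extrinsics.npy")


def missing_entries(scene_dir):
    """Return the expected entries that are absent from one scene folder."""
    scene_dir = Path(scene_dir)
    return [name for name in EXPECTED_ENTRIES if not (scene_dir / name).exists()]


def check_split(dataset_dir, split):
    """Map each incomplete scene under <dataset_dir>/<split> to its missing entries."""
    split_dir = Path(dataset_dir) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split folder: {split_dir}")
    report = {}
    for scene_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        missing = missing_entries(scene_dir)
        if missing:
            report[scene_dir.name] = missing
    return report


if __name__ == "__main__":
    root = Path("datasets/scannet")
    for split in ("train", "test"):
        if (root / split).is_dir():
            # An empty report means every scene folder has all four entries.
            print(split, check_split(root, split) or "ok")
```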
Our sampled views for evaluation under different settings are in assets/evaluation_index_{dataset}_{N}views.json.
You can find our pre-trained checkpoints here and download them to checkpoints/.
The main entry point is src/main.py. To train FreeSplat on the 2-views, 3-views, and FVT settings, run the following commands, respectively:
python -m src.main +experiment=scannet/2views +output_dir=train_2views
python -m src.main +experiment=scannet/3views +output_dir=train_3views
python -m src.main +experiment=scannet/fvt +output_dir=train_fvt
The output will be saved under outputs/***.
To evaluate a pre-trained model on the [N]-views setting on [DATASET], you can call:
python -m src.main +experiment=[DATASET]/[SETTING] +output_dir=[OUTPUT_PATH] mode=test dataset/view_sampler=evaluation checkpointing.load=[PATH_TO_CHECKPOINT] dataset.view_sampler.num_context_views=[N]
For example, to evaluate the FreeSplat model trained on 2 views:
python -m src.main +experiment=scannet/2views +output_dir=test_scannet_2views mode=test dataset/view_sampler=evaluation checkpointing.load=checkpoints/2views.ckpt dataset.view_sampler.num_context_views=2
To evaluate FreeSplat-fvt on the ScanNet 10-views setting, you can run:
python -m src.main +experiment=scannet/fvt +output_dir=test_scannet_fvt mode=test dataset/view_sampler=evaluation checkpointing.load=checkpoints/fvt.ckpt dataset.view_sampler.num_context_views=10 model.encoder.num_views=9
Here, model.encoder.num_views=9 uses more nearby views for more accurate depth estimation. We also provide a whole-scene reconstruction example, which you can run with:
python -m src.main +experiment=scannet/fvt +output_dir=test_scannet_whole mode=test dataset/view_sampler=evaluation checkpointing.load=checkpoints/fvt.ckpt dataset.view_sampler.num_context_views=30 model.encoder.num_views=30
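The evaluation commands above all follow the same template, so they can be assembled programmatically when sweeping over settings. The following is a minimal sketch: the helper `build_eval_command` and its parameter names are hypothetical, and it only reproduces the overrides that appear in the commands above.

```python
import shlex


def build_eval_command(dataset, setting, output_dir, checkpoint,
                       num_context_views, encoder_num_views=None):
    """Assemble the evaluation command from the template above as an argument list."""
    args = [
        "python", "-m", "src.main",
        f"+experiment={dataset}/{setting}",
        f"+output_dir={output_dir}",
        "mode=test",
        "dataset/view_sampler=evaluation",
        f"checkpointing.load={checkpoint}",
        f"dataset.view_sampler.num_context_views={num_context_views}",
    ]
    # Optional override used in the FVT examples above.
    if encoder_num_views is not None:
        args.append(f"model.encoder.num_views={encoder_num_views}")
    return args


if __name__ == "__main__":
    command = build_eval_command(
        dataset="scannet",
        setting="fvt",
        output_dir="test_scannet_fvt",
        checkpoint="checkpoints/fvt.ckpt",
        num_context_views=10,
        encoder_num_views=9,
    )
    # Print a shell-ready string; the list form can be passed to subprocess.run.
    print(shlex.join(command))
```

To launch the evaluation from Python instead of printing it, the returned list can be passed to `subprocess.run(command, check=True)` from the repository root.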
If you find our work helpful, please consider citing our paper. Thank you!
@article{wang2024freesplat,
title={FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes},
author={Wang, Yunsong and Huang, Tianxin and Chen, Hanlin and Lee, Gim Hee},
journal={arXiv preprint arXiv:2405.17958},
year={2024}
}
Our code is largely based on pixelSplat, and our implementation also refers to SimpleRecon and MVSplat. Thanks for their great work!
This work is supported by the Agency for Science, Technology and Research (A*STAR) under its MTC Programmatic Funds (Grant No. M23L7b0021).