[CVPR2024] SANeRF-HQ: Segment Anything for NeRF in High Quality.
Apache License 2.0

SANeRF-HQ

This is the official implementation of SANeRF-HQ.

SANeRF-HQ: Segment Anything for NeRF in High Quality
Yichen Liu, Benran Hu, Yu-Wing Tai, Chi-Keung Tang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

SANeRF-HQ Model Architecture

Set up

The code is based on this repo.

First, install the required packages:

pip install -r requirements.txt

Then install HQ-SAM (see the HQ-SAM repo for details):

pip install segment-anything-hq

Optionally, you can also build the extensions:

# install all extension modules
bash scripts/install_ext.sh

# if you want to install manually, here is an example:
cd raymarching
python setup.py build_ext --inplace # build the extension only, without installing it (usable only from the parent directory)
pip install . # install to the Python path (you still need the raymarching/ folder, since this only installs the built extension)

Dataset

We use datasets from Mip-NeRF 360, LERF, LLFF, 3D-FRONT, Panoptic Lifting, and Contrastive Lift. You can download each dataset from its website via the hyperlinks. We also provide one example here.

To switch between datasets, change the value of the --data_type flag during training.

You can download the evaluation masks we selected here. Some datasets have ground truth segmentation (e.g. 3D-FRONT and Panoptic Lifting), so we use their annotations directly. For those without ground truth segmentation (e.g. Mip-NeRF 360), we randomly select some views and use this to obtain masks. We then pass the masks through CascadePSP for refinement if necessary.

Training

We provide some sample scripts showing how to use our code. For a detailed description of each argument, please refer to our code.

Evaluation

To evaluate our results, you can run scripts/test_obj_nerf.sh. You can add --use_default_intrinsics to the test script to render masks with the default intrinsics. You can download the evaluation views here.
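
A rendered mask can be compared against an evaluation mask with intersection-over-union (IoU), a standard segmentation metric. The sketch below is an illustration only and is not taken from scripts/test_obj_nerf.sh. It assumes binary masks stored as nested lists of 0/1 with identical shapes, and `mask_iou` is a hypothetical helper name:

```python
def mask_iou(pred, target):
    """Intersection-over-union of two binary masks given as nested lists of 0/1.

    Assumption: both masks have the same height and width.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks are treated as a perfect match (a convention, not a rule).
    return intersection / union if union else 1.0
```

In practice the masks would be loaded from image files first; the loading step is omitted here because the file layout of the evaluation masks is not described in this README.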

Other Results

In our paper, we demonstrate the potential of our pipeline for various segmentation tasks. Here are some instructions on how we obtained those results.

Text-prompt Segmentation

We use Grounding-DINO to generate a bounding box from the text, and then use the bounding box as a prompt for SAM to generate the mask.
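
When chaining the two models, the box format may need converting. Grounding-DINO's reference inference code returns boxes as normalized (cx, cy, w, h), while SAM's predictor takes its box prompt in absolute (x0, y0, x1, y1) pixel coordinates. Below is a minimal sketch of that conversion, assuming those two formats. The function name `cxcywh_norm_to_xyxy_pixels` is hypothetical and not part of either library:

```python
def cxcywh_norm_to_xyxy_pixels(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1) pixels.

    Assumption: `box` holds four floats in [0, 1] relative to the image size.
    """
    cx, cy, w, h = box
    x0 = (cx - w / 2.0) * image_width
    y0 = (cy - h / 2.0) * image_height
    x1 = (cx + w / 2.0) * image_width
    y1 = (cy + h / 2.0) * image_height

    # Clamp to the image bounds so the prompt never leaves the frame.
    def clamp(value, upper):
        return min(max(value, 0.0), float(upper))

    return (
        clamp(x0, image_width),
        clamp(y0, image_height),
        clamp(x1, image_width),
        clamp(y1, image_height),
    )
```

The resulting tuple can then be turned into an array and passed as the box prompt. Check the exact argument shape against the SAM version you installed.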

Auto-segmentation and Dynamic Segmentation

We use DEVA to segment a sequence of video frames.

For a static scene, you can first render a video from the NeRF. Use the 'save trajectory' function in the GUI to store a sequence of camera poses: click start track to start recording the camera trajectory, then click save trajectory to store it. Next, feed the rendered frames into DEVA to obtain automatic segmentation results. Finally, use the code to train the object field. Remember to change --n_inst in multi-instance cases.
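
For multi-instance cases, one way to pick a value for --n_inst is to count the distinct instance IDs across the segmented frames. Below is a minimal sketch, assuming each frame's segmentation has already been loaded as a 2D grid of integer IDs with 0 as background. Both are assumptions about the data layout, not DEVA's documented output format. `count_instances` is a hypothetical helper, and whether --n_inst should also count the background is something to confirm in the code:

```python
def count_instances(frames, background_id=0):
    """Count distinct non-background instance IDs over a sequence of ID maps.

    Assumption: each frame is a nested list (rows of integer instance IDs).
    """
    ids = set()
    for frame in frames:
        for row in frame:
            ids.update(row)
    # Drop the background label so only object instances are counted.
    ids.discard(background_id)
    return len(ids)


if __name__ == "__main__":
    # Two tiny 2x2 frames containing instances 1, 2 and 3.
    frames = [
        [[0, 1], [1, 2]],
        [[0, 3], [3, 3]],
    ]
    print(count_instances(frames))
```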

Acknowledgement

Citation

If you find this repo or our paper useful, please :star: this repository and consider citing :pencil::

@article{liu2023sanerf,
  title={SANeRF-HQ: Segment Anything for NeRF in High Quality},
  author={Liu, Yichen and Hu, Benran and Tang, Chi-Keung and Tai, Yu-Wing},
  journal={arXiv preprint arXiv:2312.01531},
  year={2023}
}