
Monocular Occupancy Prediction for Scalable Indoor Scenes

[**Hongxiao Yu**](https://orcid.org/0009-0003-9249-2726)<sup>1,2</sup> · [**Yuqi Wang**](https://orcid.org/0000-0002-6360-1431)<sup>1,2</sup> · [**Yuntao Chen**](https://orcid.org/0000-0002-9555-1897)<sup>3</sup> · [**Zhaoxiang Zhang**](https://orcid.org/0000-0003-2648-3875)<sup>1,2,3</sup>

<sup>1</sup>School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS)
<sup>2</sup>NLPR, MAIS, Institute of Automation, Chinese Academy of Sciences (CASIA)
<sup>3</sup>Centre for Artificial Intelligence and Robotics (HKISI_CAS)

**ECCV 2024**

[![Static Badge](https://img.shields.io/badge/arXiv-2407.11730-red)](https://arxiv.org/abs/2407.11730) [![Static Badge](https://img.shields.io/badge/Project%20Page-ISO-blue)](https://hongxiaoy.github.io/ISO) [![Static Badge](https://img.shields.io/badge/Demo-Hugging%20Face-yellow)](https://huggingface.co/spaces/hongxiaoy/ISO) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/monocular-occupancy-prediction-for-scalable/3d-semantic-scene-completion-from-a-single)](https://paperswithcode.com/sota/3d-semantic-scene-completion-from-a-single?p=monocular-occupancy-prediction-for-scalable)

Performance

Here we compare our ISO with the previous best models, NDC-Scene and MonoScene.

| Method | IoU | ceiling | floor | wall | window | chair | bed | sofa | table | tvs | furniture | object | mIoU |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| MonoScene | 42.51 | 8.89 | 93.50 | 12.06 | 12.57 | 13.72 | 48.19 | 36.11 | 15.13 | 15.22 | 27.96 | 12.94 | 26.94 |
| NDC-Scene | 44.17 | 12.02 | **93.51** | 13.11 | 13.77 | 15.83 | 49.57 | 39.87 | 17.17 | 24.57 | 31.00 | 14.96 | 29.03 |
| Ours | **47.11** | **14.21** | 93.47 | **15.89** | **15.14** | **18.35** | **50.01** | **40.82** | **18.25** | **25.90** | **34.08** | **17.67** | **31.25** |

We highlight the best results in bold.
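
For reference, the table reports IoU over occupied voxels and mIoU as the mean of the per-class IoUs over the 11 semantic classes. As a minimal sketch (not the repository's evaluation code), per-class IoU and mIoU can be computed from a confusion matrix like this:

```python
import numpy as np

def per_class_iou(confusion: np.ndarray) -> np.ndarray:
    """Per-class IoU from a (C, C) confusion matrix (rows: GT, cols: prediction)."""
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp  # predicted as class c, GT is another class
    fn = confusion.sum(axis=1) - tp  # GT is class c, predicted as another class
    return tp / np.maximum(tp + fp + fn, 1e-8)

# Accumulate `confusion` over the test set, then e.g. average the
# semantic classes (assuming index 0 is the empty/free class):
# miou = per_class_iou(confusion)[1:].mean()
```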

Pretrained models on NYUv2 can be downloaded from the hongxiaoy/ISO repository on Hugging Face (see the Pretrained Models section below).

Preparing ISO

Installation

  1. Create the conda environment:
$ conda create -n iso python=3.9 -y
$ conda activate iso
  2. This code was implemented with Python 3.9, PyTorch 2.2.0, and CUDA 11.8 (you can sanity-check the install with the snippet below). Please install PyTorch:
$ conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
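
After installation, you can verify that the expected PyTorch build and CUDA support are picked up (a minimal check, assuming a CUDA-capable machine):

```python
import torch

# Verify the installed build matches the versions used above.
print(torch.__version__)          # expect 2.2.0
print(torch.cuda.is_available())  # expect True with CUDA 11.8 set up
```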
  3. Install the additional dependencies:
$ git clone --recursive https://github.com/hongxiaoy/ISO.git
$ cd ISO/
$ pip install -r requirements.txt

:bulb:Note

Change L140 in depth_anything/metric_depth/zoedepth/models/base_models/dpt_dinov2/dpt.py to the following, so that pretrained DINOv2 weights are not downloaded from torch.hub:

self.pretrained = torch.hub.load('facebookresearch/dinov2', 'dinov2_{:}14'.format(encoder), pretrained=False)

Then, download the Depth-Anything pre-trained model and metric depth model checkpoint files to checkpoints/.
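
A small sketch to confirm the checkpoints landed where ISO expects them; the filenames below are placeholders, so substitute the checkpoint files you actually downloaded:

```python
from pathlib import Path

ckpt_dir = Path("checkpoints")
# Placeholder names -- replace with the files you downloaded.
for name in ["depth_anything_vitl14.pth", "depth_anything_metric_depth_indoor.pt"]:
    path = ckpt_dir / name
    print(f"{path}: {'found' if path.exists() else 'MISSING'}")
```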

  4. Install tbb:
$ conda install -c bioconda tbb=2020.2
  5. Finally, install ISO:
$ pip install -e ./

:bulb:Note

If you move the ISO directory to another place, you should run

pip cache purge

then run pip install -e ./ again.

Datasets

NYUv2

  1. Download the NYUv2 dataset.

  2. Create a folder to store the preprocessed NYUv2 data at /path/to/NYU/preprocess/folder.

  3. Store the paths in environment variables for convenient access:

    $ export NYU_PREPROCESS=/path/to/NYU/preprocess/folder
    $ export NYU_ROOT=/path/to/NYU/depthbin 

    :bulb:Note

    We recommend appending the export to your ~/.bashrc so that it persists across shell sessions:

    echo "export NYU_PREPROCESS=/path/to/NYU/preprocess/folder" >> ~/.bashrc

  4. Preprocess the data to generate labels at a lower scale, which are used to compute the ground truth relation matrices (a sketch of the label-downscaling idea follows these steps):

    $ cd ISO/
    $ python iso/data/NYU/preprocess.py NYU_root=$NYU_ROOT NYU_preprocess_root=$NYU_PREPROCESS
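
For intuition, generating labels at a lower scale amounts to downsampling the full-resolution voxel label grid, for example by majority vote over each block. A minimal sketch of that idea (not the repository's actual preprocessing code):

```python
import numpy as np

def downsample_labels(labels: np.ndarray, factor: int = 4) -> np.ndarray:
    """Downsample a 3D integer label grid by majority vote over factor^3 blocks."""
    dx, dy, dz = (s // factor for s in labels.shape)
    out = np.empty((dx, dy, dz), dtype=labels.dtype)
    for i in range(dx):
        for j in range(dy):
            for k in range(dz):
                block = labels[i * factor:(i + 1) * factor,
                               j * factor:(j + 1) * factor,
                               k * factor:(k + 1) * factor]
                out[i, j, k] = np.bincount(block.ravel()).argmax()
    return out
```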

Occ-ScanNet

  1. Download the Occ-ScanNet dataset, which includes:

    • posed_images
    • gathered_data
    • train_subscenes.txt
    • val_subscenes.txt
  2. Create a root folder /path/to/Occ/ScanNet/folder to store the Occ-ScanNet dataset, move all dataset files into this folder, and extract any zip files (a layout check follows these steps).

  3. Store the path in an environment variable for convenient access:

    $ export OCC_SCANNET_ROOT=/path/to/Occ/ScanNet/folder

    :bulb:Note

    We recommend appending the export to your ~/.bashrc so that it persists across shell sessions:

    echo "export OCC_SCANNET_ROOT=/path/to/Occ/ScanNet/folder" >> ~/.bashrc
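
A quick sanity check that the extracted dataset matches the layout listed above (a minimal sketch; the path is read from the environment variable set in step 3):

```python
import os
from pathlib import Path

root = Path(os.environ["OCC_SCANNET_ROOT"])
for entry in ["posed_images", "gathered_data",
              "train_subscenes.txt", "val_subscenes.txt"]:
    path = root / entry
    print(f"{path}: {'ok' if path.exists() else 'MISSING'}")
```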

Pretrained Models

Download the ISO models pretrained on NYUv2, then put them in the folder /path/to/ISO/trained_models:

huggingface-cli download --repo-type model hongxiaoy/ISO

If you have not installed huggingface-cli before, please follow the official instructions.
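
Alternatively, the same download can be scripted through the huggingface_hub Python API (assuming the huggingface_hub package is installed):

```python
from huggingface_hub import snapshot_download

# Download the hongxiaoy/ISO model repository into the folder
# where the trained models are expected to live.
snapshot_download(repo_id="hongxiaoy/ISO", repo_type="model",
                  local_dir="trained_models")
```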

Running ISO

Training

NYUv2

  1. Create folders to store training logs at /path/to/NYU/logdir.

  2. Store it in an environment variable:

$ export NYU_LOG=/path/to/NYU/logdir
  3. Train ISO using 2 GPUs with a batch size of 4 (2 items per GPU) on NYUv2:
    $ cd ISO/
    $ python iso/scripts/train_iso.py \
    dataset=NYU \
    NYU_root=$NYU_ROOT \
    NYU_preprocess_root=$NYU_PREPROCESS \
    logdir=$NYU_LOG \
    n_gpus=2 batch_size=4

Occ-ScanNet

  1. Create folders to store training logs at /path/to/OccScanNet/logdir.

  2. Store it in an environment variable:

$ export OCC_SCANNET_LOG=/path/to/OccScanNet/logdir
  3. Train ISO using 2 GPUs with a batch size of 4 (2 items per GPU) on Occ-ScanNet (the dataset name should match the config file name used in train_iso.py):
    $ cd ISO/
    $ python iso/scripts/train_iso.py \
    dataset=OccScanNet \
    OccScanNet_root=$OCC_SCANNET_ROOT \
    logdir=$OCC_SCANNET_LOG \
    n_gpus=2 batch_size=4

Evaluating

NYUv2

To evaluate ISO on NYUv2 test set, type:

$ cd ISO/
$ python iso/scripts/eval_iso.py \
    dataset=NYU \
    NYU_root=$NYU_ROOT \
    NYU_preprocess_root=$NYU_PREPROCESS \
    n_gpus=1 batch_size=1

Inference

Please create a folder /path/to/iso/output to store the ISO outputs, and store its path in an environment variable:

export ISO_OUTPUT=/path/to/iso/output

NYUv2

To generate the predictions on the NYUv2 test set, type:

$ cd ISO/
$ python iso/scripts/generate_output.py \
    +output_path=$ISO_OUTPUT \
    dataset=NYU \
    NYU_root=$NYU_ROOT \
    NYU_preprocess_root=$NYU_PREPROCESS \
    n_gpus=1 batch_size=1
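
The generated outputs are pickle files, one per test sample. A minimal sketch for inspecting one; the structure of the pickled object is an assumption here, so print it to see what is actually stored:

```python
import pickle

# Any file produced by generate_output.py under $ISO_OUTPUT.
with open("/path/to/output/file.pkl", "rb") as f:
    data = pickle.load(f)

# The exact keys are an assumption -- inspect them first.
print(type(data))
if isinstance(data, dict):
    print(list(data.keys()))
```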

Visualization

You need to create a new Anaconda environment for visualization.

conda create -n mayavi_vis python=3.7 -y
conda activate mayavi_vis
pip install omegaconf hydra-core PyQt5 mayavi

If you encounter problems when installing mayavi, please refer to the official mayavi installation instructions.

NYUv2

$ cd ISO/
$ python iso/scripts/visualization/NYU_vis_pred.py +file=/path/to/output/file.pkl

Acknowledgement

This project is built on top of MonoScene. Please refer to https://github.com/astra-vision/MonoScene for more documentation and details.

We would like to thank the creators, maintainers, and contributors of MonoScene, NDC-Scene, ZoeDepth, and Depth Anything for their invaluable work. Their dedication and open-source spirit have been instrumental in our development.

Citation

@article{yu2024monocular,
  title={Monocular Occupancy Prediction for Scalable Indoor Scenes},
  author={Yu, Hongxiao and Wang, Yuqi and Chen, Yuntao and Zhang, Zhaoxiang},
  journal={arXiv preprint arXiv:2407.11730},
  year={2024}
}