
Monocular Occupancy Prediction for Scalable Indoor Scenes

[**Hongxiao Yu**](https://orcid.org/0009-0003-9249-2726)<sup>1,2</sup> · [**Yuqi Wang**](https://orcid.org/0000-0002-6360-1431)<sup>1,2</sup> · [**Yuntao Chen**](https://orcid.org/0000-0002-9555-1897)<sup>3</sup> · [**Zhaoxiang Zhang**](https://orcid.org/0000-0003-2648-3875)<sup>1,2,3</sup>

<sup>1</sup>School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS) <sup>2</sup>NLPR, MAIS, Institute of Automation, Chinese Academy of Sciences (CASIA) <sup>3</sup>Centre for Artificial Intelligence and Robotics (HKISI_CAS)

**ECCV 2024**

[![Static Badge](https://img.shields.io/badge/arXiv-2407.11730-red)](https://arxiv.org/abs/2407.11730) [![Static Badge](https://img.shields.io/badge/Project%20Page-ISO-blue)](https://hongxiaoy.github.io/ISO) [![Static Badge](https://img.shields.io/badge/Demo-Hugging%20Face-yellow)](https://huggingface.co/spaces/hongxiaoy/ISO) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/monocular-occupancy-prediction-for-scalable/3d-semantic-scene-completion-from-a-single)](https://paperswithcode.com/sota/3d-semantic-scene-completion-from-a-single?p=monocular-occupancy-prediction-for-scalable)

Performance

Here we compare our ISO with the previous best models, NDC-Scene and MonoScene.

| Method | IoU | ceiling | floor | wall | window | chair | bed | sofa | table | tvs | furniture | object | mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MonoScene | 42.51 | 8.89 | 93.50 | 12.06 | 12.57 | 13.72 | 48.19 | 36.11 | 15.13 | 15.22 | 27.96 | 12.94 | 26.94 |
| NDC-Scene | 44.17 | 12.02 | **93.51** | 13.11 | 13.77 | 15.83 | 49.57 | 39.87 | 17.17 | 24.57 | 31.00 | 14.96 | 29.03 |
| Ours | **47.11** | **14.21** | 93.47 | **15.89** | **15.14** | **18.35** | **50.01** | **40.82** | **18.25** | **25.90** | **34.08** | **17.67** | **31.25** |

We highlight the best results in bold.

Pretrained models on NYUv2 can be downloaded from Hugging Face (see the Pretrained Models section below).

Preparing ISO

Installation

  1. Create a conda environment:
$ conda create -n iso python=3.9 -y
$ conda activate iso
  2. This code was implemented with Python 3.9, PyTorch 2.2.0, and CUDA 11.8. Please install PyTorch:
$ conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
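
To confirm that the install can see your GPU before continuing, a quick optional check:

$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"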
  3. Install the additional dependencies:
$ git clone --recursive https://github.com/hongxiaoy/ISO.git
$ cd ISO/
$ pip install -r requirements.txt

:bulb:Note

Change line 140 in depth_anything/metric_depth/zoedepth/models/base_models/dpt_dinov2/dpt.py to:

self.pretrained = torch.hub.load('facebookresearch/dinov2', 'dinov2_{:}14'.format(encoder), pretrained=False)

Then, download the Depth-Anything pre-trained model and metric depth model checkpoint files to checkpoints/.
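
For context, here is a minimal sketch of what the edited line does; the encoder value below is illustrative (the real one comes from the Depth-Anything config). With pretrained=False, torch.hub fetches only the DINOv2 model code and skips downloading pretrained weights, which are instead supplied by the checkpoint files you placed in checkpoints/.

import torch

encoder = 'vitl'  # illustrative; 'dinov2_{:}14'.format('vitl') resolves to 'dinov2_vitl14'
# pretrained=False builds the backbone architecture without fetching its weights;
# they are loaded afterwards from the local Depth-Anything checkpoints.
backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_{:}14'.format(encoder), pretrained=False)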

  4. Install tbb:
$ conda install -c bioconda tbb=2020.2
  5. Finally, install ISO:
$ pip install -e ./

:bulb:Note

If you later move the ISO directory to another location, you should run

pip cache purge

then run pip install -e ./ again.

Datasets

NYUv2

  1. Download the NYUv2 dataset.

  2. Create a folder to store the preprocessed NYUv2 data at /path/to/NYU/preprocess/folder.

  3. Store paths in environment variables for faster access:

    $ export NYU_PREPROCESS=/path/to/NYU/preprocess/folder
    $ export NYU_ROOT=/path/to/NYU/depthbin 

    :bulb:Note

    We recommend appending the export to ~/.bashrc so it persists across sessions:

    echo "export NYU_PREPROCESS=/path/to/NYU/preprocess/folder" >> ~/.bashrc

  4. Preprocess the data to generate labels at a lower scale, which are used to compute the ground truth relation matrices:

    $ cd ISO/
    $ python iso/data/NYU/preprocess.py NYU_root=$NYU_ROOT NYU_preprocess_root=$NYU_PREPROCESS

Occ-ScanNet

  1. Download the Occ-ScanNet dataset, which includes:

    • posed_images
    • gathered_data
    • train_subscenes.txt
    • val_subscenes.txt
  2. Create a root folder /path/to/Occ/ScanNet/folder to store the Occ-ScanNet dataset, and move all dataset files into it; zip files need to be extracted.

  3. Store paths in environment variables for faster access:

    $ export OCC_SCANNET_ROOT=/path/to/Occ/ScanNet/folder

    :bulb:Note

    We recommend appending the export to ~/.bashrc so it persists across sessions:

    echo "export OCC_SCANNET_ROOT=/path/to/Occ/ScanNet/folder" >> ~/.bashrc

Pretrained Models

Download the ISO models pretrained on NYUv2, then put them in the folder /path/to/ISO/trained_models.

huggingface-cli download --repo-type model hongxiaoy/ISO

If you haven't installed huggingface-cli before, please follow the official instructions.
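
By default, huggingface-cli downloads into the local Hugging Face cache. If you would rather place the files directly in the trained_models folder, recent versions of huggingface-cli accept a --local-dir flag (availability depends on your huggingface_hub version):

huggingface-cli download --repo-type model hongxiaoy/ISO --local-dir /path/to/ISO/trained_models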

Running ISO

Training

NYUv2

  1. Create folders to store training logs at /path/to/NYU/logdir.

  2. Store the path in an environment variable:

$ export NYU_LOG=/path/to/NYU/logdir
  3. Train ISO on NYUv2 using 2 GPUs with a batch size of 4 (2 items per GPU):
    $ cd ISO/
    $ python iso/scripts/train_iso.py \
    dataset=NYU \
    NYU_root=$NYU_ROOT \
    NYU_preprocess_root=$NYU_PREPROCESS \
    logdir=$NYU_LOG \
    n_gpus=2 batch_size=4
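
:bulb:Note

ISO builds on MonoScene, whose training scripts write TensorBoard logs; assuming ISO keeps that behavior, you can monitor training with:

$ tensorboard --logdir $NYU_LOG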

Occ-ScanNet

  1. Create folders to store training logs at /path/to/OccScanNet/logdir.

  2. Store the path in an environment variable:

$ export OCC_SCANNET_LOG=/path/to/OccScanNet/logdir
  3. Train ISO on Occ-ScanNet using 2 GPUs with a batch size of 4 (2 items per GPU); the dataset name should match the config file name in train_iso.py:
    $ cd ISO/
    $ python iso/scripts/train_iso.py \
    dataset=OccScanNet \
    OccScanNet_root=$OCC_SCANNET_ROOT \
    logdir=$OCC_SCANNET_LOG \
    n_gpus=2 batch_size=4

Evaluating

NYUv2

To evaluate ISO on NYUv2 test set, type:

$ cd ISO/
$ python iso/scripts/eval_iso.py \
    dataset=NYU \
    NYU_root=$NYU_ROOT \
    NYU_preprocess_root=$NYU_PREPROCESS \
    n_gpus=1 batch_size=1

Inference

Please create a folder /path/to/iso/output to store the ISO outputs, and store the path in an environment variable:

export ISO_OUTPUT=/path/to/iso/output

NYUv2

To generate the predictions on the NYUv2 test set, type:

$ cd ISO/
$ python iso/scripts/generate_output.py \
    +output_path=$ISO_OUTPUT \
    dataset=NYU \
    NYU_root=$NYU_ROOT \
    NYU_preprocess_root=$NYU_PREPROCESS \
    n_gpus=1 batch_size=1
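
To sanity-check a generated prediction file, you can inspect it with Python's pickle module. This is only a sketch: the exact contents of the output files depend on ISO's export format, and the filename below is hypothetical.

import pickle

# Hypothetical filename; point this at an actual file under $ISO_OUTPUT.
with open('/path/to/iso/output/example.pkl', 'rb') as f:
    pred = pickle.load(f)

print(type(pred))
if isinstance(pred, dict):
    print(list(pred.keys()))  # see which fields the prediction contains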

Visualization

You need to create a new Anaconda environment for visualization.

conda create -n mayavi_vis python=3.7 -y
conda activate mayavi_vis
pip install omegaconf hydra-core PyQt5 mayavi

If you encounter problems when installing mayavi, please refer to the official mayavi installation instructions.

NYUv2

$ cd ISO/
$ python iso/scripts/visualization/NYU_vis_pred.py +file=/path/to/output/file.pkl
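
:bulb:Note

If you are visualizing on a headless server, mayavi/VTK needs a display; a common workaround (not part of the original instructions, assuming xvfb is installed) is:

$ xvfb-run -a python iso/scripts/visualization/NYU_vis_pred.py +file=/path/to/output/file.pkl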

Acknowledgement

This project is built on MonoScene. Please refer to https://github.com/astra-vision/MonoScene for more documentation and details.

We would like to thank the creators, maintainers, and contributors of MonoScene, NDC-Scene, ZoeDepth, and Depth Anything for their invaluable work. Their dedication and open-source spirit have been instrumental in our development.

Citation

@article{yu2024monocular,
  title={Monocular Occupancy Prediction for Scalable Indoor Scenes},
  author={Yu, Hongxiao and Wang, Yuqi and Chen, Yuntao and Zhang, Zhaoxiang},
  journal={arXiv preprint arXiv:2407.11730},
  year={2024}
}