
ViP-DeepLab

Introduction

This repository presents the datasets and evaluation toolkits of ViP-DeepLab. ViP-DeepLab is a unified model that tackles the long-standing and challenging inverse projection problem in vision, which we model as restoring point clouds from perspective image sequences while providing each point with instance-level semantic interpretations. Solving this problem requires a vision model to predict the spatial location, semantic class, and temporally consistent instance label of each 3D point. ViP-DeepLab approaches it by jointly performing monocular depth estimation and video panoptic segmentation. We name this joint task Depth-aware Video Panoptic Segmentation (DVPS) and propose a new evaluation metric along with two derived datasets for it. This repository includes the SemKITTI-DVPS and Cityscapes-DVPS datasets along with their evaluation toolkits.
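The paper's DVPQ metric couples the panoptic and depth predictions by discarding pixels whose depth is too far from the ground truth before scoring. The sketch below illustrates that coupling step only, under the assumption (see the paper for the exact definition) that a pixel's semantic prediction is set to void when its absolute relative depth error exceeds a threshold; the function and label names are illustrative, not part of this toolkit.

```python
import numpy as np

def mask_depth_outliers(semantic_pred, depth_pred, depth_gt,
                        threshold=0.5, void_label=255):
    """Sketch of the depth-aware step of DVPQ (illustrative names):
    void out semantic predictions whose absolute relative depth error
    exceeds `threshold`; pixels with no ground-truth depth (0) are kept."""
    valid = depth_gt > 0  # 0 marks pixels without ground-truth depth
    rel_err = np.zeros_like(depth_gt, dtype=np.float64)
    rel_err[valid] = np.abs(depth_pred[valid] - depth_gt[valid]) / depth_gt[valid]
    out = semantic_pred.copy()
    out[valid & (rel_err > threshold)] = void_label
    return out
```

The masked prediction would then be scored with the usual (video) panoptic quality computation.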

Demo

Datasets

SemKITTI-DVPS

SemKITTI-DVPS is derived from the SemanticKITTI dataset, which is based on the odometry dataset of the KITTI Vision Benchmark and provides perspective images and panoptic-labeled 3D point clouds. To adapt it for DVPS, we project the 3D point clouds onto the image plane and name the derived dataset SemKITTI-DVPS. SemKITTI-DVPS is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike license. The dataset and the evaluation toolkit are in the folder semkitti-dvps.
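The projection of point clouds onto the image plane can be sketched with a plain pinhole model, assuming the points are already in the camera coordinate frame. This is a simplification for illustration; the actual KITTI pipeline additionally applies the rectification and LiDAR-to-camera extrinsic transforms from the calibration files.

```python
import numpy as np

def project_points(points_xyz, K):
    """Project Nx3 camera-frame points through intrinsic matrix K.
    Returns pixel coordinates (u, v) and per-point depth for the points
    in front of the camera. Simplified sketch: rectification and
    extrinsics are assumed to have been applied already."""
    in_front = points_xyz[:, 2] > 0          # keep points in front of the camera
    pts = points_xyz[in_front]
    uvw = (K @ pts.T).T                      # homogeneous image coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]            # perspective divide
    depth = pts[:, 2]
    return uv, depth
```

For example, with focal length 100 and principal point (50, 50), a point at (0, 0, 10) lands at pixel (50, 50) with depth 10.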

SemKITTI-DVPS example.

Cityscapes-DVPS

Cityscapes-DVPS is derived from Cityscapes-VPS by adding depth maps re-computed from the Cityscapes dataset. Cityscapes-DVPS is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike license. The dataset and the evaluation toolkit are in the folder cityscapes-dvps.
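Depth maps in datasets of this kind are commonly stored as 16-bit PNGs with metric depth scaled by 256 and 0 marking invalid pixels. The decoder below assumes that convention; verify the scale factor and invalid-value convention against this dataset's own files before relying on it.

```python
import numpy as np

def decode_depth(depth_png_uint16, scale=256.0):
    """Convert a uint16 depth image into metric depth.
    Assumption (common convention, not verified here): depth_m = value / 256,
    with value 0 meaning no depth measurement."""
    depth = depth_png_uint16.astype(np.float32) / scale
    depth[depth_png_uint16 == 0] = np.nan  # mark missing measurements
    return depth
```

Under this convention a raw value of 512 decodes to 2.0 meters, and a raw value of 0 decodes to NaN.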

Cityscapes-DVPS example.

Citation

If you use the datasets in your research, please cite our paper.

@article{vip_deeplab,
  title={ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation},
  author={Siyuan Qiao and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  journal={arXiv preprint arXiv:2012.05258},
  year={2020}
}