We introduce the task of dense captioning in 3D scans from commodity RGB-D sensors. As input, we assume a point cloud of a 3D scene; the expected output is a set of bounding boxes, each paired with a natural-language description of the underlying object.
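The input/output contract of the task can be sketched as plain data types. This is a minimal illustration under stated assumptions, not the Scan2Cap implementation: the per-point layout (x, y, z, r, g, b), the center/size box parameterization, and the names `BoundingBox`, `DenseCaption` and `axis_aligned_box` are all hypothetical, and the example caption is made up.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed point layout: x, y, z coordinates followed by r, g, b color.
Point = Tuple[float, float, float, float, float, float]


@dataclass
class BoundingBox:
    """Hypothetical axis-aligned box given by its center and edge lengths."""
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]


@dataclass
class DenseCaption:
    """One output item: a box around an object plus its description."""
    box: BoundingBox
    caption: str


def axis_aligned_box(points: Sequence[Point]) -> BoundingBox:
    """Tightest axis-aligned box around the xyz part of the given points."""
    if not points:
        raise ValueError("need at least one point")
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return BoundingBox(center, size)


# Usage with a toy two-point "object" and an illustrative caption.
chair_points = [(0.0, 0.0, 0.0, 120.0, 80.0, 40.0),
                (1.0, 2.0, 0.5, 118.0, 82.0, 41.0)]
scene_output: List[DenseCaption] = [
    DenseCaption(axis_aligned_box(chair_points), "a brown chair next to the table"),
]
```

In the actual task the boxes and captions are predicted by a model from the whole scene point cloud; the helper above only shows what one output item holds.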
@inproceedings{chen2021scan2cap,
title={{Scan2Cap}: Context-aware Dense Captioning in {RGB-D} Scans},
author={Chen, Zhenyu and Gholami, Ali and Nie{\ss}ner, Matthias and Chang, Angel X},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={3193--3203},
year={2021}
}