The two popular datasets ScanRefer and ReferIt3D connect natural language to real-world 3D data. In this paper, we curate a large-scale dataset that complements and extends both by associating every object mentioned in a referential sentence with its underlying instance inside a 3D scene. Specifically, our Scan Entities in 3D (ScanEnts3D) dataset provides explicit correspondences between 369k objects and 84k natural referential sentences, covering 705 real-world scenes.
@article{abdelreheem2022scanents,
author = {Abdelreheem, Ahmed and Olszewski, Kyle and Lee, Hsin-Ying and Wonka, Peter and Achlioptas, Panos},
title = {ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes},
journal = {Computing Research Repository (CoRR)},
volume = {abs/2212.06250},
year = {2022}
}
Paper · Project · Code (Coming Soon)