linukc / SUN3D_DATASETS

3D scene understanding datasets

SceneVerse #24

Open linukc opened 1 month ago


We propose SceneVerse, the first million-scale 3D vision-language dataset with 68K 3D indoor scenes and 2.5M vision-language pairs. SceneVerse contains 3D scenes curated from diverse existing datasets of both real and synthetic environments. Harnessing the power of 3D scene graphs and LLMs, we introduce an automated pipeline to generate comprehensive and high-quality language for both object-level and scene-level descriptions. We additionally incorporate the most extensive human-annotated object referrals to date, providing new training sources and benchmarks in this field.

Paper · Project · Code

@article{jia2024sceneverse,
  title={SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding},
  author={Jia, Baoxiong and Chen, Yixin and Yu, Huangyue and Wang, Yan and Niu, Xuesong and Liu, Tengyu and Li, Qing and Huang, Siyuan},
  journal={arXiv preprint arXiv:2401.09340},
  year={2024}
}
linukc commented 1 month ago

Provided Language Types

We list the available data in the current version of SceneVerse in the table below:

| Dataset | Object Caption | Scene Caption | Ref-Annotation | Ref-Pairwise (`rel2`) | Ref-MultiObject (`relm`) | Ref-Star (`star`) | Ref-Chain (`chain`, optional) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ScanNet | | | ScanRefer, Nr3D | | | | |
| MultiScan | | | | | | | |
| ARKitScenes | | | | | | | |
| HM3D | template | | | | | | |
| 3RScan | | | | | | | |
| Structured3D | template | | | | | | |
| ProcTHOR | template | ❌ | ❌ | template | template | template | |

For the generated object referrals, we provide both the direct template-based generations (`template`) and the LLM-refined versions (`gpt`). Please refer to our supplementary material for descriptions of the selected pairwise / multi-object / star types. We also provide the chain type, whose language uses object A to refer to B, and then B to refer to the target object C. As we found that the chain type can sometimes lead to unnatural descriptions, we did not discuss it in the main paper. Feel free to inspect it and use it in your projects.
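To illustrate how the referral types above might be handled in practice, here is a minimal sketch of filtering annotations by referral type and generation source. Note that this is **not** the official SceneVerse schema: the field names (`ref_type`, `source`, `utterance`), the type tags (`rel2`, `relm`, `star`, `chain`), and the example utterances are assumptions for illustration only; check the released data files for the actual format.

```python
# Hypothetical annotation records mimicking the referral types in the table.
# Field names and values are assumed, not the official SceneVerse schema.
annotations = [
    {"ref_type": "rel2",  "source": "template", "utterance": "the chair next to the table"},
    {"ref_type": "relm",  "source": "gpt",      "utterance": "the lamp between the bed and the desk"},
    {"ref_type": "star",  "source": "template", "utterance": "the pillow on the sofa that faces the TV"},
    {"ref_type": "chain", "source": "template", "utterance": "the book on the shelf beside the window"},
]

def filter_refs(records, ref_type=None, source=None):
    """Keep records matching the requested referral type and generation source."""
    return [
        r for r in records
        if (ref_type is None or r["ref_type"] == ref_type)
        and (source is None or r["source"] == source)
    ]

# Drop the optional chain type, as suggested for the main benchmarks.
no_chain = [r for r in annotations if r["ref_type"] != "chain"]
print(len(no_chain))  # 3
```

A filter like this also makes it easy to compare template-based generations against their `gpt`-refined counterparts, e.g. `filter_refs(annotations, source="gpt")`.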