We augment ScanRefer to create a dataset with three types of description-scene pairs: a) Zero Target; b) Single Target; and c) Multiple Targets, indicating that zero, one, or multiple target objects in the scene match the description. In addition, we use ChatGPT to rephrase the descriptions so they are more natural and diverse. To ensure the dataset is of high quality, we manually verify all generated samples. The resulting dataset contains 61,926 descriptions in total.
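The three pair types can be distinguished simply by how many scene objects match a description. A minimal sketch of such a sample record follows; the field and class names are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical sketch of a Multi3DRefer-style sample record.
# Field names are illustrative assumptions, not the dataset's real schema.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReferSample:
    scene_id: str
    description: str
    target_object_ids: List[int] = field(default_factory=list)

    @property
    def category(self) -> str:
        """Classify by how many scene objects match the description."""
        n = len(self.target_object_ids)
        if n == 0:
            return "zero target"
        if n == 1:
            return "single target"
        return "multiple targets"


samples = [
    ReferSample("scene0000_00", "the chair next to the window", [7]),
    ReferSample("scene0000_00", "all trash cans in the room", [3, 12]),
    ReferSample("scene0000_00", "the purple sofa", []),
]
print([s.category for s in samples])
# -> ['single target', 'multiple targets', 'zero target']
```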
@misc{zhang2023multi3drefergroundingtextdescription,
  title={Multi3DRefer: Grounding Text Description to Multiple 3D Objects},
  author={Yiming Zhang and ZeMing Gong and Angel X. Chang},
  year={2023},
  eprint={2309.05251},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2309.05251},
}