Open linukc opened 3 months ago
We introduce SceneFun3D, the first large-scale dataset with geometrically fine-grained interaction annotations in 3D real-world indoor environments. We aim to encourage research on the following tasks:

1) Functionality segmentation - Given a 3D point cloud of a scene, the goal is to segment the functional interactive element instances and predict the associated affordance labels.
2) Task-driven affordance grounding - Given a language task description (e.g., "open the fridge"), the goal is to predict the instance mask of the functional element that needs to be interacted with, along with the label of the action it affords.
3) 3D motion estimation - In addition to segmenting the functionalities, the goal is to infer the motion parameters that describe how an agent can interact with the predicted functionalities.

Data and code: https://opensun3d.github.io/cvpr24-challenge/track_2/
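As a rough sketch, the inputs and outputs of the three tasks could be organized as below. The class and field names are assumptions made for illustration, not the challenge's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np

# Illustrative prediction containers for the three tasks; all names here
# are assumptions for this sketch, not the benchmark's real interface.

@dataclass
class SegmentationPrediction:
    """Task 1: instance masks plus affordance labels for one scene."""
    instance_masks: List[np.ndarray] = field(default_factory=list)  # each (N,) bool
    affordance_labels: List[str] = field(default_factory=list)

@dataclass
class GroundingPrediction:
    """Task 2: the single element a task description refers to."""
    description: str                     # e.g. "open the fridge"
    mask: Optional[np.ndarray] = None    # (N,) bool instance mask
    affordance: Optional[str] = None     # label of the afforded action

@dataclass
class MotionPrediction:
    """Task 3: how the predicted element moves when interacted with."""
    motion_type: str                     # assumed "rotation" or "translation"
    axis: Optional[np.ndarray] = None    # (3,) motion axis direction
    origin: Optional[np.ndarray] = None  # (3,) point on the axis
```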
SceneFun3D contains more than 14.8k annotations of functional interactive elements across 710 high-fidelity reconstructions of indoor environments. Each annotation pairs a 3D instance mask with an affordance label; we define nine affordance categories to describe the interactions afforded by common scene functionalities.
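One such annotation could be represented as a boolean mask over the scene's points paired with its affordance label. The helper below and the example label are hypothetical, just to illustrate the structure:

```python
import numpy as np

# Hypothetical helper: build one annotation record as a boolean instance
# mask over a scene's num_points points, paired with an affordance label.
def make_annotation(point_indices, affordance, num_points):
    mask = np.zeros(num_points, dtype=bool)
    mask[point_indices] = True  # points belonging to this interactive element
    return {"mask": mask, "affordance": affordance}

# e.g. a three-point element with an illustrative label
ann = make_annotation([2, 3, 7], "rotate", num_points=10)
# ann["mask"].sum() == 3
```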
Beyond localizing the functionalities, it is crucial to understand the purpose they serve in the scene context. To this end, we collect diverse free-form language descriptions of tasks that involve interacting with the scene functionalities. Toward holistic scene understanding, we also collect annotations of the motions required to manipulate the interactive elements.
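Assuming a rotational motion annotation is described by an axis direction and an origin point (field names are an assumption of this sketch), applying it to an element's points is a standard Rodrigues rotation:

```python
import numpy as np

def rotate_about_axis(points, axis, origin, angle):
    """Rotate (N, 3) points by `angle` radians about the line through
    `origin` along `axis`, via Rodrigues' rotation formula."""
    axis = axis / np.linalg.norm(axis)   # ensure unit axis
    p = points - origin                  # move rotation axis to the origin
    cos, sin = np.cos(angle), np.sin(angle)
    rotated = (p * cos
               + np.cross(axis, p) * sin
               + axis * (p @ axis)[:, None] * (1 - cos))
    return rotated + origin

# e.g. a point on a door swinging 90 degrees about a vertical hinge
pts = np.array([[1.0, 0.0, 0.0]])
out = rotate_about_axis(pts, axis=np.array([0.0, 0.0, 1.0]),
                        origin=np.zeros(3), angle=np.pi / 2)
# out is approximately [[0., 1., 0.]]
```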