Open linukc opened 3 months ago
We introduce SceneFun3D, the first large-scale dataset with geometrically fine-grained interaction annotations in 3D real-world indoor environments. We aim to encourage research on the following tasks:

1) Functionality segmentation - Given a 3D point cloud of a scene, the goal is to segment the functional interactive element instances and predict the associated affordance labels.
2) Task-driven affordance grounding - Given a language task description (e.g., "open the fridge"), the goal is to predict the instance mask of the functional element that needs to be interacted with, along with the label of the action it affords.
3) 3D motion estimation - In addition to segmenting the functionalities, the goal is to infer the motion parameters that describe how an agent can interact with the predicted functionalities.

Data and code: https://opensun3d.github.io/cvpr24-challenge/track_2/
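As a rough sketch, the inputs and outputs of the three tasks could be organized as below. The class and field names are assumptions made for illustration, not the challenge's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np

# Illustrative prediction containers for the three tasks; all names here
# are assumptions for this sketch, not the benchmark's real interface.

@dataclass
class SegmentationPrediction:
    """Task 1: instance masks plus affordance labels for one scene."""
    instance_masks: List[np.ndarray] = field(default_factory=list)  # each (N,) bool
    affordance_labels: List[str] = field(default_factory=list)

@dataclass
class GroundingPrediction:
    """Task 2: the single element a task description refers to."""
    description: str                     # e.g. "open the fridge"
    mask: Optional[np.ndarray] = None    # (N,) bool instance mask
    affordance: Optional[str] = None     # label of the afforded action

@dataclass
class MotionPrediction:
    """Task 3: how the predicted element moves when interacted with."""
    motion_type: str                     # assumed "rotation" or "translation"
    axis: Optional[np.ndarray] = None    # (3,) motion axis direction
    origin: Optional[np.ndarray] = None  # (3,) point on the axis
```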
SceneFun3D contains more than 14.8k annotations of functional interactive elements across 710 high-fidelity reconstructions of indoor environments. Each annotation pairs a 3D instance mask with an affordance label; we define nine affordance categories to describe the interactions afforded by common scene functionalities.
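One such annotation could be represented as a boolean mask over the scene's points paired with its affordance label. The helper below and the example label are hypothetical, just to illustrate the structure:

```python
import numpy as np

# Hypothetical helper: build one annotation record as a boolean instance
# mask over a scene's num_points points, paired with an affordance label.
def make_annotation(point_indices, affordance, num_points):
    mask = np.zeros(num_points, dtype=bool)
    mask[point_indices] = True  # points belonging to this interactive element
    return {"mask": mask, "affordance": affordance}

# e.g. a three-point element with an illustrative label
ann = make_annotation([2, 3, 7], "rotate", num_points=10)
# ann["mask"].sum() == 3
```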
Beyond localizing the functionalities, it is crucial to understand the purpose they serve in the scene context. To this end, we collect diverse free-form language descriptions of tasks that involve interacting with the scene functionalities. Toward holistic scene understanding, we also collect annotations of the motions required to manipulate the interactive elements.
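Assuming a rotational motion annotation is described by an axis direction and an origin point (field names are an assumption of this sketch), applying it to an element's points is a standard Rodrigues rotation:

```python
import numpy as np

def rotate_about_axis(points, axis, origin, angle):
    """Rotate (N, 3) points by `angle` radians about the line through
    `origin` along `axis`, via Rodrigues' rotation formula."""
    axis = axis / np.linalg.norm(axis)   # ensure unit axis
    p = points - origin                  # move rotation axis to the origin
    cos, sin = np.cos(angle), np.sin(angle)
    rotated = (p * cos
               + np.cross(axis, p) * sin
               + axis * (p @ axis)[:, None] * (1 - cos))
    return rotated + origin

# e.g. a point on a door swinging 90 degrees about a vertical hinge
pts = np.array([[1.0, 0.0, 0.0]])
out = rotate_about_axis(pts, axis=np.array([0.0, 0.0, 1.0]),
                        origin=np.zeros(3), angle=np.pi / 2)
# out is approximately [[0., 1., 0.]]
```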