scene-verse / SceneVerse

Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding"
https://scene-verse.github.io
MIT License

The view-dependent object relationships determination #6

Closed ZCMax closed 2 weeks ago

ZCMax commented 5 months ago

As the paper states: "Horizontal relationships describe the proximity relations like in front of, next to, behind, etc. Relationships like left, right are contextually dependent on a reference view, where another anchor object is utilized to establish the view direction. The distance between the two objects is also calculated to describe whether the objects are far or near in space." I understand that left and right relationships require a reference view. My question is: do relationships like in front of and behind not rely on the reference view as well?

yixchen commented 5 months ago

Hi,

In our relationship determination, in front of and behind do depend on the reference view. If one object is in front of another under one view direction, the relationship can be completely different, i.e., behind, when you see them from the opposite direction (the other side of the room). This is not always true, since such relationships also depend on object orientation (e.g., in front of the TV), but we treat them as view-dependent in the currently released data.
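
To make the view dependence concrete, here is a minimal sketch of how the same pair of objects can flip from in front of to behind when the view direction is reversed. It is not the released SceneVerse code: the function name `front_or_behind`, the example coordinates, and the convention that "in front of" means "closer to the observer than the anchor" are all assumptions made for illustration.

```python
import math

def front_or_behind(target_center, anchor_center, view_dir):
    """Classify the target relative to the anchor for an observer looking along view_dir.

    Assumed convention (hypothetical, not confirmed by the thread): the target is
    "in front of" the anchor when it is closer to the observer, i.e. its offset
    from the anchor has a negative component along the view direction.
    """
    norm = math.sqrt(sum(v * v for v in view_dir))
    unit_view = [v / norm for v in view_dir]
    offset = [t - a for t, a in zip(target_center, anchor_center)]
    depth = sum(o * v for o, v in zip(offset, unit_view))
    return "in front of" if depth < 0 else "behind"

# Hypothetical object centers in a z-up scene.
chair = (0.0, 1.0, 0.0)
cabinet = (0.0, 2.0, 0.0)

same_view = front_or_behind(chair, cabinet, view_dir=(0.0, 1.0, 0.0))
opposite_view = front_or_behind(chair, cabinet, view_dir=(0.0, -1.0, 0.0))
```

With these inputs the chair is in front of the cabinet when looking along +y and behind it when looking along -y, which is the flip described above.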

ZCMax commented 5 months ago

Thanks for your explanation. Here is my understanding:

  1. During object referral generation, the object's own orientation is not considered. For example, a TV has a front side and a back side, but you determine the spatial relationship only from the reference view.
  2. Since you said that the reference view is used in the relationship determination, I notice that many prompts in the annotation files do not include the reference view information, such as {"item_id": "45261728_ref-chain-gpt_00156757", "scan_id": "45261728", "target_id": "4", "instance_type": "chair", "utterance": "The chair can be found in front of the lower cabinet, situated beneath its taller counterpart."}. It seems that the reference view is not included in the prompt.

yixchen commented 5 months ago

  1. Yes.
  2. The default view direction is along the +y axis (z-up) in the aligned scene point clouds that we provide in SceneVerse.
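
As a rough illustration of what a default view along +y in a z-up frame implies for left and right, the sketch below derives the observer's right-hand direction as the cross product of the view direction and the up axis. It assumes a right-handed coordinate system, and the names `DEFAULT_VIEW`, `UP`, and `left_or_right` are hypothetical; the released scripts may determine these relations differently.

```python
DEFAULT_VIEW = (0.0, 1.0, 0.0)  # default view direction stated in the thread: +y
UP = (0.0, 0.0, 1.0)            # z-up, as stated in the thread

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def left_or_right(target_center, anchor_center, view_dir=DEFAULT_VIEW, up=UP):
    """Classify the target as left or right of the anchor for the given view.

    Assumes a right-handed frame, so the observer's right is view_dir x up.
    """
    right = cross(view_dir, up)
    offset = [t - a for t, a in zip(target_center, anchor_center)]
    lateral = sum(o * r for o, r in zip(offset, right))
    return "right of" if lateral > 0 else "left of"

# Hypothetical object centers.
lamp = (1.0, 0.0, 0.0)
table = (0.0, 0.0, 0.0)

default_relation = left_or_right(lamp, table)
reversed_relation = left_or_right(lamp, table, view_dir=(0.0, -1.0, 0.0))
```

Under the default view the observer's right is +x, so the lamp is right of the table; reversing the view to -y makes it left of the table.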

ZCMax commented 4 months ago

After checking the annotations in the ScanNet referral dataset, I found that the reference view information appears as phrases like "facing the bed". How is such a statement generated, and what is the definition of "facing"? I'm working on an egocentric perception task, so I would like more details about the available observation information. Thanks.

yixchen commented 1 month ago

Hi, we have released the scripts for scene graph generation. See here. Feedback and PRs are welcome.

Buzz-Beater commented 2 weeks ago

Closing this issue as we have already provided the related code and instructions. Feel free to re-open this thread if there is any issue :-)