scene-verse / SceneVerse

Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding"
https://scene-verse.github.io
MIT License

Questions on the usage of HM3D dataset #7

Closed by chrockey 7 months ago

chrockey commented 7 months ago

Hi,

I have two questions regarding your use of HM3D for SceneVerse:

  1. How did you obtain the room segmentation for HM3D, i.e., which part of each scene corresponds to each scan?
  2. How can we obtain the RGBD images you used (e.g., for the object captions)?

Thank you!

SergioArnaud commented 7 months ago

Doubling down on this question. Particularly on the part about Room Segmentation in HM3D.

@Buzz-Beater could you provide code, or some other way, to determine which part of a scene corresponds to each of your segmentations? We're currently planning to use the HM3D annotations from your dataset, but with the habitat simulator directly rather than the provided point clouds. Without knowing how the room segmentations are computed, the data is not usable for us.

@chrockey I can help with the second question. I can recommend the following two options:

  1. Use habitat directly: set up your env, download the scenes, and use an agent interacting with the environment to collect trajectories; from those trajectories simply extract RGBD + K + poses. This is harder but gives more flexibility.
  2. Use Omnidata: download the HM3D split and use the dataloader to sample frames along with depth and poses. This is a little easier to get running but less flexible, IMO.
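Whichever route you take, once you have RGBD frames plus intrinsics K and camera poses, lifting them into a world-frame point cloud is the same few lines. A minimal sketch (assuming a pinhole camera and a 4x4 camera-to-world matrix; the function name is mine, not from either library):

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift a depth map into world-frame 3D points (pinhole camera model).

    depth:        (H, W) depth in meters
    K:            3x3 intrinsics
    cam_to_world: 4x4 camera-to-world pose
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Unproject each pixel (u, v, depth) into the camera frame.
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    pts_cam = np.stack([x, y, depth, np.ones_like(depth)], -1).reshape(-1, 4)
    # Transform the homogeneous camera-frame points into the world frame.
    return (pts_cam @ cam_to_world.T)[:, :3]
```

Accumulating these points over a trajectory gives you a scene point cloud in the simulator's coordinate conventions.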

Thank you

chrockey commented 7 months ago

@SergioArnaud Thanks for the help!

I thought there were ground-truth RGBD videos captured by humans, since HM3D is real, not synthetic. Are the RGBD videos in Omnidata the original (ground-truth) RGBD videos of HM3D, or just videos rendered with a simulator?

SergioArnaud commented 7 months ago

Nope, they are not videos but 3D scans of real-world environments. Omnidata will not return a video but a set of observations rendered from this 3D scan.

chrockey commented 7 months ago

> Omnidata will not return a video but a set of observations from this 3D scan

Oh, I see. In that case, the set of observations from Omnidata does not necessarily cover the full 3D scene. It could be partial. Is my understanding correct?

SergioArnaud commented 7 months ago

That is correct. You can always oversample and get a ton of frames to get close to full coverage, or do some fancy tricks on top of Omnidata.

If you want full coverage, using an agent to do frontier exploration on habitat might be an easier path (if you're familiar with habitat and frontier exploration)
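For what it's worth, a cheap way to check how close oversampling gets you to full coverage is to voxelize both the points from your sampled frames and the full scene point cloud, then compare occupied voxels. A rough sketch (the voxel size and the function name are my own choices, not from Omnidata or habitat):

```python
import numpy as np

def voxel_coverage(sampled_pts, scene_pts, voxel=0.25):
    """Fraction of the scene's occupied voxels that the sampled frames hit."""
    def occupied(pts):
        # Hash each point to its integer voxel index.
        return set(map(tuple, np.floor(np.asarray(pts) / voxel).astype(int)))
    scene = occupied(scene_pts)
    return len(occupied(sampled_pts) & scene) / len(scene)
```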

Also interested in knowing how the sceneverse authors did it. @Buzz-Beater

yixchen commented 7 months ago

Hi,

The room segmentation from HM3D can be found as follows.

  1. In the current released HM3D dataset, the scan_id is formatted by {scene_id}_{room_id}, e.g., {00006-HkseAnWCgqk}_{sub002}.
  2. The room_id can be found by decomposing the scene mesh:
    import trimesh

    # decompose scene mesh into sub-rooms
    scene = trimesh.load(glb_dir, file_type='glb')  # glb_dir: path to the scene .glb
    room_dict = dict()
    for name, _g in scene.geometry.items():
        group_name = name.split('_')[2]  # group_name is the room_id
        if group_name not in room_dict:
            room_dict[group_name] = []
        room_dict[group_name].append(_g)
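Since the scene_id itself contains a hyphen (e.g., 00006-HkseAnWCgqk) and the room_id is appended after the final underscore, a scan_id can be split back into its two parts like this (a small sketch; it assumes the room_id never contains an underscore):

```python
def split_scan_id(scan_id):
    """Split a SceneVerse HM3D scan_id of the form {scene_id}_{room_id}.

    The scene_id contains a hyphen, so we split on the last underscore
    only (assuming the room_id itself has no underscore).
    """
    scene_id, room_id = scan_id.rsplit('_', 1)
    return scene_id, room_id
```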

Regarding the images/object captions in HM3D, we only released the template-based object captions in the current version. One easy way to extract images is to use the habitat simulator, as @SergioArnaud also mentioned.

chrockey commented 7 months ago

@SergioArnaud and @yixchen Thanks for the reply! I appreciate your help with this :)

SergioArnaud commented 7 months ago

@yixchen

Thank you so much for the answer! Do you have any pointers on how to use this constrained mesh information in habitat?

yixchen commented 7 months ago

I'm not sure how to find the room_id in habitat, but one workaround you can try is to extract a (rough) layout/floor map from the mesh file and use it to locate objects in the simulator.
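To make that workaround concrete, one rough sketch of the idea: take the vertices of one room's geometry, project them onto the floor plane to get an axis-aligned footprint, and test whether a position from the simulator falls inside it (assumes y-up as in habitat; the function names and the margin are my own, not part of either API):

```python
import numpy as np

def room_footprint(vertices, margin=0.1):
    """Axis-aligned 2D footprint of a room on the floor (x/z) plane.

    Assumes y is the up axis; `margin` pads the box to tolerate
    objects slightly outside the mesh bounds.
    """
    v = np.asarray(vertices)[:, [0, 2]]
    return v.min(axis=0) - margin, v.max(axis=0) + margin

def in_room(position, footprint):
    """Check whether a world-space position lies inside a room footprint."""
    lo, hi = footprint
    p = np.asarray(position)[[0, 2]]
    return bool(np.all(p >= lo) and np.all(p <= hi))
```

With trimesh, the vertices for one room could come from concatenating the geometries collected in room_dict above.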

SergioArnaud commented 7 months ago

What I understand from the code snippet is that you have ground-truth room annotations for HM3D, which means you're using the HM3D-Semantics dataset, not plain HM3D. If that's the case, you should also cite HM3D-Semantics in the paper; that would make it much easier to understand how you got the room annotations.

Thank you for the help @yixchen

yixchen commented 7 months ago

Yes, we use annotations from HM3D-Semantics. We will clarify this further and add the citation in the revised version. Thanks.