hovsg / HOV-SG

[RSS2024] Official implementation of "Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation"
https://hovsg.github.io
MIT License
188 stars · 13 forks

Questions about the HM3D poses #19

Closed · Eku127 closed 2 weeks ago

Eku127 commented 3 weeks ago

“We provide a script (hovsg/data/hm3dsem/gen_hm3dsem_walks_from_poses.py) that turns a set of camera poses (hovsg/data/hm3dsem/metadata/poses) into a sequence of RGB-D observations using the habitat-sim simulator.”

Hi, I have a question regarding the dataset. Could you please clarify how the camera poses in the HM3D scenes were obtained? Was there any prior work that used the same pose sequence?

Additionally, if there is any code available to generate these poses, it would be extremely helpful. Thanks in advance!
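
For reference, my rough understanding of the replay side, as a minimal sketch (not the actual gen_hm3dsem_walks_from_poses.py; I'm assuming each pose file stores a row-major 4x4 camera-to-world matrix):

import habitat_sim
import numpy as np
import quaternion  # numpy-quaternion; habitat-sim uses it for rotations

def replay_pose(sim, pose_path):
    # Assumption: the file holds a row-major 4x4 camera-to-world matrix.
    T = np.loadtxt(pose_path).reshape(4, 4)
    state = habitat_sim.AgentState()
    state.position = T[:3, 3]
    state.rotation = quaternion.from_rotation_matrix(T[:3, :3])
    # This places the agent at the stored pose; the actual script may set the
    # sensor pose instead of the agent pose.
    sim.get_agent(0).set_state(state)
    # Returns a dict keyed by sensor uuid (e.g. "color_sensor", "depth_sensor").
    return sim.get_sensor_observations()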

Tom-Huang commented 2 weeks ago

Hi @Eku127 , thank you for your interest in our project.

Could you please clarify how the camera poses in the HM3D scenes were obtained?

We implemented a keyboard controller in habitat-sim and controlled an agent to walk through scenes from HM3DSem. The observations (RGB, depth, camera pose) gathered during this exploration are collected and saved.
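
In practice, every key press maps to one habitat-sim action; after each step we grab the observations and the agent state and write them to disk. A simplified sketch of one such step (not our exact controller code; the key mapping and sensor names are just examples):

key_to_action = {"w": "move_forward", "a": "turn_left", "d": "turn_right"}

def collect_step(sim, key, root_save_dir, sim_setting, save_count):
    # Advance the agent by one discrete action and fetch the new observations.
    observations = sim.step(key_to_action[key])
    rgb = observations["color_sensor"]    # HxWx4 uint8
    depth = observations["depth_sensor"]  # HxW float32, in meters
    agent_state = sim.get_agent(0).get_state()
    # Save the 6-DoF camera pose (a save_state helper is sketched later in this
    # thread) together with the rgb / depth frames.
    save_state(root_save_dir, sim_setting, agent_state, save_count)
    return rgb, depth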

Was there any prior work that used the same pose sequence?

The dataset used in the HOV-SG project has not been used in other projects before. Since the HM3DSem dataset doesn't provide any labels for floors and rooms, we manually labeled 8 scenes for the experiments in the paper. If you like, you can reuse our dataset in your future work for a consistent comparison.

Additionally, if there is any code available to generate these poses, it would be extremely helpful

You can look at the code from the VLMaps project as a reference for collecting your own custom dataset. That code provides basic functions to move the agent around (move forward, turn left, turn right), but it doesn't include functions to change the viewing direction (look up, look down) and was developed for scenes from the Matterport3D dataset (not HM3DSem), so you will probably need to adapt it to HM3DSem.
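
For the viewing direction, habitat-sim already ships look_up / look_down actions; you mainly need to register them in the agent's action space. A minimal sketch (the actuation amounts are just example values):

import habitat_sim

agent_cfg = habitat_sim.agent.AgentConfiguration()
agent_cfg.action_space = {
    "move_forward": habitat_sim.agent.ActionSpec(
        "move_forward", habitat_sim.agent.ActuationSpec(amount=0.25)  # meters
    ),
    "turn_left": habitat_sim.agent.ActionSpec(
        "turn_left", habitat_sim.agent.ActuationSpec(amount=10.0)  # degrees
    ),
    "turn_right": habitat_sim.agent.ActionSpec(
        "turn_right", habitat_sim.agent.ActuationSpec(amount=10.0)
    ),
    "look_up": habitat_sim.agent.ActionSpec(
        "look_up", habitat_sim.agent.ActuationSpec(amount=10.0)
    ),
    "look_down": habitat_sim.agent.ActionSpec(
        "look_down", habitat_sim.agent.ActuationSpec(amount=10.0)
    ),
}

Note that look_up / look_down pitch the sensors rather than the agent body, so the pitch ends up in the sensor states, not in the agent's own rotation.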

Eku127 commented 2 weeks ago

Thank you for the detailed and prompt response! I will check out the code you mentioned from the VLMaps project and figure out how to get 6-DoF poses :P, as I'm currently only able to get x, y, and yaw information from the agent's state :<

Tom-Huang commented 2 weeks ago

Maybe look at the following code to understand how to save a 6-DoF camera pose:

import math
import os
import sys
import time

import habitat_sim
import numpy as np
from scipy.spatial.transform import Rotation  # needed for Rotation.from_quat below
from tqdm import tqdm

def save_state(root_save_dir, sim_setting, agent_state, save_count):
    # File name: <scene name>_<frame index>.txt inside <root_save_dir>/pose/.
    save_name = sim_setting["scene"].split("/")[-1].split(".")[0] + f"_{save_count:06}.txt"
    save_dir = os.path.join(root_save_dir, "pose")
    os.makedirs(save_dir, exist_ok=True)
    save_path = os.path.join(save_dir, save_name)

    # The full 6-DoF camera pose lives in the sensor state, not the agent state.
    pos = agent_state.sensor_states["depth_sensor"].position
    cam_rot = agent_state.sensor_states["depth_sensor"].rotation
    # habitat-sim quaternions expose w, x, y, z; scipy expects (x, y, z, w) order.
    quat = [
        cam_rot.x,
        cam_rot.y,
        cam_rot.z,
        cam_rot.w,
    ]
    print("Cam pos: ", pos)
    print("Cam rot: ", Rotation.from_quat(quat).as_matrix())
    print("Robot pos: ", agent_state.position)
    robo_quat = [
        agent_state.rotation.x,
        agent_state.rotation.y,
        agent_state.rotation.z,
        agent_state.rotation.w,
    ]
    print("Robot rot: ", Rotation.from_quat(robo_quat).as_matrix())

    # Assemble the 4x4 camera-to-world matrix from the rotation and position.
    R = Rotation.from_quat(quat).as_matrix()
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = pos
    # Write the 16 entries row-major, tab-separated, on a single line.
    with open(save_path, "w") as f:
        for i in range(4):
            for j in range(4):
                f.write(f"{T[i, j]}\t")
        f.write("\n")
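
To read such a pose file back (it is written row-major and tab-separated), something like this should work:

T = np.loadtxt(save_path).reshape(4, 4)  # 4x4 camera-to-world pose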

You could close the issue if this helps. Thanks!

Eku127 commented 2 weeks ago

That makes sense; I was confused about the difference between the sensor state and the agent state before. It seems the agent state only captures 2D motion, while the full 6-DoF camera pose is in the sensor state. Thank you for the code and your advice again!