Hi @Eku127 , thank you for your interest in our project.
Could you please clarify how the camera poses in the HM3D scenes were obtained?
We implemented a keyboard controller in habitat-sim and controlled an agent to walk through scenes from HM3DSem. The observations (RGB, depth, camera pose) collected during the exploration were saved.
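For a rough idea of what such a setup involves, here is a minimal sketch of a habitat-sim collection loop (this is not the actual HOV-SG collection script; the paths, sensor resolution, sensor height, and key mapping are placeholder assumptions):

import habitat_sim

# RGB and depth sensors mounted on the agent (resolution/height are placeholders)
rgb_spec = habitat_sim.CameraSensorSpec()
rgb_spec.uuid = "color_sensor"
rgb_spec.sensor_type = habitat_sim.SensorType.COLOR
rgb_spec.resolution = [480, 640]
rgb_spec.position = [0.0, 1.5, 0.0]

depth_spec = habitat_sim.CameraSensorSpec()
depth_spec.uuid = "depth_sensor"
depth_spec.sensor_type = habitat_sim.SensorType.DEPTH
depth_spec.resolution = [480, 640]
depth_spec.position = [0.0, 1.5, 0.0]

sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene_id = "path/to/hm3dsem/scene.glb"                                    # placeholder
sim_cfg.scene_dataset_config_file = "path/to/hm3dsem.scene_dataset_config.json"   # placeholder

agent_cfg = habitat_sim.agent.AgentConfiguration()
agent_cfg.sensor_specifications = [rgb_spec, depth_spec]

sim = habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))
agent = sim.get_agent(0)

# A very simple "keyboard controller": type w/a/d to move, q to quit
key_to_action = {"w": "move_forward", "a": "turn_left", "d": "turn_right"}
save_count = 0
while True:
    key = input("action (w/a/d, q to quit): ").strip()
    if key == "q":
        break
    if key not in key_to_action:
        continue
    obs = sim.step(key_to_action[key])   # dict of sensor observations after the action
    rgb = obs["color_sensor"]            # save RGB and depth alongside the pose
    depth = obs["depth_sensor"]
    state = agent.get_state()            # holds the agent pose and per-sensor 6-DoF poses
    save_count += 1

sim.close()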
Was there any prior work that used the same pose sequence?
The dataset used in the HOV-SG project was not used in other projects before. Since the HM3DSem dataset doesn't provide any labels for floors and rooms, we manually labeled 8 scenes for the experiments in the paper. If you like, you can also reuse our dataset in your future work for a consistent comparison.
Additionally, if there is any code available to generate these poses, it would be extremely helpful
You can look at the code from the VLMaps project as a reference for collecting your own custom dataset. The code provides basic functions to move the agent around (move forward, turn left, turn right), but it doesn't include functions to change the viewing direction (look up, look down) and was developed for scenes from the Matterport3D dataset (not HM3DSem). You will probably need to adapt the code to the HM3DSem dataset.
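As a hint for that adaptation, the sketch below shows how the agent's action space in habitat-sim can be extended with the built-in look_up / look_down actions, which tilt the sensors and therefore add a pitch component to the camera pose (the actuation amounts and the HM3DSem paths are placeholder assumptions, not values from VLMaps or HOV-SG):

import habitat_sim
from habitat_sim.agent import ActionSpec, ActuationSpec, AgentConfiguration

depth_spec = habitat_sim.CameraSensorSpec()
depth_spec.uuid = "depth_sensor"
depth_spec.sensor_type = habitat_sim.SensorType.DEPTH
depth_spec.resolution = [480, 640]
depth_spec.position = [0.0, 1.5, 0.0]

agent_cfg = AgentConfiguration()
agent_cfg.sensor_specifications = [depth_spec]
agent_cfg.action_space = {
    # the planar motions that VLMaps-style controllers already provide
    "move_forward": ActionSpec("move_forward", ActuationSpec(amount=0.1)),  # meters
    "turn_left": ActionSpec("turn_left", ActuationSpec(amount=5.0)),        # degrees
    "turn_right": ActionSpec("turn_right", ActuationSpec(amount=5.0)),
    # extra actions that change the viewing direction
    "look_up": ActionSpec("look_up", ActuationSpec(amount=15.0)),
    "look_down": ActionSpec("look_down", ActuationSpec(amount=15.0)),
}

sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene_id = "path/to/hm3dsem/scene.glb"                                    # placeholder
sim_cfg.scene_dataset_config_file = "path/to/hm3dsem.scene_dataset_config.json"   # placeholder

sim = habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))
obs = sim.step("look_down")  # the depth sensor now pitches downward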
Thank you for the detailed and prompt response! I will check out the VLMaps code you mentioned and figure out how to get 6-DoF poses :P, as I’m currently only able to get x, y, and yaw information from the agent’s state :<
Maybe look at the following code to understand how to save the 6-DoF camera pose:
import math
import os
import sys
import time

import habitat_sim
import numpy as np
from scipy.spatial.transform import Rotation  # needed for the quaternion -> matrix conversion below
from tqdm import tqdm


def save_state(root_save_dir, sim_setting, agent_state, save_count):
    # One pose file per frame, named after the scene and the frame index
    save_name = sim_setting["scene"].split("/")[-1].split(".")[0] + f"_{save_count:06}.txt"
    save_dir = os.path.join(root_save_dir, "pose")
    os.makedirs(save_dir, exist_ok=True)
    save_path = os.path.join(save_dir, save_name)

    # 6-DoF camera pose: taken from the sensor state, not the agent (base) state
    pos = agent_state.sensor_states["depth_sensor"].position
    cam_rot = agent_state.sensor_states["depth_sensor"].rotation
    quat = [
        cam_rot.x,
        cam_rot.y,
        cam_rot.z,
        cam_rot.w,
    ]
    print("Cam pos: ", pos)
    print("Cam quat: ", Rotation.from_quat(quat).as_matrix())

    # Agent (base) pose, for comparison: it only yaws, while the sensor can also pitch
    print("Robot pos: ", agent_state.position)
    robo_quat = [
        agent_state.rotation.x,
        agent_state.rotation.y,
        agent_state.rotation.z,
        agent_state.rotation.w,
    ]
    print("Robot quat: ", Rotation.from_quat(robo_quat).as_matrix())

    # Convert the quaternion + translation to a 4x4 transformation matrix
    R = Rotation.from_quat(quat).as_matrix()
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = pos

    # Write the matrix row by row, tab-separated
    with open(save_path, "w") as f:
        for i in range(4):
            for j in range(4):
                f.write(f"{T[i, j]}\t")
            f.write("\n")
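For completeness, a hypothetical call site inside the exploration loop might look like this (sim, sim_setting, root_save_dir, and save_count are assumed to be defined by the surrounding script):

# Hypothetical usage inside the collection loop (names are placeholders)
obs = sim.step("move_forward")  # or turn_left / turn_right / look_up / look_down
save_state(root_save_dir, sim_setting, sim.get_agent(0).get_state(), save_count)
save_count += 1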
You could close the issue if this helps. Thanks!
That makes sense; I was confused about the difference between the sensor state and the agent state before. It seems that the agent state only allows for 2D motion. Thank you again for the code and your advice!
“We provide a script (hovsg/data/hm3dsem/gen_hm3dsem_walks_from_poses.py) that turns a set of camera poses (hovsg/data/hm3dsem/metadata/poses) into a sequence of RGB-D observations using the habitat-sim simulator.”
Hi, I have a question regarding the dataset. Could you please clarify how the camera poses in the HM3D scenes were obtained? Was there any prior work that used the same pose sequence?
Additionally, if there is any code available to generate these poses, it would be extremely helpful. Thanks in advance!
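For reference, here is a minimal sketch of how one pose file in the 4x4 format saved above could be replayed in habitat-sim to render an RGB-D observation. This is not the actual gen_hm3dsem_walks_from_poses.py script; sim is assumed to be an already-configured simulator with color and depth sensors, as in the earlier sketches.

import numpy as np
import quaternion  # numpy-quaternion; habitat-sim uses this type for rotations
from scipy.spatial.transform import Rotation


def render_at_pose(sim, pose_path):
    """Replay one saved 4x4 camera-to-world pose and return the rendered observations."""
    T = np.loadtxt(pose_path)  # 4x4 matrix, tab-separated as written by save_state above
    x, y, z, w = Rotation.from_matrix(T[:3, :3]).as_quat()  # scipy returns (x, y, z, w)

    agent = sim.get_agent(0)
    state = agent.get_state()
    state.position = T[:3, 3]
    state.rotation = np.quaternion(w, x, y, z)  # numpy-quaternion expects (w, x, y, z)
    # Overwrite the sensor states directly so the camera matches the saved pose exactly,
    # including any pitch that the agent base itself cannot represent
    for sensor_state in state.sensor_states.values():
        sensor_state.position = state.position
        sensor_state.rotation = state.rotation
    agent.set_state(state, infer_sensor_states=False)

    return sim.get_sensor_observations()  # dict with "color_sensor", "depth_sensor", ...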