xiyuying opened this issue 1 month ago
Hi, you need to implement a function to track the 3D point indicated in the first frame. First store the world coordinate of the pixel goal from the first frame, then, for each subsequent frame, find the nearest point in that frame's point cloud and project it back into the image plane.
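A minimal sketch of the back-projection step could look like this (assuming a 3x3 pinhole intrinsic matrix from `habitat_camera_intrinsic` and Habitat's OpenGL-style camera that looks along -Z; the exact signs have to match how `get_pointcloud_from_depth` builds the point cloud, so treat this only as a starting point):

```python
import numpy as np
import quaternion

def project_to_pixel(world_point, camera_pos, camera_rot, camera_int):
    # Transform the stored world point into the camera frame.
    R = quaternion.as_rotation_matrix(camera_rot)
    p_cam = R.T @ (np.asarray(world_point) - np.asarray(camera_pos))
    # Habitat cameras look along -Z, so points in front of the camera have p_cam[2] < 0.
    depth = -p_cam[2]
    if depth <= 0:
        return None  # the point is behind the camera
    u = camera_int[0, 2] + camera_int[0, 0] * p_cam[0] / depth
    v = camera_int[1, 2] - camera_int[1, 1] * p_cam[1] / depth  # image y grows downward
    return u, v
```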
OK, thank you for your reply!
Hi @wzcai99 @xiyuying, could you provide the function used to track the 3D point? I'm trying to modify the evaluate_policy script but am unable to get the pixel prediction needed to test the objective properly.
For further clarity, this is the sanity-check code I've written so far (habitat_camera_intrinsic, get_pointcloud_from_depth, and translate_to_world are the helpers from this repo):
```python
import numpy as np
import open3d as o3d
import quaternion


def random_pixel_goal(habitat_config, habitat_env, mask_shape=16):
    # mask_shape is the half-size (in pixels) of the square goal mask;
    # it was undefined in the original snippet, so 16 is only an assumed default.
    camera_int = habitat_camera_intrinsic(habitat_config)
    agent_state = habitat_env.sim.get_agent_state()
    robot_pos = agent_state.position
    robot_rot = agent_state.rotation
    camera_pos = agent_state.sensor_states['rgb'].position
    camera_rot = agent_state.sensor_states['rgb'].rotation
    print("Camera Position", camera_pos)
    print("Camera Rotation", camera_rot)
    print("Robot Position", robot_pos)
    print("Robot Rotation", robot_rot, "\n")

    # Render RGB-D at the current agent pose and lift the depth to a world-frame point cloud.
    camera_obs = habitat_env.sim.get_observations_at(robot_pos, robot_rot)
    rgb = camera_obs['rgb']
    depth = camera_obs['depth']
    xs, zs, rgb_points, rgb_colors = get_pointcloud_from_depth(rgb, depth, camera_int)
    rgb_points = translate_to_world(rgb_points, camera_pos, quaternion.as_rotation_matrix(camera_rot))

    # Keep points roughly at traversable height and between 3 m and 10 m away.
    condition_index = np.where((rgb_points[:, 1] < robot_pos[1] + 1.0) &
                               (rgb_points[:, 1] > robot_pos[1] - 0.5) &
                               (depth[(zs, xs)][:, 0] > 3.0) &
                               (depth[(zs, xs)][:, 0] < 10.0))[0]
    if condition_index.shape[0] == 0:
        return False, [], [], [], []

    # Point cloud built only for optional visualization/debugging.
    pointcloud = o3d.geometry.PointCloud()
    pointcloud.points = o3d.utility.Vector3dVector(rgb_points[condition_index])
    pointcloud.colors = o3d.utility.Vector3dVector(rgb_colors[condition_index] / 255.0)

    # Sample a random valid pixel and build a small square mask around it.
    random_index = np.random.choice(condition_index)
    target_x = xs[random_index]
    target_z = zs[random_index]
    print(target_x, target_z)
    target_point = rgb_points[random_index]
    min_z = max(target_z - mask_shape, 0)
    max_z = min(target_z + mask_shape, depth.shape[0])
    min_x = max(target_x - mask_shape, 0)
    max_x = min(target_x + mask_shape, depth.shape[1])
    target_mask = np.zeros((depth.shape[0], depth.shape[1]), np.uint8)
    target_mask[min_z:max_z, min_x:max_x] = 1

    # Keep the raw 3D point and a copy snapped to the robot's height.
    original_target_point = target_point.copy()
    target_point[1] = robot_pos[1]
    return True, rgb, target_mask, target_point, original_target_point


def get_goal_point_location_on_image(habitat_config, habitat_env, target_point):
    camera_int = habitat_camera_intrinsic(habitat_config)
    focal_length_x = camera_int[0, 0]
    focal_length_y = camera_int[1, 1]
    center_x = camera_int[0, 2]
    center_y = camera_int[1, 2]
    camera_pos = habitat_env.sim.get_agent_state().sensor_states['rgb'].position
    camera_rot = habitat_env.sim.get_agent_state().sensor_states['rgb'].rotation
    rotation_matrix = quaternion.as_rotation_matrix(camera_rot)
    relative_target_point = target_point - camera_pos
    print("Camera Position", camera_pos)
    print("Goal point", target_point)
    print("Goal - Camera Pos", relative_target_point)
    # Transform the world-frame goal point into the camera frame.
    camera_coordinates = rotation_matrix.T @ relative_target_point
    print("Transformed to camera frame", camera_coordinates)
    x_camera, y_camera, z_camera = camera_coordinates
    # Pinhole projection back onto the image plane.
    pixel_x = (focal_length_x * x_camera / z_camera) + center_x
    pixel_y = (focal_length_y * y_camera / z_camera) + center_y
    print(pixel_x)
    print(pixel_y)
    return pixel_x, pixel_y
```
Here, I use them as:
```python
goal_flag, goal_image, goal_mask, goal_point, height_uncorrected_goal_point = random_pixel_goal(habitat_config, env)
start_x_pixel, start_y_pixel = get_goal_point_location_on_image(habitat_config, env, height_uncorrected_goal_point)
```
I would expect the `print(target_x, target_z)` in `random_pixel_goal` and the pixel coordinates printed in `get_goal_point_location_on_image` to give the same values. However, they do not.
Any help would be much appreciated. Thanks!
Hello @FlightVin, I'm not sure whether my implementation is correct. Below is my code for tracking the pixel position of the target point in the image. I plan to compute a loss between this output and the goal_y and goal_x from policy_agent.py, but I'm running into some training issues, so I haven't been able to verify the approach end to end. That said, when I overlay the return value of track_goal_point on each frame, it lands very close to the goal location in the image (goal_image, goal_mask).
```python
def track_goal_point(obs_image, obs_depth, target_point, habitat_env, habitat_config):
    camera_int = habitat_camera_intrinsic(habitat_config)
    camera_pos = habitat_env.sim.get_agent_state().sensor_states['rgb'].position
    camera_rot = habitat_env.sim.get_agent_state().sensor_states['rgb'].rotation
    # Lift the current RGB-D observation to a world-frame point cloud.
    xs, zs, rgb_points, rgb_colors = get_pointcloud_from_depth(obs_image, obs_depth, camera_int)
    rgb_points_world = translate_to_world(rgb_points, camera_pos, quaternion.as_rotation_matrix(camera_rot))
    # Find the point closest to the stored 3D goal and take its pixel coordinates.
    distances = np.linalg.norm(rgb_points_world - target_point, axis=1)
    closest_point_index = np.argmin(distances)
    closest_pixel_x = xs[closest_point_index]
    closest_pixel_y = zs[closest_point_index]
    # Normalize to [0, 1] by the image width/height.
    closest_pixel_x_normalized = closest_pixel_x / obs_image.shape[1]
    closest_pixel_y_normalized = closest_pixel_y / obs_image.shape[0]
    return closest_pixel_x_normalized, closest_pixel_y_normalized
```
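For reference, a rough sketch of how this could plug into the evaluation loop (pred_goal_x / pred_goal_y are hypothetical names for the normalized goal_x / goal_y predicted in policy_agent.py, and the squared-error loss is only an illustration):

```python
# obs is the current observation dict; the stored goal is the
# height-uncorrected world point returned by random_pixel_goal.
gt_x, gt_y = track_goal_point(obs['rgb'], obs['depth'],
                              height_uncorrected_goal_point, env, habitat_config)
loss = (pred_goal_x - gt_x) ** 2 + (pred_goal_y - gt_y) ** 2  # example loss only
```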
Thanks @xiyuying
I'll try running it in a while and will let you know if our implementation succeeds. Btw, if you would like to share reimplementations, do reach out to me through the email in my profile.
Hello, I would like to ask: what are the ground-truth values that goal_head is supposed to learn?