wzcai99 / Pixel-Navigator

Official GitHub Repository for Paper "Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill", ICRA 2024

How to calculate `goal_loss`? #8

Open xiyuying opened 1 month ago

xiyuying commented 1 month ago

Hello, I would like to ask: what are the ground-truth values that the `goal_head` is supposed to learn?

wzcai99 commented 1 month ago

Hi, you need to implement a function that tracks the 3D point indicated in the first frame: store the world coordinate of the pixel goal from the first frame, then, for each subsequent frame, find the nearest point in that frame's point cloud and project it back onto the image plane.
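
For illustration, here is a minimal, untested sketch of the projection step (the function name is mine, and the OpenGL-style camera convention of x right, y up, looking down -z is an assumption; flip the signs if your point-cloud helper uses a different convention):

import numpy as np
import quaternion  # numpy-quaternion package

def project_goal_to_frame(goal_world, camera_pos, camera_rot, camera_int):
    # Transform the stored world-frame goal into the current camera frame.
    R = quaternion.as_rotation_matrix(camera_rot)
    x, y, z = R.T @ (np.asarray(goal_world) - np.asarray(camera_pos))
    if z >= 0:
        return None  # point is behind the camera in this convention
    fx, fy = camera_int[0, 0], camera_int[1, 1]
    cx, cy = camera_int[0, 2], camera_int[1, 2]
    u = cx + fx * (x / -z)  # image u grows rightward, like camera x
    v = cy - fy * (y / -z)  # image v grows downward, while camera y points up
    return u, v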

xiyuying commented 1 month ago

OK, thank you for your reply!

FlightVin commented 1 month ago

Hi @wzcai99 @xiyuying, could you provide the function used to track the 3D point? I'm trying to modify the evaluate_policy script, but I'm unable to get a pixel prediction with which to properly test the objective.

FlightVin commented 1 month ago

For further clarity, here is the sanity-check code I've written so far:

import numpy as np
import open3d as o3d
import quaternion

# habitat_camera_intrinsic, get_pointcloud_from_depth and translate_to_world
# are the helpers from this repo; mask_shape (the half-width of the goal mask
# in pixels) is defined elsewhere in my script.

def random_pixel_goal(habitat_config, habitat_env):
    camera_int = habitat_camera_intrinsic(habitat_config)
    agent_state = habitat_env.sim.get_agent_state()
    robot_pos = agent_state.position
    robot_rot = agent_state.rotation
    camera_pos = agent_state.sensor_states['rgb'].position
    camera_rot = agent_state.sensor_states['rgb'].rotation
    print("Camera Position", camera_pos)
    print("Camera Rotation", camera_rot)
    print("Robot Position", robot_pos)
    print("Robot Rotation", robot_rot, "\n")

    camera_obs = habitat_env.sim.get_observations_at(robot_pos, robot_rot)
    rgb = camera_obs['rgb']
    depth = camera_obs['depth']

    # Back-project the depth image into a world-frame point cloud.
    xs, zs, rgb_points, rgb_colors = get_pointcloud_from_depth(rgb, depth, camera_int)
    rgb_points = translate_to_world(rgb_points, camera_pos,
                                    quaternion.as_rotation_matrix(camera_rot))

    # Keep points at roughly traversable height and within a 3-10 m range.
    condition_index = np.where(
        (rgb_points[:, 1] < robot_pos[1] + 1.0) &
        (rgb_points[:, 1] > robot_pos[1] - 0.5) &
        (depth[(zs, xs)][:, 0] > 3.0) &
        (depth[(zs, xs)][:, 0] < 10.0))[0]
    if condition_index.shape[0] == 0:
        return False, [], [], [], []

    # Optional: point cloud of the candidate goals, handy for visual debugging.
    pointcloud = o3d.geometry.PointCloud()
    pointcloud.points = o3d.utility.Vector3dVector(rgb_points[condition_index])
    pointcloud.colors = o3d.utility.Vector3dVector(rgb_colors[condition_index] / 255.0)

    # Sample one candidate pixel and build a square goal mask around it.
    random_index = np.random.choice(condition_index)
    target_x = xs[random_index]
    target_z = zs[random_index]
    print(target_x, target_z)
    target_point = rgb_points[random_index]
    min_z = max(target_z - mask_shape, 0)
    max_z = min(target_z + mask_shape, depth.shape[0])
    min_x = max(target_x - mask_shape, 0)
    max_x = min(target_x + mask_shape, depth.shape[1])
    target_mask = np.zeros((depth.shape[0], depth.shape[1]), np.uint8)
    target_mask[min_z:max_z, min_x:max_x] = 1

    # Snap the goal to the robot's height, but keep the raw point as well.
    original_target_point = target_point.copy()
    target_point[1] = robot_pos[1]
    return True, rgb, target_mask, target_point, original_target_point

def get_goal_point_location_on_image(habitat_config, habitat_env, target_point):
    camera_int = habitat_camera_intrinsic(habitat_config)
    focal_length_x = camera_int[0, 0]
    focal_length_y = camera_int[1, 1]
    center_x = camera_int[0, 2]
    center_y = camera_int[1, 2]

    camera_pos = habitat_env.sim.get_agent_state().sensor_states['rgb'].position
    camera_rot = habitat_env.sim.get_agent_state().sensor_states['rgb'].rotation
    rotation_matrix = quaternion.as_rotation_matrix(camera_rot)

    # Transform the world-frame goal point into the camera frame.
    relative_target_point = target_point - camera_pos
    print("Camera Position", camera_pos)
    print("Goal point", target_point)
    print("Goal - Camera Pos", relative_target_point)
    camera_coordinates = rotation_matrix.T @ relative_target_point
    print("Transformed to camera frame", camera_coordinates)

    x_camera, y_camera, z_camera = camera_coordinates

    # Pinhole projection back onto the image plane.
    pixel_x = (focal_length_x * x_camera / z_camera) + center_x
    pixel_y = (focal_length_y * y_camera / z_camera) + center_y
    print(pixel_x)
    print(pixel_y)
    return pixel_x, pixel_y

I then call them as follows:

    goal_flag, goal_image, goal_mask, goal_point, height_uncorrected_goal_point = random_pixel_goal(habitat_config, env)
    start_x_pixel, start_y_pixel = get_goal_point_location_on_image(habitat_config, env, height_uncorrected_goal_point)

I would expect the `print(target_x, target_z)` in `random_pixel_goal` and the pixel coordinates printed in `get_goal_point_location_on_image` to match, since the agent has not moved between the two calls. However, they do not.
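
Concretely, with `p_cam = R.T @ (p_world - t_cam)`, I expect the pinhole projection `u = fx * x_cam / z_cam + cx`, `v = fy * y_cam / z_cam + cy` to land back on the sampled pixel `(target_x, target_z)`, because the camera pose is identical in both calls.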

Any help would be much appreciated. Thanks!

xiyuying commented 1 month ago

Hello @FlightVin, I'm not sure whether my version is correct. Below is my code for tracking the pixel position of the target point in each image; I plan to compute the loss between its output and the goal_y and goal_x from policy_agent.py. I'm still facing some issues with training, so I haven't been able to verify the approach end to end. That said, when I overlay the return value of track_goal_point on each frame, it lands very close to the goal's location in the image (goal_image, goal_mask).

def track_goal_point(obs_image, obs_depth, target_point, habitat_env, habitat_config):
    camera_int = habitat_camera_intrinsic(habitat_config)

    robot_pos = habitat_env.sim.get_agent_state().position
    robot_rot = habitat_env.sim.get_agent_state().rotation
    camera_pos = habitat_env.sim.get_agent_state().sensor_states['rgb'].position
    camera_rot = habitat_env.sim.get_agent_state().sensor_states['rgb'].rotation

    # Back-project the current frame into a world-frame point cloud.
    xs, zs, rgb_points, rgb_colors = get_pointcloud_from_depth(obs_image, obs_depth, camera_int)
    rgb_points_world = translate_to_world(rgb_points, camera_pos,
                                          quaternion.as_rotation_matrix(camera_rot))

    # Find the point closest to the goal stored from the first frame.
    distances = np.linalg.norm(rgb_points_world - target_point, axis=1)
    closest_point_index = np.argmin(distances)
    closest_pixel_x = xs[closest_point_index]
    closest_pixel_y = zs[closest_point_index]

    # Normalize to [0, 1] image coordinates.
    closest_pixel_x_normalized = closest_pixel_x / obs_image.shape[1]
    closest_pixel_y_normalized = closest_pixel_y / obs_image.shape[0]

    return closest_pixel_x_normalized, closest_pixel_y_normalized
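
For reference, this is roughly how I plan to use it for the loss. Treat it as an unverified sketch: `goal_point_world` is the world coordinate stored from the first frame, and `pred_goal_x` / `pred_goal_y` are just placeholder names for the goal head's outputs:

# After each env step, recompute the ground-truth pixel goal...
obs = habitat_env.step(action)
gt_x, gt_y = track_goal_point(obs['rgb'], obs['depth'], goal_point_world,
                              habitat_env, habitat_config)
# ...and supervise the goal head, e.g. with an L2 loss against its prediction.
goal_loss = (pred_goal_x - gt_x) ** 2 + (pred_goal_y - gt_y) ** 2
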
FlightVin commented 1 month ago

Thanks @xiyuying

I'll try running it shortly and will let you know if our implementation succeeds. By the way, if you'd like to share reimplementations, feel free to reach out via the email in my profile.