waymo-research / waymo-open-dataset

Waymo Open Dataset
https://www.waymo.com/open

2D-3D Keypoint relationships and 3D keypoint to image projections #493

Open ptr-br opened 2 years ago

ptr-br commented 2 years ago

Hi Waymo research team,

I have some questions regarding the keypoint data published with the latest releases.

  1. Is there any correspondence between the 3D and 2D labeling? I guess the 3D labels are obtained by inspecting the LiDAR point cloud; however, if one projects the 3D labels back into the camera, should they align with the 2D labels for the same frame?
  2. I encountered some examples where 3D is labeled but 2D is not, although 2D labeling would have been possible in my opinion. Is there any rule for when 2D or 3D (or both) are annotated?
  3. I tried to project the 3D keypoints to images on my own using code similar to #146 and #255. However, the point cloud and the keypoints do not align properly in some cases (especially on the side cameras with a moving vehicle). Could you maybe provide an example of the mapping from world to image coordinates? Everything I found in related issues did not work correctly.

If example code is needed, I can post it here; however, it's quite similar to the code in the issues above...

Example of a pedestrian (note: both the LiDAR point cloud and the keypoints seem to be slightly shifted to the left after projection):

3D keypoints to image projection cp_keypoints

Point cloud image projection cp_lidar

Camera projection stored in tfr-file ("gt") cp_lidar_gt

alexgorban commented 2 years ago

Hi, thanks for your interest in the dataset!

  1. There is a correspondence between 2D and 3D key points by their type (except the FOREHEAD and CENTER_HEAD points). You are right: 3D key points were obtained by labeling laser point clouds, while 2D key points were independently obtained by labeling camera images. So the laser and camera key points for the same object may have different sets of types labeled, and when you project 3D key points into the camera, even 2D key points of the same type will likely not match the projections. There are two main reasons for the mismatch: a) they are labeled independently; b) the laser points were captured at a slightly different time than when the camera pixels were exposed, so moving objects will likely show a large difference between the 2D labels and the projected 3D labels. (See the sketch after this list for matching key points by type.)

  2. Laser and camera key points are published only for some objects on some frames. There are ~10K object/frames with laser key points and ~200K object/frames with camera key points. There is no strict rule for when both are labeled, but most object/frames that have 3D key points should have 2D key points as well.

  3. We recently fixed an issue related to synchronization between the laser and the right side camera, so please make sure you use the v1.3.2 release. But even with this fix there will be discrepancies between camera and projected laser points (see 1.b above): camera and laser should match perfectly in the center column of a camera image, and the discrepancy will grow for columns further away from the center and for fast-moving objects.

  4. We will investigate the mismatch between point cloud image projection and camera projection stored in tfr-file ("gt") and update the mentioned issues.
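
To illustrate the type-based correspondence from point 1, here is a minimal sketch of pairing an object's 2D and 3D key points. It assumes laser_kps and camera_kps are the repeated keypoint fields of one object's laser and camera labels (as in keypoint.proto); types labeled in only one modality simply drop out of the intersection:

def match_keypoints_by_type(laser_kps, camera_kps):
    """Returns {type: (laser keypoint, camera keypoint)} for types labeled in both."""
    laser_by_type = {kp.type: kp for kp in laser_kps}
    camera_by_type = {kp.type: kp for kp in camera_kps}
    common = laser_by_type.keys() & camera_by_type.keys()
    return {t: (laser_by_type[t], camera_by_type[t]) for t in common}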

I'm closing this issue for now, please feel free to reopen if you have any further questions.

alexgorban commented 2 years ago

@ptr-br, by the way, could you please share the context name and frame.timestamp_micros for the example you shared?

ptr-br commented 2 years ago

Hey @alexgorban,

thank you for the quick and informative reply!

For the above example:
context name: 5847910688643719375_180_000_200_000
timestamp micros: 1548722497947691

From my experience, only a little more than half of the 3D keypoints also have 2D labels (~5.5k). I also noticed that the image height is 886 pixels, although the paper mentions that the resolution of the side cameras should be 1920x1040. Is there any reason for the difference?
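
This is roughly how I check the size (a small sketch; frame and cam are defined as in the full script below):

# Compare the decoded image size with the camera calibration metadata.
calib = next(c for c in frame.context.camera_calibrations if c.name == cam)
img = tf.image.decode_jpeg(next(im for im in frame.images if im.name == cam).image)
print(f'calibration: {calib.width}x{calib.height}, image: {img.shape[1]}x{img.shape[0]}')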

Here is also some code to reproduce my results.

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import io
import PIL
from waymo_open_dataset.camera.ops import py_camera_model_ops
from waymo_open_dataset.utils import frame_utils
from waymo_open_dataset import dataset_pb2
from waymo_open_dataset.utils import keypoint_data
from waymo_open_dataset.protos import keypoint_pb2
from waymo_open_dataset.utils import keypoint_draw

# some functions
def _imshow(ax: plt.Axes, image_np: np.ndarray):
    ax.imshow(image_np)
    ax.axis('off')
    ax.set_autoscale_on(False)

def _imdecode(buf: bytes) -> np.ndarray:
    with io.BytesIO(buf) as fd:
        pil = PIL.Image.open(fd)
        return np.array(pil)

def plot_points_on_image(projected_points, camera_image, rgba_func,
                         point_size=5.0):
    """Plots points on a camera image.

    Args:
      projected_points: [N, 3] numpy array. The inner dims are
        [camera_x, camera_y, range].
      camera_image: jpeg encoded camera image.
      rgba_func: a function that generates a color from a range value.
      point_size: the point size.

    """
    img = tf.image.decode_jpeg(camera_image.image)
    height, width, _ = img.shape
    plot_image(camera_image)
    print(f'Image height: {height}')
    print(f'Image width: {width}')

    xs = []
    ys = []
    colors = []

    for point in projected_points:

        if 0 <= point[0] <= width and 0 <= point[1] <= height:
            xs.append(point[0])  # width, col
            ys.append(point[1])  # height, row
            colors.append(rgba_func(point[2]))

    plt.scatter(xs, ys, c=colors, s=point_size, edgecolors="none")

def plot_image(camera_image):
    """Plot a camera image."""
    plt.figure(figsize=(20, 12))
    plt.imshow(tf.image.decode_jpeg(camera_image.image))
    plt.grid(False)

def rgba(r):
    """Generates a color based on range.

    Args:
      r: the range value of a given point.

    Returns:
      The color for a given range.
    """
    c = plt.get_cmap('jet')((r % 20.0) / 20.0)
    c = list(c)
    c[-1] = 0.5  # alpha
    return c

if __name__ == "__main__":

    # segment from validation data
    tfr_path = "segment-5847910688643719375_180_000_200_000_with_camera_labels.tfrecord"
    cam = 4
    frame_number = 108
    obj_id = "wq6ur5wzmxZCabo21Ke0xQ"

    # get data and labels
    dataset = tf.data.TFRecordDataset(tfr_path, compression_type='')
    frame_counter = 0
    for data in dataset:
        if frame_counter == int(frame_number):
            frame = dataset_pb2.Frame()
            frame.ParseFromString(bytearray(data.numpy()))
            break
        frame_counter += 1

    print(f"Context name:{frame.context.name}")
    print(f"Frame timestamp micros: {frame.timestamp_micros}")

    camera_image_by_name = {i.name: i.image for i in frame.images}
    image_np = _imdecode(camera_image_by_name[cam])

    tfr_labels = keypoint_data.group_object_labels(frame)
    obj_labels = tfr_labels[obj_id]

    # project 3D keypoints and lidar point cloud to image
    range_images, camera_projections, seg_labels, range_image_top_pose = frame_utils.parse_range_image_and_camera_projection(frame)

    points, cp_points = frame_utils.convert_range_image_to_point_cloud(
        frame,
        range_images,
        camera_projections,
        range_image_top_pose)

    points_all = np.concatenate(points, axis=0)

    image = next(im for im in frame.images if im.name == cam)

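    # frame.pose is the vehicle-to-world transform of this frame; rotate and
    # translate the LiDAR returns from the vehicle frame into the world frame.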
    pose_tf = tf.reshape(tf.constant(frame.pose.transform), [4, 4])
    points_world = tf.einsum('ij,nj->ni', pose_tf[:3, :3], points_all) + pose_tf[:3, 3]

    cam_calib = next(c for c in frame.context.camera_calibrations if c.name == cam)
    extrinsic = tf.reshape(tf.constant(cam_calib.extrinsic.transform), [4, 4])
    intrinsic = tf.constant(cam_calib.intrinsic)
    metadata = tf.constant([cam_calib.width, cam_calib.height, cam_calib.rolling_shutter_direction], dtype=tf.int32)

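    # Per-image metadata passed to world_to_image: the 4x4 image pose
    # (row-major), linear and angular velocities, and the rolling-shutter
    # timing fields.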
    camera_image_metadata = tf.constant([
        *image.pose.transform,
        image.velocity.v_x, image.velocity.v_y, image.velocity.v_z,
        image.velocity.w_x, image.velocity.w_y, image.velocity.w_z,
        image.pose_timestamp,
        image.shutter,
        image.camera_trigger_time,
        image.camera_readout_done_time
    ], dtype=tf.float32)

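    # Collect the labeled 3D key points of the object: type, location in the
    # vehicle frame and occlusion flag.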
    types = []
    keypoints = []
    occlusions = []
    for elm in obj_labels.laser.keypoints.keypoint:
        types.append(elm.type)
        keypoints.append((elm.keypoint_3d.location_m.x,
                          elm.keypoint_3d.location_m.y,
                          elm.keypoint_3d.location_m.z))
        occlusions.append(elm.keypoint_3d.visibility.is_occluded)

    keypoints = tf.constant(keypoints)
    keypoints_world = tf.einsum('ij,nj->ni', pose_tf[:3, :3], keypoints) + pose_tf[:3, 3]

    cp_keypoints = py_camera_model_ops.world_to_image(extrinsic, intrinsic, metadata, camera_image_metadata, keypoints_world)

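    # Wrap the projected points in CameraKeypoint protos so that
    # keypoint_draw can render them as a wireframe.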
    tf_keypoints = []
    for index, kp in enumerate(cp_keypoints):
        cam_keypoint = keypoint_pb2.CameraKeypoint()
        cam_keypoint.type = types[index]
        cam_keypoint.keypoint_2d.location_px.x = kp[0]
        cam_keypoint.keypoint_2d.location_px.y = kp[1]
        cam_keypoint.keypoint_2d.visibility.is_occluded = occlusions[index]
        tf_keypoints.append(cam_keypoint)

    camera_wireframe = keypoint_draw.build_camera_wireframe(tf_keypoints)

    keypoint_draw.OCCLUDED_BORDER_WIDTH = 3
    fig, ax = plt.subplots(frameon=False, figsize=(25, 25))
    keypoint_draw.draw_camera_wireframe(ax, camera_wireframe)

    _imshow(ax, image_np)
    fig.savefig("keypoint_projection", dpi=500)

    # lidar projections to image
    cp_points_all = py_camera_model_ops.world_to_image(extrinsic, intrinsic, metadata, camera_image_metadata, points_world)
    distance = tf.norm(points_all, axis=-1, keepdims=True)
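    # The last channel returned by world_to_image indicates whether the point
    # projects into the image; keep only the valid projections.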
    mask = tf.equal(cp_points_all[..., -1], 1)
    distance = tf.cast(tf.gather_nd(distance, tf.where(mask)), dtype=tf.float64)
    valid_points = tf.cast(tf.gather_nd(cp_points_all, tf.where(mask)), dtype=tf.float64)[:,:2]
    valid_points = tf.concat([valid_points, distance], axis=-1)
    plot_points_on_image(valid_points, image, rgba, point_size=5.0)
    plt.savefig('lidar_projections', dpi=500)

ptr-br commented 2 years ago

Actually, I'm not able to re-open my own issues if a repo collaborator closed them.

alexgorban commented 2 years ago

Thank you for all the details and code snippet. I'll need to further investigate this issue.

ZizhouJia commented 1 year ago

I cannot find py_camera_model_ops in version 1.5.2.