waymo-research / waymo-open-dataset

Waymo Open Dataset
https://www.waymo.com/open

How do you compute projected_lidar_labels? #271

Open · JenningsL opened this issue 3 years ago

JenningsL commented 3 years ago

Hi, I'm trying to project lidar labels onto images, and I only care about boxes with type=SIGN. However, I noticed that some of my boxes (blue) are slightly shifted from the projected_lidar_labels (green) provided in the tfrecord, especially near the edges of the image. I only take image distortion into account, so it seems the rolling shutter effect is the only remaining explanation? A few questions:

  1. Could you tell me where I can find the code that computes projected_lidar_labels?
  2. Could this offset instead be due to an inaccurate pose?
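For comparison, the "distortion only" projection described above can be sketched in plain NumPy. This is a static-camera pinhole model with Brown–Conrady distortion; the intrinsic layout [f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3] follows dataset.proto, but the camera-axis convention used here is the OpenCV one (+z forward), not Waymo's sensor frame, so treat it as an illustration rather than SDK code:

```python
import numpy as np

def apply_distortion(x, y, k1, k2, p1, p2, k3):
    # Brown-Conrady radial + tangential distortion on normalized coordinates.
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

def project_static(points_cam, intrinsic):
    # points_cam: (N, 3) in an OpenCV-style camera frame (+z forward).
    # intrinsic: [f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3] as in dataset.proto.
    f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3 = intrinsic
    x = points_cam[:, 0] / points_cam[:, 2]
    y = points_cam[:, 1] / points_cam[:, 2]
    x_d, y_d = apply_distortion(x, y, k1, k2, p1, p2, k3)
    u = f_u * x_d + c_u
    v = f_v * y_d + c_v
    return np.stack([u, v], axis=-1)
```

With all distortion coefficients at zero this reduces to a plain pinhole projection, which is the model that ignores rolling shutter entirely.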

[two screenshots: my projected SIGN boxes (blue) vs. projected_lidar_labels (green)]

JenningsL commented 3 years ago

I also tried the py_camera_model_ops.world_to_image function; however, its results are very close to mine. For example, the blue boxes in the image below were computed with world_to_image (frame 54 in segment-11718898130355901268_2300_000_2320_000_with_camera_labels.tfrecord).

[screenshot: boxes projected with world_to_image (blue)]
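For a sense of scale, here is a back-of-envelope estimate (my own sketch, not from the SDK) of how many pixels a rolling shutter can shift a projection, given ego speed, readout duration, object depth, and focal length:

```python
def rolling_shutter_shift_px(speed_mps, readout_s, depth_m, focal_px):
    """Rough lateral pixel shift caused by a rolling shutter.

    A point read near the last row is imaged about `readout_s` later than one
    near the first row; in that time the camera moves speed_mps * readout_s
    metres, which projects to roughly focal_px * displacement / depth pixels.
    All numbers here are illustrative assumptions, not dataset values.
    """
    displacement = speed_mps * readout_s
    return focal_px * displacement / depth_m

# e.g. 10 m/s ego speed, ~40 ms readout, object 20 m away, f ~ 2000 px:
# rolling_shutter_shift_px(10.0, 0.04, 20.0, 2000.0) -> 40.0 px
```

A shift of tens of pixels for nearby objects is consistent with the offsets being largest at the image edges, where the readout-time difference is greatest.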

The projection code I used:

import tensorflow as tf

from waymo_open_dataset.camera.ops import py_camera_model_ops


def project_points_to_image_sdk(calibration, image, global_points):
    """Projects world-frame points into the image using the SDK camera model."""
    g = tf.Graph()
    with g.as_default():
        global_points = tf.constant(global_points, dtype=tf.float32)
        # Camera extrinsic: 4x4 camera-to-vehicle transform, row-major.
        extrinsic = tf.reshape(
            tf.constant(list(calibration.extrinsic.transform), dtype=tf.float32),
            [4, 4])
        # Intrinsics: [f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3].
        intrinsic = tf.constant(list(calibration.intrinsic), dtype=tf.float32)
        metadata = tf.constant([
            calibration.width, calibration.height,
            calibration.rolling_shutter_direction
        ], dtype=tf.int32)
        # Per-image metadata: camera pose (16), velocity (6), timing (4).
        camera_image_metadata = (
            list(image.pose.transform) +
            [image.velocity.v_x, image.velocity.v_y, image.velocity.v_z,
             image.velocity.w_x, image.velocity.w_y, image.velocity.w_z] +
            [image.pose_timestamp, image.shutter,
             image.camera_trigger_time, image.camera_readout_done_time])

        image_points_t = py_camera_model_ops.world_to_image(
            extrinsic, intrinsic, metadata, camera_image_metadata, global_points)

    with tf.compat.v1.Session(graph=g) as sess:
        image_points = sess.run(image_points_t)
    # Keep only the (u, v) pixel coordinates.
    return image_points[:, :2]
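For completeness, here is a hypothetical helper (plain NumPy, not SDK code) showing how the global_points passed above could be built from a label's 3D box and the frame pose; the field names mirror dataset.proto, but the corner ordering is an arbitrary choice of mine:

```python
import numpy as np

def box_corners_world(center, size, heading, vehicle_pose):
    """8 corners of an upright 3D box, transformed into the world frame.

    center: (3,) box centre in the vehicle frame; size: (length, width, height);
    heading: yaw about +z; vehicle_pose: 4x4 vehicle-to-world transform
    (frame.pose.transform reshaped to [4, 4]). Untested sketch.
    """
    l, w, h = size
    # Corner offsets in the box frame (+x along heading).
    x = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * l / 2.0
    y = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * w / 2.0
    z = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * h / 2.0
    corners = np.stack([x, y, z], axis=0)  # (3, 8)
    # Rotate by heading about +z, then translate to the box centre.
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    corners_vehicle = rot @ corners + np.asarray(center)[:, None]
    # Homogeneous transform into the world frame.
    corners_h = np.vstack([corners_vehicle, np.ones((1, 8))])
    return (np.asarray(vehicle_pose) @ corners_h)[:3].T  # (8, 3)
```

The resulting (8, 3) array is what I feed to project_points_to_image_sdk as global_points.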