waymo-research / waymo-open-dataset

Waymo Open Dataset
https://www.waymo.com/open

Converting image coordinate to vehicle coordinate #724

Open sevcan52 opened 11 months ago

sevcan52 commented 11 months ago

Hi, how can I convert a pedestrian box's center pixel values (cx, cy) into values that express its distance from the vehicle, perhaps in the vehicle coordinate system?

alexzzhu commented 11 months ago

Hi, to clarify a bit, are you trying to get the depth of a 2D box, or compute the distance from the vehicle for a 3D box?

sevcan52 commented 10 months ago

Hi, I want to calculate the distance between an agent's 2D box and the vehicle.

alexzzhu commented 10 months ago

I don't believe there are associations between vehicle 2D and 3D boxes, so you may have to use the lidar depths to compute an approximate depth for the box in this case.

SyedShafiq commented 2 months ago

> I don't believe there are associations between vehicle 2D and 3D boxes, so you may have to use the lidar depths to compute an approximate depth for the box in this case.

What components can be used to find the lidar depth for the box?

alexzzhu commented 2 months ago

There's not a canonical way, but one example would be to find the average depth of the lidar points that fall inside the box.

SyedShafiq commented 2 months ago

> There's not a canonical way, but one example would be to find the average depth of the lidar points that fall inside the box.

Could you give me an example of how to do that using the components provided? My requirement is to find the depth of a vehicle from the camera across frames. These vehicles should also have a 2D bounding box for further detection.

What would be the general process of finding the depth using the components provided by the dataset?

alexzzhu commented 2 months ago

We provide the laser points projected to each camera here: https://github.com/waymo-research/waymo-open-dataset/blob/5f8a1cd42491210e7de629b6f8fc09b65e0cbe99/src/waymo_open_dataset/dataset.proto#L211. Given a 2D bounding box, you can find which of these laser points fall inside the box, and compute some summary statistic of those points that fits your use case (e.g. mean, min, etc.).

This function may be helpful for parsing the projected points: https://github.com/waymo-research/waymo-open-dataset/blob/5f8a1cd42491210e7de629b6f8fc09b65e0cbe99/src/waymo_open_dataset/utils/frame_utils.py#L34
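
For illustration, a rough sketch of that idea against the v1 Frame protos (not an official recipe): `frame` is a parsed open_dataset.Frame, `box` a 2D camera label box, and `cam_name` the camera name enum value. The tuple returned by parse_range_image_and_camera_projection has changed across releases, so adjust the unpacking to your installed version.

```python
import numpy as np
from waymo_open_dataset.utils import frame_utils

def approx_box_depth(frame, box, cam_name):
  """Mean distance (vehicle frame) of lidar points projecting into a 2D box."""
  range_images, camera_projections, _, range_image_top_pose = (
      frame_utils.parse_range_image_and_camera_projection(frame))
  points, cp_points = frame_utils.convert_range_image_to_point_cloud(
      frame, range_images, camera_projections, range_image_top_pose)
  points = np.concatenate(points, axis=0)   # [N, 3] xyz in the vehicle frame
  cp = np.concatenate(cp_points, axis=0)    # [N, 6]: two (camera, x, y) slots

  # 2D label boxes are center/size in pixels: length along x, width along y.
  x0, x1 = box.center_x - box.length / 2, box.center_x + box.length / 2
  y0, y1 = box.center_y - box.width / 2, box.center_y + box.width / 2

  # A point can project into up to two cameras, so check both slots.
  mask = np.zeros(len(cp), dtype=bool)
  for c in (0, 3):
    mask |= ((cp[:, c] == cam_name)
             & (cp[:, c + 1] >= x0) & (cp[:, c + 1] <= x1)
             & (cp[:, c + 2] >= y0) & (cp[:, c + 2] <= y1))
  if not mask.any():
    return None
  return float(np.linalg.norm(points[mask], axis=1).mean())
```

np.linalg.norm gives the straight-line distance from the vehicle origin; use `points[mask][:, 0].mean()` instead if the longitudinal (x-forward) distance is what you need.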

SyedShafiq commented 2 months ago

> We provide the laser points projected to each camera here: https://github.com/waymo-research/waymo-open-dataset/blob/5f8a1cd42491210e7de629b6f8fc09b65e0cbe99/src/waymo_open_dataset/dataset.proto#L211. Given a 2D bounding box, you can find which of these laser points fall inside the box, and compute some summary statistic of those points that fits your use case (e.g. mean, min, etc.). This function may be helpful for parsing the projected points: https://github.com/waymo-research/waymo-open-dataset/blob/5f8a1cd42491210e7de629b6f8fc09b65e0cbe99/src/waymo_open_dataset/utils/frame_utils.py#L34

Thank you for the reply. I am using the parquet files; if I'm not wrong, one has to use the v2 format to parse them. Which of the v2 functions can I use to implement the methodology you described above?

alexzzhu commented 2 months ago

Ah you're right. The projected lidar points can be found here then: https://github.com/waymo-research/waymo-open-dataset/blob/5f8a1cd42491210e7de629b6f8fc09b65e0cbe99/src/waymo_open_dataset/v2/perception/lidar.py#L99
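
For illustration, a hedged sketch of the same average-depth computation in v2 terms. It assumes the directory layout from the v2 tutorial (`dataset_dir` with one sub-directory per component tag, `context_name` naming a segment) and the field names in lidar.py at the commit linked above; double-check both against your installed release.

```python
import dask.dataframe as dd
import numpy as np
from waymo_open_dataset import v2

dataset_dir = '/path/to/waymo_open_dataset_v2/training'  # placeholder path
context_name = '<context_name>'                          # placeholder segment
cam_name = 1                                             # FRONT camera
x0, x1, y0, y1 = 600.0, 900.0, 400.0, 700.0              # example pixel box

lidar_df = dd.read_parquet(f'{dataset_dir}/lidar/{context_name}.parquet')
proj_df = dd.read_parquet(
    f'{dataset_dir}/lidar_camera_projection/{context_name}.parquet')
df = v2.merge(lidar_df, proj_df)

for _, row in df.iterrows():
  lidar = v2.LiDARComponent.from_dict(row)
  proj = v2.LiDARCameraProjectionComponent.from_dict(row)

  # Range images are stored as flat values plus a shape.
  ri = np.reshape(np.asarray(lidar.range_image_return1.values),
                  lidar.range_image_return1.shape)  # [H, W, 4]; ch 0 = range
  cp = np.reshape(np.asarray(proj.range_image_return1.values),
                  proj.range_image_return1.shape)   # [H, W, 6]; (cam, x, y) x2

  # Mean lidar range of first-return points projecting inside the box; only
  # the first of the two projection slots is checked here for brevity.
  mask = (ri[..., 0] > 0) & (cp[..., 0] == cam_name)
  mask &= (cp[..., 1] >= x0) & (cp[..., 1] <= x1)
  mask &= (cp[..., 2] >= y0) & (cp[..., 2] <= y1)
  approx_depth = ri[mask, 0].mean() if mask.any() else None
  break  # one frame is enough for the illustration
```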

SyedShafiq commented 2 months ago

> Ah you're right. The projected lidar points can be found here then: https://github.com/waymo-research/waymo-open-dataset/blob/5f8a1cd42491210e7de629b6f8fc09b65e0cbe99/src/waymo_open_dataset/v2/perception/lidar.py#L99

Thank you so much Alex. I shall get back if I have any further questions. Appreciate your help :)

SyedShafiq commented 2 months ago

> Ah you're right. The projected lidar points can be found here then: https://github.com/waymo-research/waymo-open-dataset/blob/5f8a1cd42491210e7de629b6f8fc09b65e0cbe99/src/waymo_open_dataset/v2/perception/lidar.py#L99

Could you tell me if this can be used in any way to find an association between a vehicle's 2D box and its LiDAR 3D box? Or are the parquet files for this component only for the pedestrian class? https://github.com/waymo-research/waymo-open-dataset/blob/5f8a1cd42491210e7de629b6f8fc09b65e0cbe99/src/waymo_open_dataset/v2/perception/box.py#L81

alexzzhu commented 2 months ago

I believe these only exist for pedestrian boxes.

SyedShafiq commented 2 months ago

> I believe these only exist for pedestrian boxes.

But if you see this tutorial: https://github.com/waymo-research/waymo-open-dataset/blob/5f8a1cd42491210e7de629b6f8fc09b65e0cbe99/tutorial/tutorial_camera_only.ipynb

It uses lidar-synced boxes and camera-synced boxes from the dataset.proto format. And if one runs the entire tutorial, it also visualizes projected_lidar_labels and camera_synced_box projections. From what I see, LiDARCameraSyncedBoxComponent in the v2 format has the same fields as the camera-synced boxes; it is just in v2 form.

There is also a note in the tutorial which says:

> Note that unlike camera_labels, which are only associated with the corresponding laser_labels for certain object types, there is always a correspondence between projected_lidar_labels and laser_labels.

[Three attached screenshots: projected label visualizations from the tutorial.]

These are the images I obtained from running the tutorial, and there seem to be boxes for vehicles as well. Do you reckon I should use these boxes to calculate the depth of vehicles?

alexzzhu commented 2 months ago

Oh hmm, I'm a little confused about your use case, I think. I thought you needed to use the 2D bounding boxes, which are labeled for images only and are separate from the 3D bounding boxes, which are labeled for the LiDAR. Between these two sets of boxes we only have correspondences for pedestrians. If you are using the 3D bounding boxes, then yes, depth is natively encoded already. The projected boxes above are simply the 3D boxes projected into the image plane.
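
As a minimal sketch of reading that native depth in the v1 format (assuming `frame` is a parsed Frame, and taking depth to mean distance from the vehicle origin, since 3D box centers are expressed in the vehicle frame):

```python
import numpy as np

# 3D laser label boxes live in the vehicle frame, so the distance from the
# ego vehicle is just the norm of the box center.
for label in frame.laser_labels:
  box = label.box
  depth = float(np.linalg.norm([box.center_x, box.center_y, box.center_z]))
  print(label.id, depth)
```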

SyedShafiq commented 2 months ago

> Oh hmm, I'm a little confused about your use case, I think. I thought you needed to use the 2D bounding boxes, which are labeled for images only and are separate from the 3D bounding boxes, which are labeled for the LiDAR. Between these two sets of boxes we only have correspondences for pedestrians. If you are using the 3D bounding boxes, then yes, depth is natively encoded already. The projected boxes above are simply the 3D boxes projected into the image plane.

Sorry for the confusion. For my use case I also require the 2D boxes of the objects whose 3D boxes are projected into the image plane, along with an object_id that corresponds to both boxes.

Questions:

1. Can I use the 2D boxes created by the function show_projected_camera_synced_boxes for my use case? (I have attached the code that was present in the tutorial; see the sketch after these questions.) The boxes generated here use `box = label.camera_synced_box`. In the v2 format this should be `LiDARCameraSyncedBoxComponent`, which also has `key.laser_object_id`. Then I would know that the 2D and 3D boxes created are for that particular object. With the 3D box I would calculate the depth of the object and use the corresponding 2D box, which would have the same `key.laser_object_id`.
```python
def show_projected_camera_synced_boxes(camera_image, ax, draw_3d_box=False):
  """Displays camera_synced_box 3D labels projected onto camera."""
  calibration = next(cc for cc in frame.context.camera_calibrations
                     if cc.name == camera_image.name)

  for label in frame.laser_labels:
    box = label.camera_synced_box

    if not box.ByteSize():
      continue  # Filter out labels that do not have a camera_synced_box.
    if (FILTER_AVAILABLE and not label.num_top_lidar_points_in_box) or (
        not FILTER_AVAILABLE and not label.num_lidar_points_in_box):
      continue  # Filter out likely occluded objects.

    # Retrieve upright 3D box corners.
    box_coords = np.array([[
        box.center_x, box.center_y, box.center_z, box.length, box.width,
        box.height, box.heading
    ]])
    corners = box_utils.get_upright_3d_box_corners(
        box_coords)[0].numpy()  # [8, 3]

    # Project box corners from vehicle coordinates onto the image.
    projected_corners = project_vehicle_to_image(frame.pose, calibration,
                                                 corners)
    u, v, ok = projected_corners.transpose()
    ok = ok.astype(bool)

    # Skip object if any corner projection failed. Note that this is very
    # strict and can lead to exclusion of some partially visible objects.
    if not all(ok):
      continue
    u = u[ok]
    v = v[ok]

    # Clip box to image bounds.
    u = np.clip(u, 0, calibration.width)
    v = np.clip(v, 0, calibration.height)

    if u.max() - u.min() == 0 or v.max() - v.min() == 0:
      continue

    if draw_3d_box:
      # Draw approximate 3D wireframe box onto the image. Occlusions are not
      # handled properly.
      draw_3d_wireframe_box(ax, u, v, (1.0, 1.0, 0.0))
    else:
      # Draw projected 2D box onto the image.
      draw_2d_box(ax, u, v, (1.0, 1.0, 0.0))
```
2. If this is possible, how do I find the native depth of the 3D bounding box that is already encoded?
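
A sketch of the pairing described in question 1, reusing the tutorial context above (`frame`, `calibration`, and the tutorial helper project_vehicle_to_image); it keeps the projected 2D box and the 3D depth under the same label.id, which plays the role of key.laser_object_id in the v2 format. This is only an illustration of the idea, not a confirmed recipe:

```python
import numpy as np
from waymo_open_dataset.utils import box_utils

boxes_by_id = {}
for label in frame.laser_labels:
  box = label.camera_synced_box
  if not box.ByteSize():
    continue  # No camera_synced_box for this label.

  # Project the 3D corners into the image, as in the tutorial function above.
  corners = box_utils.get_upright_3d_box_corners(np.array([[
      box.center_x, box.center_y, box.center_z,
      box.length, box.width, box.height, box.heading]]))[0].numpy()
  u, v, ok = project_vehicle_to_image(
      frame.pose, calibration, corners).transpose()
  ok = ok.astype(bool)
  if not ok.all():
    continue  # Skip labels with failed corner projections.

  boxes_by_id[label.id] = {
      # Tight axis-aligned 2D box around the projected corners, in pixels.
      'box_2d': (u.min(), v.min(), u.max(), v.max()),
      # Depth as distance of the box center from the vehicle origin.
      'depth': float(np.linalg.norm(
          [box.center_x, box.center_y, box.center_z])),
  }
```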