waymo-research / waymo-open-dataset

Waymo Open Dataset
https://www.waymo.com/open

Projecting the 3D box into the side-left camera shows some bias #66

Open hhhmoan opened 4 years ago

hhhmoan commented 4 years ago

Dear all, I have seen https://github.com/waymo-research/waymo-open-dataset/issues/24 and implemented the projection using the method in #24. I found that in the front camera the projection is fine, but in the side-left camera there is some bias, as shown in the attached images.

The bias is large on the right side of the image but small on the left side, as with the black car in the image. I am not very clear about rolling-shutter cameras; could the rolling shutter cause this problem? The tfrecord is segment-9509506420470671704_4049_100_4069_100.tfrecord and the frames are 7, 8, and 9.

My projection method follows https://github.com/gdlg/simple-waymo-open-dataset-reader.

DianCh commented 4 years ago

I also noticed the misalignment between 2D & 3D. Apart from the rolling-shutter cause, I think imperfect synchronization between cameras and lidars (inter-camera synchronization is pretty good in this dataset), as well as the extrinsic calibration between cameras and lidars, are causing this issue. It is more obvious in the left/right side cameras, although I also spotted the same problem in the 3 front cameras.

I think this is the best we can get for now, because guaranteeing consistency between the 2D & 3D sensor suites is really expensive when producing such a dataset.

peisun1115 commented 4 years ago

Did you try to align it with the pre-populated projections? https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/dataset.proto#L290

If your projection is not aligned with ours, then it is either because of distortion or the rolling-shutter projection (I checked the code in simple-waymo-od-reader and did not find code related to these two).
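
For reference, a minimal sketch of reading those pre-populated projections for comparison, assuming the standard Frame proto fields (frame.images, frame.projected_lidar_labels); draw_projected_lidar_labels is just a name chosen here for illustration:

import io

import cv2
import numpy as np
from PIL import Image

def draw_projected_lidar_labels(frame, camera_name=4):  # 4 == CameraName.SIDE_LEFT
    # Find the image for the requested camera.
    image_proto = next(i for i in frame.images if i.name == camera_name)
    img = np.array(Image.open(io.BytesIO(image_proto.image)))

    # projected_lidar_labels holds axis-aligned 2D boxes in image space:
    # center_x/center_y in pixels, length along x, width along y.
    camera_labels = next(
        c for c in frame.projected_lidar_labels if c.name == camera_name)
    for label in camera_labels.labels:
        box = label.box
        x0 = int(box.center_x - 0.5 * box.length)
        y0 = int(box.center_y - 0.5 * box.width)
        x1 = int(box.center_x + 0.5 * box.length)
        y1 = int(box.center_y + 0.5 * box.width)
        cv2.rectangle(img, (x0, y0), (x1, y1), (0, 255, 0), 2)
    return img

Overlaying these boxes on your own projection makes the residual bias easy to see per camera.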

hhhmoan commented 4 years ago

> Did you try to align it with the pre-populated projections? https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/dataset.proto#L290
>
> If your projection is not aligned with ours, then it is either because of distortion or the rolling-shutter projection (I checked the code in simple-waymo-od-reader and did not find code related to these two).

Thank you for your reply. I tried drawing the projected_lidar_labels onto the image, and it looks fine (so there must be some way to make the projection work well). So I tried adding undistortion or rolling-shutter handling to fix my projection. I added the distortion handling to simple-waymo-od-reader like this:

Projection function:

import io

import cv2
import numpy as np
from PIL import Image

def display_labels_on_image(cnt, camera_calibration, image, labels, display_time=-1):
    extrinsic = np.array(camera_calibration.extrinsic.transform).reshape(4, 4)
    intrinsic = camera_calibration.intrinsic

    # Camera model:
    # | fx  0 cx 0 |
    # |  0 fy cy 0 |
    # |  0  0  1 0 |
    camera_model = np.array([
        [intrinsic[0], 0, intrinsic[2], 0],
        [0, intrinsic[1], intrinsic[3], 0],
        [0, 0, 1, 0]])

    # Swap the axes around.
    axes_transformation = np.array([
        [0, -1, 0, 0],
        [0, 0, -1, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1]])

    # Compute the projection matrix from vehicle space to image space.
    vehicle_to_image = np.matmul(camera_model, np.matmul(axes_transformation, np.linalg.inv(extrinsic)))

    # Decode the JPEG image.
    img = np.array(Image.open(io.BytesIO(image.image)))

    # k1, k2, p1, p2, k3 distortion coefficients.
    dist = np.array([intrinsic[4], intrinsic[5], intrinsic[6], intrinsic[7], intrinsic[8]])
    mtx = np.eye(3)
    mtx[0][0], mtx[1][1] = intrinsic[0], intrinsic[1]
    mtx[0][2], mtx[1][2] = intrinsic[2], intrinsic[3]
    img2 = img  # keeps the original; cv2.undistort returns a new array
    img = cv2.undistort(img, mtx, dist)

    # Draw all the groundtruth labels (draw_3d_box is the helper from
    # simple-waymo-open-dataset-reader).
    for label in labels:
        draw_3d_box(img, vehicle_to_image, label)

    # Save the undistorted and original images for comparison.
    # cv2.imshow("Image", img)
    # cv2.waitKey(display_time)
    cv2.imwrite('./images/' + str(cnt) + '.jpg', img)
    cv2.imwrite('./images/' + str(cnt) + 'u.jpg', img2)

but there is only a small difference, as shown (the first image is before undistortion, the second after).

Maybe the rolling shutter causes the problem? But I don't know how to solve it.
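
For what it's worth, here is a very crude first-order sketch of what rolling-shutter compensation could look like, assuming the readout sweeps linearly along the image x-axis and ignoring ego rotation entirely (the field names come from CameraImage in dataset.proto; rolling_shutter_shift is a name made up here, and the real readout direction varies per camera):

import numpy as np

def rolling_shutter_shift(point_vehicle, u, image_width, camera_image):
    # Estimate the time at which this image column is captured,
    # assuming a linear sweep between trigger and readout-done.
    t0 = camera_image.camera_trigger_time
    t1 = camera_image.camera_readout_done_time
    t_capture = t0 + (u / image_width) * (t1 - t0)
    dt = t_capture - camera_image.pose_timestamp
    v = np.array([camera_image.velocity.v_x,
                  camera_image.velocity.v_y,
                  camera_image.velocity.v_z])
    # First-order correction: re-express the (static) world point in the
    # vehicle frame at the capture time by undoing the ego translation.
    return np.asarray(point_vehicle) - v * dt

The released camera model handles this properly; this sketch only shows why the bias grows toward one side of the image, where dt is largest.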

hhhmoan commented 4 years ago

And I find that this car looks longer than what we usually see. Is this a normal phenomenon?
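
A rolling shutter could plausibly explain this. A rough back-of-envelope (both numbers below are assumptions, for scale only) shows the ego vehicle moves a noticeable distance during one readout, which stretches or compresses laterally viewed objects:

# Assumed numbers, for scale only.
readout_s = 0.03      # assumed ~30 ms rolling-shutter readout per image
ego_speed_mps = 10.0  # assumed ego speed

# Distance the ego vehicle travels while one image is read out; a nearby
# car seen side-on is smeared by roughly this much.
print(ego_speed_mps * readout_s)  # 0.3 (metres)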

peisun1115 commented 4 years ago

Can you help to try the following?

  1. Make sure your projection code (no-distortion version) is the same as https://github.com/waymo-research/waymo-open-dataset/issues/24#issuecomment-535718632.

  2. If 1 works, then try the way I posted to apply distortion directly to the projection results from 1: https://github.com/waymo-research/waymo-open-dataset/issues/24#issuecomment-535794541 (no need to apply cv2.undistort).

hhhmoan commented 4 years ago

> Can you help to try the following?
>
>   1. Make sure your projection code (no-distortion version) is the same as #24 (comment)
>   2. If 1 works, then try the way I posted to apply distortion directly to the projection results from 1: #24 (comment) (no need to apply cv2.undistort).

Thank you for your reply again. I tried to follow your suggestions. (To make debugging easier, I calculate the corners one by one, which makes the code a little ugly, and I do not remove boxes that fall outside the image... sorry.) For 1, I use your function directly, as in #24:

import io

import cv2
import numpy as np
from PIL import Image

from simple_waymo_open_dataset_reader import WaymoDataFileReader
from simple_waymo_open_dataset_reader import dataset_pb2
import tensorflow as tf
tf.compat.v1.enable_eager_execution()

# Projection function, used directly as in #24.

def project_labels_into_image(point, camera_calibration):
    extrinsic = tf.reshape(camera_calibration.extrinsic.transform, [4, 4])
    vehicle_to_sensor = tf.matrix_inverse(extrinsic)  # tf.linalg.inv in TF2
    # Homogeneous coordinates; copy so the caller's list is not mutated.
    point1 = list(point) + [1.0]
    point_camera_frame = tf.einsum('ij,j->i', vehicle_to_sensor, tf.constant(point1, dtype=tf.float32))
    u_d = -point_camera_frame[1] / point_camera_frame[0]
    v_d = -point_camera_frame[2] / point_camera_frame[0]

    # add distortion model here if you'd like.
    f_u = camera_calibration.intrinsic[0]
    f_v = camera_calibration.intrinsic[1]
    c_u = camera_calibration.intrinsic[2]
    c_v = camera_calibration.intrinsic[3]
    u_d = u_d * f_u + c_u
    v_d = v_d * f_v + c_v

    return [u_d.numpy(), v_d.numpy()]

def projection_label_into_image(cnt, camera_calibration, images, laser_labels):
    img = np.array(Image.open(io.BytesIO(images.image)))
    for label in laser_labels:
        box = label.box

        # Extract the box size.
        sl, sh, sw = box.length, box.height, box.width

        # Get the box pose.
        center_x, center_y, center_z = box.center_x, box.center_y, box.center_z
        heading = box.heading
        corner = np.array([[-0.5 * sl, -0.5 * sw], [-0.5 * sl, 0.5 * sw], [0.5 * sl, -0.5 * sw], [0.5 * sl, 0.5 * sw]])
        rotation_matrix = np.array([[np.cos(heading), -np.sin(heading)], [np.sin(heading), np.cos(heading)]])
        # Rotate the box-frame offsets by +heading (transpose because the
        # offsets are stored as row vectors).
        corner = np.matmul(corner, rotation_matrix.T)
        A = [center_x + corner[0][0], center_y + corner[0][1], center_z - 0.5 * sh]
        B = [center_x + corner[1][0], center_y + corner[1][1], center_z - 0.5 * sh]
        C = [center_x + corner[2][0], center_y + corner[2][1], center_z - 0.5 * sh]
        D = [center_x + corner[3][0], center_y + corner[3][1], center_z - 0.5 * sh]
        E = [center_x + corner[0][0], center_y + corner[0][1], center_z + 0.5 * sh]
        F = [center_x + corner[1][0], center_y + corner[1][1], center_z + 0.5 * sh]
        G = [center_x + corner[2][0], center_y + corner[2][1], center_z + 0.5 * sh]
        H = [center_x + corner[3][0], center_y + corner[3][1], center_z + 0.5 * sh]
        A_new = project_labels_into_image(A, camera_calibration)
        B_new = project_labels_into_image(B, camera_calibration)
        C_new = project_labels_into_image(C, camera_calibration)
        D_new = project_labels_into_image(D, camera_calibration)
        E_new = project_labels_into_image(E, camera_calibration)
        F_new = project_labels_into_image(F, camera_calibration)
        G_new = project_labels_into_image(G, camera_calibration)
        H_new = project_labels_into_image(H, camera_calibration)

        A_new = (int(A_new[0]), int(A_new[1]))
        B_new = (int(B_new[0]), int(B_new[1]))
        C_new = (int(C_new[0]), int(C_new[1]))
        D_new = (int(D_new[0]), int(D_new[1]))
        E_new = (int(E_new[0]), int(E_new[1]))
        F_new = (int(F_new[0]), int(F_new[1]))
        G_new = (int(G_new[0]), int(G_new[1]))
        H_new = (int(H_new[0]), int(H_new[1]))

        # Draw the 12 edges of the box.
        colour = (0, 0, 255)
        cv2.line(img, A_new, B_new, colour, thickness=1)
        cv2.line(img, B_new, D_new, colour, thickness=1)
        cv2.line(img, D_new, C_new, colour, thickness=1)
        cv2.line(img, C_new, A_new, colour, thickness=1)
        cv2.line(img, A_new, E_new, colour, thickness=1)
        cv2.line(img, B_new, F_new, colour, thickness=1)
        cv2.line(img, C_new, G_new, colour, thickness=1)
        cv2.line(img, D_new, H_new, colour, thickness=1)
        cv2.line(img, E_new, F_new, colour, thickness=1)
        cv2.line(img, F_new, H_new, colour, thickness=1)
        cv2.line(img, H_new, G_new, colour, thickness=1)
        cv2.line(img, E_new, G_new, colour, thickness=1)

    cv2.imwrite('./images/' + str(cnt) + '.jpg', img)

# Open a .tfrecord.
filename = "{your path}/training/segment-9509506420470671704_4049_100_4069_100.tfrecord"
datafile = WaymoDataFileReader(filename)

# Generate a table of the offsets of all frame records in the file.
table = datafile.get_record_table()

print("There are %d frames in this file." % len(table))

# Loop through the whole file and display 3D labels.
cnt = 0
for frame in datafile:
    cnt += 1
    for image_c in frame.images:
        if image_c.name == 4:  # 4 == CameraName.SIDE_LEFT
            images = image_c
            break
    for camera_c in frame.context.camera_calibrations:
        if camera_c.name == 4:
            camera_calibrations = camera_c
            break
    projection_label_into_image(cnt, camera_calibrations, images, frame.laser_labels)

and it looks the same as before.

Then I tried suggestion 2 and added the following code in your function, at the place marked "# add distortion model here if you'd like.":

    # add distortion model here if you'd like.
    u_n = u_d
    v_n = v_d
    k1 = camera_calibration.intrinsic[4]
    k2 = camera_calibration.intrinsic[5]
    k3 = camera_calibration.intrinsic[6]  # same as p1 in OpenCV.
    k4 = camera_calibration.intrinsic[7]  # same as p2 in OpenCV.
    k5 = camera_calibration.intrinsic[8]  # same as k3 in OpenCV.

    r2 = u_n * u_n + v_n * v_n
    r4 = r2 * r2
    r6 = r4 * r2

    r_d = 1.0 + k1 * r2 + k2 * r4 + k5 * r6
    kMinRadialDistortion = -1000000000  # I don't know what value to use.
    kMaxRadialDistortion = 1000000000   # I don't know what value to use.
    if r_d < kMinRadialDistortion or r_d > kMaxRadialDistortion:
        return False

    u_nd = u_n * r_d + 2.0 * k3 * u_n * v_n + k4 * (r2 + 2.0 * u_n * u_n)
    v_nd = v_n * r_d + k3 * (r2 + 2.0 * v_n * v_n) + 2.0 * k4 * u_n * v_n

    u_d = u_nd
    v_d = v_nd

but it gives me a failure, as shown. How can I fix it? And I don't think the distortion should be big enough to cause such a large bias.
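
One plausible guard, sketched here under assumptions (the depth check and the r_d band are guesses, not the official kMin/kMaxRadialDistortion values): reject points behind the camera before applying the polynomial, and bail out when r_d leaves a sane range:

def apply_distortion(u_n, v_n, depth, k1, k2, k3, k4, k5,
                     min_r_d=0.5, max_r_d=1.5):  # band is a guess
    # Points behind the camera (non-positive depth, i.e.
    # point_camera_frame[0] <= 0) must be rejected before distorting;
    # the distortion polynomial is only meaningful in front of the camera.
    if depth <= 0:
        return None
    r2 = u_n * u_n + v_n * v_n
    r4 = r2 * r2
    r_d = 1.0 + k1 * r2 + k2 * r4 + k5 * r4 * r2
    # Far outside the calibrated field of view the polynomial diverges and
    # the projected point is meaningless, so drop it.
    if not (min_r_d <= r_d <= max_r_d):
        return None
    u_nd = u_n * r_d + 2.0 * k3 * u_n * v_n + k4 * (r2 + 2.0 * u_n * u_n)
    v_nd = v_n * r_d + k3 * (r2 + 2.0 * v_n * v_n) + 2.0 * k4 * u_n * v_n
    return u_nd, v_nd

With symmetric bounds of +/-1e9 the check above never fires, which would also explain points behind the camera ending up inside the image.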

peisun1115 commented 4 years ago

I did not spend too much time debugging your code. Let's wait for our release of the camera model code.

xmyqsh commented 4 years ago

@hhhmoan @peisun1115 Besides the distortion, rolling-shutter, or camera-model projection problems, is it possible that this is caused by a timestamp misalignment between lidar and camera? @peisun1115 What is the precision of the synchronization between lidar and camera? If synchronization cannot be guaranteed, it is best to timestamp the lidar and camera separately.

peisun1115 commented 4 years ago

No, it cannot be caused by synchronization as long as frame.projected_lidar_labels is correct, since that is computed without any additional information beyond what is public in the dataset.

xmyqsh commented 4 years ago

@peisun1115

// Lidar labels (laser_labels) projected to camera images. A projected
// label is the smallest image axis aligned rectangle that can cover all
// projected points from the 3d lidar label. The projected label is ignored if
// the projection is fully outside a camera image. The projected label is
// clamped to the camera image if it is partially outside.
repeated CameraLabels projected_lidar_labels = 9;

The projection algorithm described above only projects the 3D lidar label into the 2D camera image, constraining and clipping it by the camera frustum planes. It does not compare the projected 3D lidar label against the 2D camera label to guarantee anything, so it cannot prove the synchronization.
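
A minimal sketch of the rule quoted from the proto above, ignoring the frustum clipping of corners behind the camera that the real pipeline handles (project_fn is assumed to be a vehicle-point-to-pixel function such as the one from #24; img_w/img_h are the camera image size):

import numpy as np

def project_box_to_2d(corners_vehicle, project_fn, img_w, img_h):
    # Axis-aligned bounding box of the 8 projected corner pixels.
    pts = np.array([project_fn(c) for c in corners_vehicle])  # shape (8, 2)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    # Ignore the label if the projection is fully outside the image.
    if x1 < 0 or y1 < 0 or x0 >= img_w or y0 >= img_h:
        return None
    # Clamp the box to the image if it is partially outside.
    x0, x1 = np.clip([x0, x1], 0, img_w - 1)
    y0, y1 = np.clip([y0, y1], 0, img_h - 1)
    return x0, y0, x1, y1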

peisun1115 commented 4 years ago

What I was saying is that if projected_lidar_labels looks fine but your result does not, then it is not because of synchronization. You can assume that lidar and camera are well synchronized.

turboxin commented 4 years ago

Hi @peisun1115, I wonder when it would be possible to release the camera model code. I'm having trouble with distortion when projecting 3D boxes or point clouds to the image: when the distortion model is applied using https://github.com/waymo-research/waymo-open-dataset/issues/24#issuecomment-535794541, points behind the camera get projected into the image, but the projection looks fine without the distortion model. I suspect it has something to do with kMinRadialDistortion or kMaxRadialDistortion; could you tell me the values of these two parameters? Thanks a lot!

peisun1115 commented 4 years ago

I cannot promise when, but it is close :)

peisun1115 commented 4 years ago

@turboxin @hhhmoan @xmyqsh and others, we have released the camera model in the third_party/camera directory. You can see an example usage at

https://github.com/waymo-research/waymo-open-dataset/blob/master/third_party/camera/ops/camera_model_ops_test.py

C++ interface is at https://github.com/waymo-research/waymo-open-dataset/blob/master/third_party/camera/camera_model.h

pwais commented 4 years ago

@peisun1115 Please voice to your Director / TLM that we advocate for a real, patent-free, consensus-respecting Apache 2.0 release of code for the purpose of advancing science. There's ample precedent that limited releases like these inevitably lead to adversarial situations that are of interest only to lawyers, e.g. https://blog.fossa.io/dont-over-react-to-the-facebook-patents-license-629f708f2221

The React community is relatively innocuous, but if you're an employee at a non-Waymo robotics company and you execute, or even just read, the camera model code cited above, your actions will likely be used against you if you become a target of litigation from Waymo, which has a self-demonstrated bias for litigation. https://github.com/waymo-research/waymo-open-dataset/blob/0b2b8cdc69a028b40a98a8719abeba7367420e8b/third_party/camera/PATENTS#L12

Since this camera model code isn't used in any of the main repo code or even the tutorial, it would make a lot more sense to release it in a standalone repository, especially in light of the distinctive terms Waymo Legal has chosen for it.

peisun1115 commented 4 years ago

Hi Paul - Thank you for your and your company's interest in using the code accessible at https://github.com/waymo-research/waymo-open-dataset/third_party/camera.

At this time, the only licenses available for that code are the License + Additional IP Rights Grant (Patents), which you cited in your email, both of which are co-located with the code in the folder above. Those licenses are consistent with using that code to process data from the Waymo Open Dataset as authorized by, and in compliance with, the Waymo Dataset License Agreement for Non-Commercial Use.

We are discussing other licenses we could offer for the Dataset (and code above) based on feedback on potential use cases. If you would like us to consider offering a license that would enable a particular use case you're interested in, please let us know more about your use case. Thank you.

pwais commented 4 years ago

@peisun1115 I'm sorry it wasn't clear, but the use case is simply to be able to clone this repository without risking litigation from Waymo. This repo is advertised as Apache 2.0, but in actuality there is a separate and very different agreement covering third_party/camera. If an individual (especially an employee of a competitor) simply clones this repo, the mere log of that cloning activity may be used as evidence to warrant litigation, discovery, and further adverse action (whether a crime has been committed or not). Since lidar-camera sync is very common in AV systems, the likelihood of the appearance of infringement is high.

In Waymo vs. Uber, Waymo used the mere fact that Levandowski copied patented files to his personal device to warrant extensive litigation; however, towards the end of the case, a large amount of claimed IP theft was invalidated. The point is that merely cloning this repo makes the downloading party vulnerable to the possibility of adverse and expensive legal action, whether the party infringes or not.

For further discussion and context, please see how Facebook's choice to amend OSI licenses has obstructed the composability of its contributions: https://meshedinsights.com/2017/07/16/apache-bans-facebooks-license-combo/

You should move third_party/camera to a separate repo and have this repo pull in that code on an as-needed basis. It's best for the community if a repo advertised as Apache 2 is very simply Apache 2. Better yet, just donate the patents to the public.