waymo-research / waymo-open-dataset

Waymo Open Dataset
https://www.waymo.com/open

Projecting from lidar point cloud doesn't match with provided Camera Projections #146

Open ltskv opened 4 years ago

ltskv commented 4 years ago

Hi, I was trying to project the lidar point clouds from the top lidar into the images using the camera model code from third_party/camera. When doing so, the projected pixel coordinates differ from the ones in RangeImage.camera_projection_compressed by about 5 px for the front-facing camera and about 10 px for the sideways cameras. I have two questions:

  1. Do the points need to be transformed from the vehicle frame into the world frame (via frame.pose) before calling the camera model?

  2. Is any data beyond what is provided in the .tfrecord needed to reproduce the provided projections exactly?

peisun1115 commented 4 years ago

@ltskv

They should be identical.

  1. Yes, something like matmul(pose.transform[0:3, 0:3], points) + pose.transform[0:3, 3].

  2. No, the data should be sufficient. We produced the projections with exactly the same code.

If you have an example that reproduces your error, I can help with debugging.
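
As a minimal numpy sketch of the transform in point 1 (the function name vehicle_to_world is ours, not from the repo; points is assumed to be an [N, 3] array in the vehicle frame):

import numpy as np

def vehicle_to_world(points, pose_transform):
    # pose_transform: the flattened row-major 4x4 matrix from frame.pose.transform.
    T = np.asarray(pose_transform, dtype=np.float64).reshape(4, 4)
    # Rotate, then translate: matmul(T[0:3, 0:3], p) + T[0:3, 3] for each point p.
    return points @ T[:3, :3].T + T[:3, 3]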

ltskv commented 4 years ago

Thanks for your time. Here is a minimal example demonstrating my issue (I couldn't find the camera ops binary in the wheel that you provide, so I compiled it myself and put it into a waymo_camera_ops directory):

import tensorflow as tf

from waymo_open_dataset import dataset_pb2
from waymo_open_dataset.utils import frame_utils as fu
from waymo_camera_ops.py_camera_model_ops import world_to_image

C_FRONT = 1

if __name__ == '__main__':
    ds = tf.data.TFRecordDataset(
        'segment-14818835630668820137_1780_000_1800_000_with_camera_labels.tfrecord',
        compression_type=''
    )
    frame = dataset_pb2.Frame()
    frame.ParseFromString(next(iter(ds)).numpy())

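    # Decode the range images, camera projections and top-lidar pose from the frame,
    # then convert the first-return range image into a point cloud (vehicle frame).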
    ris, cps, pps = fu.parse_range_image_and_camera_projection(frame)
    P, cp = fu.convert_range_image_to_point_cloud(frame, ris, cps, pps, 0)

    P = P[0]
    cp = cp[0]
    P_front = P[cp[..., 0] == C_FRONT]
    cp_front = cp[cp[..., 0] == C_FRONT]

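    # Transform the points from the vehicle frame into the world frame via frame.pose.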
    pose_tf = tf.reshape(tf.constant(frame.pose.transform), [4, 4])
    P_world = tf.einsum('ij,nj->ni', pose_tf[:3, :3], P_front) + pose_tf[:3, 3]

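    # Front-camera calibration: extrinsic, intrinsic, image size and shutter direction.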
    cc = next(c for c in frame.context.camera_calibrations
              if c.name == C_FRONT)
    extrinsic = tf.reshape(tf.constant(cc.extrinsic.transform), [4, 4])
    intrinsic = tf.constant(cc.intrinsic)
    metadata = tf.constant([cc.width, cc.height, cc.rolling_shutter_direction])

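    # Per-image metadata (pose, velocity, timestamps) used by the rolling-shutter model.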
    img = next(im for im in frame.images if im.name == C_FRONT)
    camera_image_metadata = tf.constant([
        *img.pose.transform,
        img.velocity.v_x, img.velocity.v_y, img.velocity.v_z,
        img.velocity.w_x, img.velocity.w_y, img.velocity.w_z,
        img.pose_timestamp,
        img.shutter,
        img.camera_trigger_time,
        img.camera_readout_done_time
    ], dtype=tf.float32)

    test = P_world[1000]
    groundtruth = cp_front[1000]

    proj = world_to_image(
        extrinsic, intrinsic, metadata, camera_image_metadata, test[None]
    )
    print(f'Projection: {proj[0, :2].numpy()}')
    print(f'Groundtruth: {groundtruth[1:3]}')
    # Prints:
    # Projection: [1467.3871  555.06  ]
    # Groundtruth: [1469  554]

    proj_all = world_to_image(
        extrinsic, intrinsic, metadata, camera_image_metadata, P_world
    )
    proj_err = tf.reduce_mean(tf.linalg.norm(
        proj_all[..., :2] - cp_front[..., 1:3], axis=-1
    ))

    print(f'Average Projection Error (all points): {proj_err} px.')
    # Prints:
    # Average Projection Error (all points): 5.581679344177246 px.
peisun1115 commented 4 years ago

That looks correct to me. Do you think you can try one experiment? Replace all float with double (float64), and DT_FLOAT with DT_DOUBLE?

I was using the C++ API directly, with double.

ltskv commented 4 years ago

Good hint. I added dtype=tf.float64 to all tf.constant operations in the script above and also changed the code in third_party/camera/ops/camera_model_ops.cc so that the TensorFlow op accepts float64 inputs instead of float32 (see fcdcf365c24db655342469ed33a60be73e08bef0). After these changes, the results are as follows:

Groundtruth: [1469  554]
Projection: [1469.62737889  554.35121825]

Average Projection Error (all points): 1.4265301774871821 px.
Max Projection Error: 3.7833056905416678 px.

I also tried tweaking the code in utils so that all data is converted to float64 as early as possible, and so that compute_range_image_cartesian does not project the coordinates from world to vehicle (since they then need to be projected back into world, which might be a source of precision loss), but the projection results didn't change significantly. Would you have any suggestions on how to eliminate the error entirely?
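
For reference, a minimal sketch of the Python-side float64 changes, reusing the variable names from the script above (the op must also be rebuilt to accept DT_DOUBLE, as in the commit mentioned):

pose_tf = tf.reshape(tf.constant(frame.pose.transform, dtype=tf.float64), [4, 4])
extrinsic = tf.reshape(tf.constant(cc.extrinsic.transform, dtype=tf.float64), [4, 4])
intrinsic = tf.constant(cc.intrinsic, dtype=tf.float64)
# camera_image_metadata is built exactly as before, but with dtype=tf.float64;
# metadata keeps its integer values (image size, shutter direction).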

peisun1115 commented 4 years ago

@ltskv Thanks for trying it. I think the remaining "error" you are seeing is rounding error: what we provide in RangeImage.camera_projection_compressed are rounded integers (std::floor), while you are using floating-point values as your final result. Can you try that?
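
A quick way to check this, reusing the variables from the script above (a sketch; tf.floor mirrors the std::floor used when producing the provided projections):

proj_floor = tf.floor(proj_all[..., :2])  # round down, like std::floor
floor_err = tf.reduce_mean(tf.linalg.norm(
    proj_floor - tf.cast(cp_front[..., 1:3], proj_floor.dtype), axis=-1
))
print(f'Average error after flooring: {floor_err} px.')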

ltskv commented 4 years ago

The problem is that if I take another test point, the results are e.g.:

Projection: [76.1873 533.39]
Groundtruth: [73 533]

and the difference here is too large to be explained by rounding.

ltskv commented 4 years ago

So if I floor (using tf.floor) the projections obtained with my script, then I get the following error distribution:

[image: projection_stats]

ltskv commented 4 years ago

And here is how the points get projected into the image (the green dots are from RangeImage.camera_projection_compressed and the red ones are those projected from the point cloud):

[image: projection_test]

peisun1115 commented 4 years ago

@ltskv Thanks for the analysis. I've taken another look and experimented on my own. I found that there is another source of rounding error when I construct the range image. It should be very small in LiDAR space (the Cartesian vehicle frame), but it can be a couple of pixels in image space in the worst case, as you can see, which is expected since the image has a much higher resolution than the LiDAR.

peisun1115 commented 4 years ago

btw, do you see any obvious errors after projection?

ltskv commented 4 years ago

> I found that there is another source of rounding error when I construct the range image. It should be very small in LiDAR space (the Cartesian vehicle frame), but it can be a couple of pixels in image space in the worst case […]

Do you mean this rounding error is due to the lidar coordinates being encoded by their pixel positions in the range image? If so, then I guess there's no way to eliminate the remaining discrepancy without changing the dataset format. Would you then say that the data in RangeImage.camera_projection_compressed is more accurate and "true" than what I obtain when projecting from lidar, or does it have an uncertainty of a few pixels, so that for practical purposes both ways of obtaining the projections are equally good?

ltskv commented 4 years ago

> btw, do you see any obvious errors after projection?

Now, with the error largely eliminated, I don't see or expect there to be more than a few pixels' difference from RangeImage.camera_projection_compressed, so if there are any obvious errors, they will probably be visible with both projection methods. I have seen some artifacts on object boundaries and also some rather random artifacts (see the image below as an example), but I guess this is a common problem in all such datasets.

[image: projection_problems_copy] (the lidar points are colored by depth; brighter is further)

bilalsattar commented 4 years ago

@peisun1115 Can you please merge the changes into the final build and make them available in the pip package?

peisun1115 commented 4 years ago

@bilalsattar Will do. I do not think that blocks you from doing anything with it; a couple of pixels of error is very small, as the image is much denser than the LiDAR.

YurongYou commented 4 years ago

> Thanks for your time. Here is a minimal example demonstrating my issue […] (full script quoted from above)

Hi ltskv,

Could you briefly explain how to compile and install the third_party module? I tried bazel build third_party/... and copied bazel-bin/third_party/camera/ops/camera_model_ops.so into the third_party/camera/ops folder, but it seems I still cannot load it; it complains "ImportError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory". Thank you so much!

Best, Yurong

peisun1115 commented 4 years ago

I have compiled it here: https://pypi.org/project/waymo-open-dataset-tf-2-2-0/

To compile it yourself, follow this: https://github.com/waymo-research/waymo-open-dataset/tree/master/pip_pkg_scripts

third_party has already been added to the pip package on master: https://github.com/waymo-research/waymo-open-dataset/blob/master/pip_pkg_scripts/build_pip_pkg.sh#L56

YurongYou commented 4 years ago

> I have compiled it here: https://pypi.org/project/waymo-open-dataset-tf-2-2-0/
>
> To compile it yourself, follow this: https://github.com/waymo-research/waymo-open-dataset/tree/master/pip_pkg_scripts
>
> third_party has already been added to the pip package on master: https://github.com/waymo-research/waymo-open-dataset/blob/master/pip_pkg_scripts/build_pip_pkg.sh#L56

Hi Pei,

Thanks! I tried reinstalling via pip install waymo-open-dataset-tf-2-2-0, but when I run from waymo_open_dataset.camera.ops import camera_model_ops, it still complains ImportError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory. Should I set some environment variable?

peisun1115 commented 4 years ago

Hey, I don't see it either; I am not sure. Maybe I forgot to include it? I remember trying it in the past. You can just try the commands here: https://github.com/waymo-research/waymo-open-dataset/tree/master/pip_pkg_scripts. It builds quite fast.

I will fix the pre-compiled package, probably sometime next week.

YurongYou commented 4 years ago

> Hey, I don't see it either; I am not sure. Maybe I forgot to include it? You can just try the commands here: https://github.com/waymo-research/waymo-open-dataset/tree/master/pip_pkg_scripts. It builds quite fast.
>
> I will fix the pre-compiled package, probably sometime next week.

Hi Pei, I just tried recompiling it using these commands, but it still seems to be missing. I compiled it in a conda virtual environment.

Commands I used:

export GITHUB_BRANCH=master
export PYTHON_VERSION=3
export PYTHON_MINOR_VERSION=7
export PIP_MANYLINUX2010=0
export TF_VERSION=2.2.0
./pip_pkg_scripts/build.sh
pip uninstall waymo-open-dataset-tf-2-2-0
cd /tmp/pip_pkg_build
pip install waymo_open_dataset_tf_2_2_0-1.2.0-cp37-cp37m-linux_x86_64.whl

YurongYou commented 4 years ago

I saw libtensorflow_framework.so in bazel-bin/external/local_config_tf/ and tried copying it into the root dir of waymo-od as libtensorflow_framework.so.2, but it still doesn't seem to work...

peisun1115 commented 4 years ago

I believe this is resolved; the code has been updated. The problem was that we were missing an __init__.py in the camera model dir.

Jossome commented 3 years ago

My solution:

  1. Add YOUR_PYTHON_PATH/site-packages/tensorflow to $LD_LIBRARY_PATH (see the sketch after this list). libtensorflow_framework.so.2 is located there, but I'm not sure why the module couldn't find it; maybe it's because of my conda environment.
  2. Copy py_camera_model_ops.py to YOUR_PYTHON_PATH/site-packages/waymo_open_dataset/camera/ops/, and create an __init__.py there.
  3. Use from waymo_open_dataset.camera.ops import py_camera_model_ops instead of camera_model_ops.
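
A sketch of step 1 (assuming the installed tensorflow package directory is the one containing libtensorflow_framework.so.2, as it is in a typical virtualenv/conda layout):

export LD_LIBRARY_PATH="$(python -c 'import tensorflow as tf, os; print(os.path.dirname(tf.__file__))'):${LD_LIBRARY_PATH}"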

This only works with the package built from their Dockerfile. I ran the test script using the installed waymo_open_dataset package and it passed:

[       OK ] CameraModelOpsTest.testCameraModel

I tried this with the package installed from pip install waymo-open-dataset-tf-2-2-0, but it raises this error:

>>> from waymo_open_dataset.camera.ops import py_camera_model_ops
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/descriptor_database.cc:118] File already exists in database: google/protobuf/any.proto
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/descriptor.cc:1367] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size): 
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size): 
[1]    17699 abort (core dumped)  python

I still have the same problem here. I tried both pip install waymo-open-dataset-tf-2-2-0 and building from scratch, but both raise the same error: ImportError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory

Cc-Hy commented 2 years ago

> I have compiled it here: https://pypi.org/project/waymo-open-dataset-tf-2-2-0/ […] (build instructions quoted from above)

@peisun1115 Hello, I'm using waymo_open_dataset-tf-2.4.0 1.4.1 in a conda environment, but when I use from waymo_open_dataset.camera.ops import camera_model_ops, an error occurs: "dynamic module does not define module export function (PyInit_camera_model_ops)". Can you tell me how to fix this?

Cc-Hy commented 2 years ago

@peisun1115 I looked through a lot of repos regarding camera_model_ops and the world_to_image / image_to_world functions, but could not find a solution; can you please give me a working one? I'd like to use it after a simple pip install, just like frame_utils.py.

I want to try the camera-only challenge, but the projection seems to be inaccurate when simply using (u, v, d) = P * (X, Y, Z), which works for all other datasets, and the same holds for the image-to-camera unprojection. The projected points/3D boxes from the camera frame do not match the ones you provide in the tfrecord file, especially for the side cameras, as pointed out in many other issues before. You say you use the function world_to_image to solve this by accounting for the rolling-shutter effect, but this is inconvenient in actual use, where a simple matrix multiplication should usually do, just like the sensor-to-vehicle transformations. For example, I'd now like to project a pixel (u, v, d) back into 3D space in the camera frame, which I should be able to do directly by inverting the intrinsic matrix P and multiplying it with (u, v, d), but this now seems difficult.
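
For context, a minimal sketch of the naive pinhole projection and unprojection being described (assumptions: a standard camera frame with x right, y down, z forward, which is not the Waymo sensor-frame convention, and fx, fy, cx, cy taken from cc.intrinsic; this ignores lens distortion and the rolling shutter, which is exactly why it disagrees with world_to_image by a few pixels):

import numpy as np

def naive_project(points_cam, fx, fy, cx, cy):
    # points_cam: [N, 3] in the camera frame; returns [N, 2] pixel coordinates.
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    return np.stack([fx * x / z + cx, fy * y / z + cy], axis=-1)

def naive_unproject(u, v, d, fx, fy, cx, cy):
    # Inverse pinhole model for pixel (u, v) at depth d along the optical axis.
    return np.array([(u - cx) * d / fx, (v - cy) * d / fy, d])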