pwais commented 4 years ago

Issue: waymo-open-dataset requires Eager execution

The official tutorial uses eager mode: https://colab.research.google.com/github/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb

But it doesn't actually make clear that eager mode is unfortunately required. Attempting to use the code in this python package *without eager mode can result in crashes, e.g.:

    def parse_range_image_and_camera_projection(frame):
      ...
      range_image_top_pose = None
      for laser in frame.lasers:
        if len(laser.ri_return1.range_image_compressed) > 0:  # pylint: disable=g-explicit-length-test
          range_image_str_tensor = tf.decode_compressed(
              laser.ri_return1.range_image_compressed, 'ZLIB')
          ri = dataset_pb2.MatrixFloat()
>         ri.ParseFromString(bytearray(range_image_str_tensor.numpy()))
E         AttributeError: 'Tensor' object has no attribute 'numpy'

This is very sad, because a ton of Tensorflow code (e.g. that in tensorflow/models, tensorflow/tpu, other 3rd party repos, etc etc) would need major changes in order to work with TF Eager execution. While Eager is the default for TF 2.0, public support for TPUs in TF 2.0 is not yet complete ( https://github.com/tensorflow/tensorflow/issues/24412 ).

In the interest of having a library that is as composable as those written for other datasets (e.g. NuScenes, Lyft Level 5, and Argoverse), this is a request for waymo-open-dataset to follow Tensorflow's own recommendation that code support both eager and graph execution. Moreover, there is a large set of functionality in this repo that does not require Tensorflow at all; this is a further request to isolate away the bits that actually require Tensorflow (e.g. the C++ metrics).

Workaround

In the interim, users can use a py_func workaround (see below).

This is a demo of getting all lidar points, fused, using waymo-open-dataset code confined to its own tf.Session:

def get_fused_cloud_in_ego(waymo_frame):

  import tensorflow as tf
  import tensorflow.contrib.eager as tfe

  def get_all_points(wf_str):
    assert tf.executing_eagerly()

    from waymo_open_dataset import dataset_pb2 as open_dataset
    wf = open_dataset.Frame()
    wf.ParseFromString(wf_str.numpy())

    from waymo_open_dataset.utils import frame_utils
    range_images, camera_projections, range_image_top_pose = (
      frame_utils.parse_range_image_and_camera_projection(wf))

    # Waymo provides two returns for every lidar beam; we have to
    # fuse them manually
    points, cp_points = frame_utils.convert_range_image_to_point_cloud(
        wf,
        range_images,
        camera_projections,
        range_image_top_pose)
    points_ri2, cp_points_ri2 = frame_utils.convert_range_image_to_point_cloud(
        wf,
        range_images,
        camera_projections,
        range_image_top_pose,
        ri_index=1)

    # 3d points in vehicle frame.
    points_all = np.concatenate(points, axis=0)
    points_all_ri2 = np.concatenate(points_ri2, axis=0)
    all_points = np.concatenate((points_all, points_all_ri2), axis=0)
    return all_points

  # Run the `get_all_points()` helper in its own session to confine the Eager mode scope
  assert not tf.executing_eagerly()
  with tf.Session() as sess:
    wf = tf.placeholder(dtype=tf.string)
    pf = tfe.py_func(get_all_points, [wf], tf.float32)
    all_points = sess.run(pf, feed_dict={wf: waymo_frame.SerializeToString()})
    return all_points

peisun1115 commented 4 years ago

@pwais

The data is in proto format. It is not recommended to directly it consume it in your ML model. I think a common practice is to transform the data to the format your model work with. For example tensorflow.Example if tensorflow is your ML framework.

Apart from frame_utils.py that works directly with the proto. Other tf code in the lib does not assume eager mode.

pwais commented 4 years ago

Yup, I imagine most users will have some sort of ETL pipeline, and that pipeline could very well involve TF Graphs (e.g. run deep net inference on Waymo Frames, then record the result). It's very well-documented within the Tensorflow Github issues that enabling Eager mode will very often cause TF Graph code to have unexpected errors (browse e.g. https://github.com/tensorflow/tensorflow/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+eager ). While Eager is enabled by default in Tensorflow 2.0, at the time of writing there are still a large number of outstanding bugs that prohibit projects from upgrading.

Either way, the issue remains that TF Eager is required to, for example, decode the Waymo lidar range images (e.g. to create BEV feature vectors). Since TF Eager by default is irreversible when enabled globally, frame_utils.py leaks this quite intrusive constraint.

The purpose of this Github issue is to:

(1) request that the dependency on TF Eager be removed (which is in accordance with Tensorflow team recommendations for best practices for libraries)
(2) spread awareness of the understated dependency to any new user considering this library, as this constraint has a large impact on how any user might use the data and library

By the way, if Waymo did not intend for users to use the released data directly in a training pipeline, it would behoove Waymo to use some data format that is not TFRecords for distribution. While it's unfortunate that Google has not open sourced python/c++ RecordIO (despite Protobuf being open source for now over a decade), there are other options that facilitate significantly more efficient data access with lighter dependencies, including:

pyarrow / Pandas
a tarball of Frame files, one string-serialized Frame per file

In particular, the latter tarball option would not just remove the dependency on tensorflow but also enable efficient random seeking over the Frame data (e.g. using the Python built-in tarfile library), which TFRecords notoriously prohibit; this option would be a strict improvement over the current release format.

peisun1115 commented 4 years ago

I think it is not too much trouble to write a binary (say a apache beeam job) that transforms a proto to a format you prefer. Within that binary, you can enable tf eager.

We prefer to providing a generic format that can be easily transformed to other format.

pwais commented 4 years ago

Other bugs observed using TF Eager (Tensorflow 1.14):

Execution is incredibly slow. On a modern Core i7, runtime of the above function (i.e. the Session.run() call) is 4-8 seconds. I also tried using a GPU (Nvidia 1070 and execution time is on average the same or 1 second faster.
TF Eager completely ignores the session config, which may (for example) disable GPUs or force GPU usage to use memory growth. One needs to do os.environ['CUDA_VISIBLE_DEVICES'] = '' prior to even importing TF Eager. This procedure is essentially necessary for any sort of multi-threaded usage (e.g. Spark or Beam).

atyshka commented 4 years ago

Any updates on this nearly a year later? I can't believe the challenge came and went and this still hasn't been fixed

pwais commented 3 years ago

It would be tremendously helpful if any future data releases were not in TFRecords format because:

TFRecords prohibit seeking to individual frames; you'd have to read the whole file and pre-index it if you want to seek.
A TFRecords file is literally a concatenation of size-delimited string-serialized protobuf messages. A tar file is a concatenation of index-delimited files. The latter supports random access, compression, and almost every platform today can read tar files... TFRecords offers none of these features.
Much of modern research today uses pytorch, jax, or something that's not Tensorflow. The winners of your challenge used pytorch: https://github.com/open-mmlab/OpenPCDet . TFRecords, tf.data, and the ops for reading protobuf data efficiently are all tensorflow specific.

The recent release of KITTI-360 shows yet another example of a AV dataset that uses simple flat files (and zip files for distribution) ... not TFRecords http://www.cvlibs.net/datasets/kitti-360/documentation.php

atyshka commented 3 years ago

@pwais Looking at this a year later, since this is an early preprocessing layer, couldn't you just run this part in eager and wrap all your other graph code to a tf.function?

pwais commented 3 years ago

@atyshka No! Not using Tensorflow as a framework in this context, and definitely not the version that waymo-open-dataset requires. I totally understand that some people are fine with Tensorflow eager and such, or are just running in a notebook or something. But a lot of people do not, and in those settings this project's dependencies are obnoxious.

atyshka commented 3 years ago

Yeah I agree that this is too rigidly dependent on tensorflow, but things have changed a little. A year or so in to TF 2.0 most maintained projects such as object detection API are fully 2.0 and eager compatible. Still not friendly for non-tensorflow users though, I agree it would be nice to minimize dependency

pwais commented 3 years ago

@atyshka A key problem, which I believe is still here, is that the reference code in this repo for decoding point clouds from the TFRecords is complicated and thus essentially requires Tensorflow. If you look at NuScenes, Argoverse, KITTI, KITTI-360, Lyft Level 5, TRI-ML/DDAD, etc etc -- NONE of these datasets require a giant dependency like Tensorflow for simply reading point cloud data (into a canonical XYZ cloud from their RV encoding). Moreover, when I tried isolating the code in this repo into a python function, it took 7 seconds per cloud on a single CPU to decode. One of the authors gave the (rather snarky) comment that I should use a GPU to decode the lidar data.

While one could say all this is just unfortunate, I think it's important to raise the bar (or at least meet the bar) when introducing yet another AV dataset. Most of the aforementioned datasets were out before this one, and not a single one of those requires Tensorflow, nor patented code (in third_party/camera in this repo-- careful not to look at that directory at work).

pwais commented 3 years ago

Again, an ideal mitigation would be simply to remove the dependency on Tensorflow. The winner of the Open Dataset Detection challenge uses pytorch: https://github.com/open-mmlab/OpenPCDet They basically had to rewrite most of the code in the waymo-od repo: https://github.com/open-mmlab/OpenPCDet/blob/594962844729353842207b9e3c50f8b2434484f9/pcdet/datasets/waymo/waymo_dataset.py#L19

Does Waymo actually care? The resounding answer here is a very Google-like "we're sorry you feel that way"

atyshka commented 3 years ago

I think we have a solution! I've been able to parse the protos without eager mode by using tf.io.decode_proto. Please see this colab notebook for reference on how to do it: https://colab.research.google.com/drive/1ysVjnq0g5lsRMd2bCzB_nK3v3fF1S5g7?usp=sharing

@peisun1115 Would you accept this change to the repository as a PR? I can't think of any negative effects this method would bring, and I think it would make a lot of people happy. I'd like to check though before putting in the effort

peisun1115 commented 3 years ago

Thanks for contributing. I do not have access to your colab. You are welcome to send a PR. We can accept it as long as it doesn't require much maintenance effort and benefits the community. Alternatively, if it is easier for you, you can create a separate repo to host it if you like. We are happy to mention it somewhere in this repo so that users can find it.

atyshka commented 3 years ago

Apologies for the difficulty accessing. My modifications are as follows:

protoc waymo_open_dataset/dataset.proto --descriptor_set_out="dataset.desc" --include_imports This command must be run, and could be added to the compilation docker process.

My simple example for parsing images:

FILENAME = '/content/waymo-od/tutorial/frames'
dataset = tf.data.TFRecordDataset(FILENAME, compression_type='')

@tf.function
def decode_images_from_frame(data):
  deserialized_cams = tf.io.decode_proto(
    data, "waymo.open_dataset.Frame", ["images"], [tf.string], descriptor_source='/content/waymo-od/dataset.desc')

  return tf.io.decode_proto(
    deserialized_cams[1][0], "waymo.open_dataset.CameraImage", ["image"], [tf.string], descriptor_source='/content/waymo-od/dataset.desc')[1][0]

for data in dataset:
  deserialized_images = decode_images_from_frame(data)
  decoded_images = [tf.io.decode_jpeg(tf.reshape(im, ())) for im in tf.unstack(deserialized_images)] 
  for im in decoded_images:
    plt.figure()
    plt.imshow(im.numpy())

This would need to be expanded to recursively cover all the fields in the protobuf message.

The only problem is I believe it would have to be a breaking change, because some functions such as parse_range_image_and_camera_projection() expect Frame objects whereas this approach directly converts serialized protos to tensors instead of hierarchical objects. A possible solution to avoid breaking changes would be providing alternate versions of the affected functions, I.e. one function accepting old Frame objects and another accepting new TensorFrame objects.

waymo-research / waymo-open-dataset

Support for non-eager execution (with noted workaround) #47

Issue: waymo-open-dataset requires Eager execution

Workaround