kylevedder / zeroflow

Official repository for ZeroFlow: Scalable Scene Flow via Distillation
http://vedder.io/zeroflow
MIT License

Map features seem to be missing from Waymo Open Dataset v 1.4.2 #6

Closed deeptibhegde closed 7 months ago

deeptibhegde commented 7 months ago

Hello, I downloaded WOD version 1.4.2 linked here but map features seem to be missing from this data. I specifically downloaded files from the path waymo_open_dataset_v_1_4_2/individual_files/* . When running rasterize_heightmap.py, I get AttributeError: map_features. Please let me know if I am using the wrong data.

kylevedder commented 7 months ago

You are likely running something wrong or have corrupted data. I just validated the data I downloaded from that bucket is correct with the following steps:

1) Pull down a single standalone file:

gsutil -m cp gs://waymo_open_dataset_v_1_4_2/individual_files/training/segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord /efs/waymo_open_142_test/training/

The md5sum of the file is ed4d372c8ccbcc0e5430e82c7c92e002

The resulting file tree is

/efs/waymo_open_142_test/
└── training
    └── segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord

2) Run the rasterization script. Inside the environment launched by ./launch_waymo.sh:

root@shaperotator:/project/data_prep_scripts/waymo# python rasterize_heightmap.py /efs/waymo_open_142_test/ /efs/waymo_open_142_test_raster_heights/
Waymo directory: /efs/waymo_open_142_test
Output directory: /efs/waymo_open_142_test_raster_heights
Work queue size: 1
Processing /efs/waymo_open_142_test/training/segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord
Saving heightmap to /efs/waymo_open_142_test_raster_heights/training/segment-10017090168044687777_6380_000_6400_000_with_camera_labels_map/ground_height.npy

deeptibhegde commented 7 months ago

Hi, would you mind checking against the most recent version of the bucket (reading just the standalone file, for example)? When I pull the single file, I get the same md5sum, ed4d372c8ccbcc0e5430e82c7c92e002. However, when I check the frame attributes, map_features seems to be missing. Here is my test code:

import tensorflow as tf

from waymo_open_dataset import dataset_pb2

file_path = "/mnt/store/dhegde1/data/AV_datasets/waymo/test/individual_files_training_segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord"

dataset = tf.data.TFRecordDataset(file_path, compression_type='')

# Parse only the first record in the file into a Frame proto.
for data in dataset:
    frame = dataset_pb2.Frame.FromString(bytearray(data.numpy()))
    break

print(dir(frame))
print(frame.map_features)

I get

2024-02-04 15:56:29.681174: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2024-02-04 15:56:31.267738: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2024-02-04 15:56:31.323464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:4f:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2024-02-04 15:56:31.323917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties: pciBusID: 0000:52:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2024-02-04 15:56:31.324463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 2 with properties: pciBusID: 0000:53:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2024-02-04 15:56:31.325304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 3 with properties: pciBusID: 0000:56:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2024-02-04 15:56:31.325964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 4 with properties: pciBusID: 0000:57:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2024-02-04 15:56:31.326359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 5 with properties: pciBusID: 0000:ce:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2024-02-04 15:56:31.326740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 6 with properties: pciBusID: 0000:d1:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2024-02-04 15:56:31.327123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 7 with properties: pciBusID: 0000:d2:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2024-02-04 15:56:31.327505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 8 with properties: pciBusID: 0000:d5:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2024-02-04 15:56:31.327885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 9 with properties: pciBusID: 0000:d6:00.0 name: NVIDIA RTX A6000 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2024-02-04 15:56:31.327925: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2024-02-04 15:56:31.334457: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2024-02-04 15:56:31.334583: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2024-02-04 15:56:31.335935: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2024-02-04 15:56:31.336313: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2024-02-04 15:56:31.336927: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2024-02-04 15:56:31.338057: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2024-02-04 15:56:31.338318: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2024-02-04 15:56:31.338335: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2024-02-04 15:56:31.338865: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-04 15:56:31.345986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-02-04 15:56:31.346046: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]
2024-02-04 15:56:31.388884: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2024-02-04 15:56:31.389768: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2200000000 Hz
['ByteSize', 'Clear', 'ClearExtension', 'ClearField', 'CopyFrom', 'DESCRIPTOR', 'DiscardUnknownFields', 'Extensions', 'FindInitializationErrors', 'FromString', 'HasExtension', 'HasField', 'IsInitialized', 'ListFields', 'MergeFrom', 'MergeFromString', 'ParseFromString', 'RegisterExtension', 'SerializePartialToString', 'SerializeToString', 'SetInParent', 'UnknownFields', 'WhichOneof', '_CheckCalledFromGeneratedFile', '_SetListener', '__class__', '__deepcopy__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__unicode__', '_extensions_by_name', '_extensions_by_number', 'camera_labels', 'context', 'images', 'laser_labels', 'lasers', 'no_label_zones', 'pose', 'projected_lidar_labels', 'timestamp_micros']
Traceback (most recent call last):
  File "/mnt/store/dhegde1/code/LiDARpretraining/scene_flow/zeroflow/data_prep_scripts/waymo/test_tfrecord.py", line 19, in <module>
    print(frame.map_features)
AttributeError: map_features

kylevedder commented 7 months ago

The tfrecord downloaded had the same hash, which means the issue is with the code loading the tfrecord.
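
For anyone who wants to repeat the hash comparison without the md5sum tool, here is a minimal Python sketch. The helper names (md5_of_file, matches_expected) are illustrative and not part of this repository; the expected digest is the one quoted earlier in this thread.

```python
import hashlib

# MD5 reported earlier in this thread for the example training segment.
EXPECTED_MD5 = "ed4d372c8ccbcc0e5430e82c7c92e002"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

A matching digest only shows that the download is intact; it says nothing about whether the installed reader library can parse every field.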

Did you run this in the docker environment we provided, or are you using a different environment? I would bet you're using a different (possibly newer) version of the Waymo Open Dataset API, which expects different data. If you replicate my exact steps locally, you should be able to reproduce my working result.
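
A quick way to see which Waymo package is actually installed in a given environment is sketched below, using only the standard library. It assumes the distribution name starts with "waymo-open-dataset" (the PyPI packages follow that pattern, with a TensorFlow version suffix); the function name is made up for illustration.

```python
from importlib import metadata


def installed_waymo_packages(prefix="waymo-open-dataset"):
    """Map installed distribution names that start with `prefix` to versions."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        # Treat underscores and hyphens as equivalent before comparing.
        if name.lower().replace("_", "-").startswith(prefix):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    packages = installed_waymo_packages()
    if not packages:
        print("No waymo-open-dataset distribution found in this environment")
    for name, version in sorted(packages.items()):
        print(f"{name}=={version}")
```

Running this both inside the provided docker image and in a custom environment, then comparing the output, should show whether the two differ.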

deeptibhegde commented 7 months ago

Changed my waymo-open-dataset version to the one in the Waymo Docker -- docker/Dockerfilewaymo -- and that fixed it. Thanks for your time.

deeptibhegde commented 7 months ago

Apologies, re-opening due to another issue while running data_prep_scripts/waymo/extract_flow_and_remove_ground.py. It seems that the point_flows dict is empty, and the code never enters this line.

This results in a key error

  File "/mnt/store/dhegde1/code/LiDARpretraining/scene_flow/zeroflow/data_prep_scripts/waymo/extract_flow_and_remove_ground.py", line 161, in convert_range_image_to_point_cloud
    flow = point_flows[c.name][ri_index]
KeyError: 1

Would this be another issue with the version? I have duplicated your environment as exactly as I can.
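
To make the failure mode concrete: the traceback shows point_flows being indexed by laser name and then by return index, so an empty dict raises KeyError on the first lookup. Below is a hypothetical guard, not code from the repository, that returns None instead. It assumes point_flows maps a laser name to a list with one entry per return.

```python
def lookup_flow(point_flows, laser_name, ri_index):
    """Return point_flows[laser_name][ri_index], or None if it is missing.

    Assumption: point_flows is a dict keyed by laser name whose values are
    lists indexed by return index, as the traceback above suggests.
    """
    per_laser = point_flows.get(laser_name)
    if per_laser is None or not 0 <= ri_index < len(per_laser):
        return None
    return per_laser[ri_index]


# An empty dict (the situation described above) yields None, not KeyError.
print(lookup_flow({}, 1, 0))
```

A guard like this only hides the symptom; the empty dict still means no flow labels were decoded from the frame.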

kylevedder commented 7 months ago

I don't know, as I don't know what your environment is. Please actually run our docker container instead of making your own environment; there are lots of version compatibility issues, particularly when it comes to the Waymo TF stuff, but I believe my code works with my docker image.

That said, I have not touched the preprocessing code for waymo in a year, and it's possible there's some issue with it. If you run it in our docker image and still have issues, I'd be more than happy to take a look at a PR fixing the issues.

deeptibhegde commented 7 months ago

I worked around this by skipping the loading of the ground-truth flow, since I am only interested in getting the flow estimates during inference. However, the estimated flow seems to be very close to zero and always <1m. Judging by a visualization of the ground-truth target point cloud sequence, these values seem to be much lower than they should be. In particular, I am evaluating zeroflow_weights-master/waymo/nsfp_distilatation/nsfp_distilatation.ckpt on Waymo. I have included a custom visualization below, where blue points are from pc0, green points are from pc1, and red points are pc0 warped to pc1.

[Image: zf_demo]

As you can see, the warped point cloud from the flow estimate has a very large overlap with the scene at t=0. Is this in line with the general performance of the model or is this unusual?

kylevedder commented 7 months ago

The flow is expressed in meters per 0.1 seconds (one frame interval), so a vector of length 1m corresponds to 10m/s of motion.
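
As a small worked example of that conversion, the sketch below turns a per-frame flow vector into a speed. The function name and the default 0.1 s interval are illustrative assumptions based on the comment above, not names from the codebase.

```python
import math

# Assumed time between consecutive LiDAR sweeps, per the comment above.
FRAME_INTERVAL_S = 0.1


def flow_to_speed_mps(flow_vector, frame_interval_s=FRAME_INTERVAL_S):
    """Convert a per-frame (x, y, z) flow vector in meters to speed in m/s."""
    x, y, z = flow_vector
    return math.sqrt(x * x + y * y + z * z) / frame_interval_s


# A 1 m displacement between frames is roughly 10 m/s (about 36 km/h).
print(flow_to_speed_mps((1.0, 0.0, 0.0)))
```

By the same arithmetic, a point moving at 5 m/s only produces a 0.5 m flow vector, so sub-meter flow magnitudes are normal at this frame rate.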

I don't know what exactly you did for your visuals but it looks like everything is in an ego motion compensated global coordinate frame, so I would expect the vast majority of points to be almost 0 flow.

These methods are also shockingly bad in general on objects that aren't large (e.g. pedestrians), as we discuss in more detail in this blog post for our soon to be released scene flow challenge:

https://www.argoverse.org/sceneflow

deeptibhegde commented 7 months ago

[Edited previous comment to correct color label assignment]

I have just saved the flow output from the network along with pc0_points and pc1_points. Is there a way to obtain "true" flow without ego motion compensation? It seems as though the flow estimate is quite bad here even for the larger Car category.

Also, which model from https://github.com/kylevedder/zeroflow_weights is associated with that of the unsupervised ZeroFlow method for Waymo?

kylevedder commented 7 months ago

You can manually add back in egomotion for all of the flow vectors if you want by using the relative SE3 transform post-hoc.
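
A rough sketch of that post-hoc step follows, under explicit assumptions that the dataloader may not share: the relative transform (rotation, translation) maps coordinates in the pc0 sensor frame into the pc1 sensor frame, the point is given in the original (uncompensated) pc0 frame, and the compensated flow is expressed in the pc1 frame. All names are hypothetical.

```python
def apply_se3(rotation, translation, point):
    """Apply p' = R p + t for a 3x3 rotation (nested lists) and 3-vector t."""
    return tuple(
        sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
        for row in range(3)
    )


def add_back_egomotion(point_in_pc0, compensated_flow, rotation, translation):
    """Return the flow a sensor-frame observer sees, ego motion included.

    total_flow = compensated_flow + (T(p) - p), where T is the assumed
    pc0-frame to pc1-frame transform and p is the uncompensated pc0 point.
    """
    moved = apply_se3(rotation, translation, point_in_pc0)
    ego_displacement = tuple(m - p for m, p in zip(moved, point_in_pc0))
    return tuple(f + e for f, e in zip(compensated_flow, ego_displacement))


# Example: the ego vehicle drives 1 m forward (+x) between sweeps, so a
# static point 5 m ahead appears 1 m closer in the second sensor frame.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(add_back_egomotion((5.0, 0.0, 0.0), (0.0, 0.0, 0.0), identity, (-1.0, 0.0, 0.0)))
```

If the saved pc0_points are already in the compensated frame, the inverse transform would have to be applied first to recover the original sensor-frame points.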

There's no way to load without egomotion compensation just from the configs. To do that, you'd have to modify the load call on the Waymo dataloader to feed a start_idx equal to the requested idx.

All that said, I don't see why you'd want to do this. One of the takeaways from Re-Evaluating LiDAR Scene Flow for Autonomous Driving is that egomotion compensation is very valuable for improving method performance (which makes sense -- the method doesn't have to jointly associate across egomotion and non-egomotion), and none of the ZeroFlow methods were trained to run without egomotion compensation.

The nsfp_distilatation weights are the ZeroFlow weights.

deeptibhegde commented 7 months ago

"I don't know what exactly you did for your visuals but it looks like everything is in an ego motion compensated global coordinate frame, so I would expect the vast majority of points to be almost 0 flow."

But in the example pictured above, it is clear that the car object should have a much larger flow estimate. My original question was if this was the expected performance of the model.

deeptibhegde commented 7 months ago

Examining other results, it seems to align with the qualitative performance reported in the paper and with that of the argoverse demo. Closing, thanks for your time.