waymo-research / waymo-open-dataset

Waymo Open Dataset
https://www.waymo.com/open

Read Frame level input data error #621

Open atanasko opened 1 year ago

atanasko commented 1 year ago

Hi,

I am trying to use the new WOD Apache Parquet format with the example code:

```python
#@title Sensor data with both lidar and camera boxes
from waymo_open_dataset import v2

# `read` is defined in the tutorial's initial setup cell.
# Lazily read DataFrames for all components.
association_df = read('camera_to_lidar_box_association')
cam_box_df = read('camera_box')
cam_img_df = read('camera_image')
lidar_box_df = read('lidar_box')
lidar_df = read('lidar')

# Join all DataFrames using matching columns.
cam_image_w_box_df = v2.merge(cam_box_df, cam_img_df)
cam_obj_df = v2.merge(association_df, cam_image_w_box_df)
obj_df = v2.merge(cam_obj_df, lidar_box_df)
# Group lidar sensors (left), group labels and camera images (right) and join.
df = v2.merge(lidar_df, obj_df, left_group=True, right_group=True)

# Read a single row, which contains all data for a single frame.
_, row = next(iter(df.iterrows()))
# Create all component objects.
camera_image = v2.CameraImageComponent.from_dict(row)
lidar = v2.LiDARComponent.from_dict(row)
camera_box = v2.CameraBoxComponent.from_dict(row)
lidar_box = v2.LiDARBoxComponent.from_dict(row)

print(
    f'Found {len(lidar_box.key.laser_object_id)} objects on'
    f' {lidar.key.segment_context_name=} {lidar.key.frame_timestamp_micros=}'
)
for laser_object_id, camera_object_id, camera_name in zip(
    lidar_box.key.laser_object_id,
    camera_box.key.camera_object_id,
    camera_image.key.camera_name,
):
  print(f'\t{laser_object_id=} {camera_object_id=} {camera_name=}')
```

but I get an error on the line

```python
_, row = next(iter(df.iterrows()))
```

```
~/dask/dataframe/groupby.py:1127: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use

    >>> .groupby(..., group_keys=False)

To adopt the future behavior and silence this warning, use

    >>> .groupby(..., group_keys=True)
  func=lambda s0: s0.apply(
```

Versions:

```
dask                         2023.3.1
waymo-open-dataset-tf-2.11.0 1.5.0
```

waymo-open-dataset-tf is a local build.

Since I'm not very familiar with the new format yet, any advice on how to resolve this?

alexgorban commented 1 year ago

Hi,

thanks for reporting this issue. I will update the `v2.merge` function to pass `group_keys=False`, which removes the warning, in the upcoming v1.5.1 update.

It is just a FutureWarning, right? Did you get a valid `row` instance in the output?
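Until `v2.merge` passes `group_keys=False` itself, one option is to silence only this particular FutureWarning instead of every FutureWarning. A minimal sketch, assuming the warning text starts with "Not prepending group keys" as in the report above; `noisy_transform` and `run_quietly` are hypothetical names, with `noisy_transform` standing in for the Dask/pandas call that emits the warning.

```python
import warnings


def noisy_transform() -> int:
    """Hypothetical stand-in for the Dask groupby code that emits the warning."""
    warnings.warn(
        "Not prepending group keys to the result index of transform-like apply.",
        FutureWarning,
    )
    return 42


def run_quietly(fn):
    """Calls fn() while ignoring only the 'Not prepending group keys' FutureWarning."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text,
        # so other FutureWarnings are still shown.
        warnings.filterwarnings(
            "ignore",
            message="Not prepending group keys",
            category=FutureWarning,
        )
        return fn()


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = run_quietly(noisy_transform)
    print(result, len(caught))  # prints "42 0"
```

Compared with a blanket `warnings.simplefilter(action='ignore', category=FutureWarning)`, the narrow filter keeps unrelated deprecation notices visible.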

atanasko commented 1 year ago

Hi @alexgorban,

Thank you for taking time to look at this!

In my case, execution terminated with an error:

```
..............................
  File "~/lidar-sensor/main.py", line 141, in read_lidar
    _, row = next(iter(df.iterrows()))
StopIteration
python-BaseException

Process finished with exit code 1
```

atanasko commented 1 year ago

Hi @alexgorban,

When is the v1.5.1 update planned? Any advice on how to temporarily work around this problem?

Thanks in advance!

alexgorban commented 1 year ago

Hi @atanasko, v1.5.1 was just released, but unfortunately the fix for this warning didn't land in the release.

The `StopIteration` most likely means that Dask was not able to find the dataset, so the DataFrame is empty. The first cell in the tutorial assumes that the user specifies a valid path to the dataset (e.g. sets the `dataset_dir` variable). You can debug this by printing the value of `paths` in the `read` function.
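To see why an empty result surfaces as `StopIteration`: `iterrows()` yields nothing for a DataFrame with zero rows, so `next()` on it raises immediately. A minimal sketch using plain pandas (standing in for the Dask DataFrame from the tutorial) that fails loudly when the glob matches no files and fetches the first row without raising. `find_component_files` and `first_row_or_none` are hypothetical helpers, and the `{dataset_dir}/{tag}/{context_name}.parquet` layout is an assumption based on the tutorial.

```python
import glob
import os
from typing import Optional

import pandas as pd


def find_component_files(dataset_dir: str, tag: str, context_name: str) -> list:
    """Returns the Parquet files for one component, failing loudly if none match."""
    # Assumed layout: {dataset_dir}/{tag}/{context_name}.parquet
    pattern = os.path.join(dataset_dir, tag, f"{context_name}.parquet")
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files match {pattern!r}; check dataset_dir.")
    return paths


def first_row_or_none(df: pd.DataFrame) -> Optional[pd.Series]:
    """Returns the first row, or None if the DataFrame is empty."""
    # next() with a default returns None instead of raising StopIteration.
    item = next(df.iterrows(), None)
    return None if item is None else item[1]


if __name__ == "__main__":
    empty = pd.DataFrame({"key.frame_timestamp_micros": []})
    try:
        next(iter(empty.iterrows()))
    except StopIteration:
        print("StopIteration: the DataFrame has no rows")
    print(first_row_or_none(empty))  # prints "None"
```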

Here is an example of how to modify tutorial_v2 to access data directly from GCP in Colab (NOTE: this will be slower than local access, i.e. downloading the dataset and running the tutorial in a local Jupyter notebook):

1. Add and run a new cell to install dependencies (see #623):

   ```
   !pip install gcsfs waymo-open-dataset-tf-2-11-0==1.5.1
   !pip install pyarrow==10.0.0
   ```

2. Add and run a new cell to configure GCP access:

   ```python
   from google.colab import auth
   import gcsfs
   import google.auth

   auth.authenticate_user()

   credentials, project_id = google.auth.default()
   fs = gcsfs.GCSFileSystem(project=project_id, token=credentials)
   ```

3. Modify and run the `Initial setup` cell:

   ```python
   from typing import Optional
   import warnings
   # Disable annoying warnings from PyArrow, which is used under the hood.
   warnings.simplefilter(action='ignore', category=FutureWarning)

   import tensorflow as tf
   import dask.dataframe as dd
   from waymo_open_dataset import v2

   # Google Cloud Storage bucket
   dataset_root = 'waymo_open_dataset_v_2_0_0'

   context_name = '10023947602400723454_1120_000_1140_000'

   def read(tag: str) -> dd.DataFrame:
     """Creates a Dask DataFrame for the component specified by its tag."""
     # `fs` is the GCSFileSystem created in step 2.
     paths = sorted(fs.glob(f'{dataset_root}/training/{tag}/{context_name}.parquet'))
     return dd.read_parquet(paths, filesystem=fs)
   ```

Please let me know if local access in a Jupyter notebook or reading data from GCP directly in Colab doesn't work.

atanasko commented 1 year ago

Hi @alexgorban,

Thanks again for your time!

I compiled WOD 1.5.1 with Python 3.10 on my local machine and tried again. My test still fails :-(, but I noticed the following.

I don't think the problem is the path, because my images-only example works fine, and the following example also retrieves a row:

```python
# @title Sensor data with both lidar and camera boxes

# Lazily read DataFrames for all components.
lidar_box_df = read('lidar_box')
lidar_df = read('lidar')
df = v2.merge(lidar_df, lidar_box_df, left_group=True, right_group=True)

_, row = next(iter(df.iterrows()))
```

but this one:

```python
# Join all DataFrames using matching columns.
cam_image_w_box_df = v2.merge(cam_box_df, cam_img_df)
cam_obj_df = v2.merge(association_df, cam_image_w_box_df)
obj_df = v2.merge(cam_obj_df, lidar_box_df)
# Group lidar sensors (left), group labels and camera images (right) and join.
df = v2.merge(lidar_df, obj_df, left_group=True, right_group=True)

# Read a single row, which contains all data for a single frame.
_, row = next(iter(df.iterrows()))
```

has a problem, and I still can't figure out what it is because I don't understand the merge process in depth. Maybe you have an idea what the problem with this merge could be. I'm testing with

`context_name = '10017090168044687777_6380_000_6400_000'`

Thanks in advance!

atanasko commented 1 year ago

Hi @alexgorban,

I just noticed that the following code returns a row

```python
association_df = read('camera_to_lidar_box_association')
_, row = next(iter(association_df.iterrows()))
print("stop")
```

when I use

`context_name = '10023947602400723454_1120_000_1140_000'`

but it fails when I use

`context_name = '10017090168044687777_6380_000_6400_000'`

I'll try to analyze this.

atanasko commented 1 year ago

@alexgorban my original code also works with

`context_name = '10023947602400723454_1120_000_1140_000'`
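This pattern (the lidar-only merge returns rows, the full chain does not, and `association_df` alone is empty for one context) is consistent with the merges behaving as inner joins: if any input has no rows, the joined result is empty and `next(iter(df.iterrows()))` raises `StopIteration`. A minimal sketch with plain pandas, assuming `v2.merge` joins on the shared key columns with inner-join semantics by default; the toy tables, the columns `frame` and `obj`, and the helpers `inner_merge` and `empty_inputs` are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for two components (hypothetical columns).
lidar_box = pd.DataFrame({"frame": [1, 1], "obj": ["a", "b"]})
# An empty association table, like the one observed for one context above.
association = pd.DataFrame({
    "frame": pd.Series([], dtype="int64"),
    "obj": pd.Series([], dtype="object"),
})


def inner_merge(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Inner-joins on all columns the two DataFrames share."""
    on = [c for c in left.columns if c in right.columns]
    return left.merge(right, on=on, how="inner")


def empty_inputs(**frames: pd.DataFrame) -> list:
    """Lists which named inputs have zero rows, to pinpoint an empty join."""
    return [name for name, df in frames.items() if len(df) == 0]


if __name__ == "__main__":
    print(len(inner_merge(association, lidar_box)))  # prints 0
    print(empty_inputs(association=association, lidar_box=lidar_box))  # prints ['association']
```

With Dask, calling `len()` on each input before merging triggers a computation, but it is a quick way to find which component is empty for a given context.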
Thermaloo commented 1 year ago


Hello, atanasko. Could you tell me how to install waymo-open-dataset-tf-2-11-0==1.5.1? I ran `pip install waymo-open-dataset-tf-2-11-0==1.5.1`, but pip did not find this version:

```
ERROR: Could not find a version that satisfies the requirement waymo-open-dataset-tf-2-11-0==1.5.1 (from versions: none)
ERROR: No matching distribution found for waymo-open-dataset-tf-2-11-0==1.5.1
```

Could you give me some tips?

atanasko commented 1 year ago

Hi @MingRuiye,

What is your Python version? I see that the Waymo team has already built a wheel for Python 3.10. When I installed it, there was only a version for Python 3.9, so I built it from source.

Thermaloo commented 1 year ago


Hi, atanasko. Thank you for your prompt reply. My Python version is 3.8. I tried downloading the wheel and installing it directly, and got:

```
ERROR: waymo_open_dataset_tf_2_11_0-1.5.1-py3-none-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl is not a supported wheel on this platform.
```

My environment is Ubuntu 16.04, Python 3.8, pip 23.1.2. After your reply I also tried Python 3.10, but it still doesn't work.
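The `manylinux_2_24_x86_64` / `manylinux_2_28_x86_64` part of that wheel name means the wheel needs glibc 2.24 or newer on an x86_64 CPU. Ubuntu 16.04 ships glibc 2.23, which would explain why pip rejects the wheel regardless of the Python version. A minimal sketch that compares a wheel's manylinux tags with the local glibc; `manylinux_tags` and `wheel_fits` are hypothetical helpers, and the parsing only handles PEP 600 `manylinux_X_Y_arch` tags, not legacy aliases such as `manylinux2014`.

```python
import platform
import re


def manylinux_tags(wheel_filename: str) -> list:
    """Returns (glibc_major, glibc_minor, arch) for each PEP 600 tag in a wheel name."""
    # The platform tag is the last dash-separated field; multiple tags are joined by '.'.
    platform_field = wheel_filename[: -len(".whl")].split("-")[-1]
    tags = []
    for tag in platform_field.split("."):
        match = re.fullmatch(r"manylinux_(\d+)_(\d+)_(\w+)", tag)
        if match:
            tags.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return tags


def wheel_fits(wheel_filename: str, glibc_version: str, arch: str) -> bool:
    """True if any manylinux tag is satisfied by this glibc version and CPU arch."""
    major, minor = (int(part) for part in glibc_version.split(".")[:2])
    return any(
        (major, minor) >= (tag_major, tag_minor) and arch == tag_arch
        for tag_major, tag_minor, tag_arch in manylinux_tags(wheel_filename)
    )


if __name__ == "__main__":
    wheel = ("waymo_open_dataset_tf_2_11_0-1.5.1-py3-none-"
             "manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl")
    libc, version = platform.libc_ver()  # e.g. ("glibc", "2.31"); empty strings if unknown
    print(libc, version, platform.machine())
    if libc == "glibc" and version:
        print("wheel fits:", wheel_fits(wheel, version, platform.machine()))
```

If the check fails, upgrading the OS (or using a container with a newer glibc) is likely needed; switching Python versions alone would not change the result.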

atanasko commented 1 year ago

Hi @MingRuiye,

I created a new Python project and (in my case) there was a problem with the OpenEXR package (`ldconfig` was not found). I ran

```
export PATH=$PATH:/usr/sbin
pip install OpenEXR
pip install waymo-open-dataset-tf-2-11-0==1.5.1
```

and everything installed correctly.
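`ldconfig` usually lives in `/usr/sbin` or `/sbin`, which may not be on a regular user's `PATH` (as happened here); that is what the `export` above works around. A minimal sketch for checking this from Python before installing; `find_ldconfig` is a hypothetical helper, and the fallback directories are an assumption about typical Debian locations.

```python
import os
import shutil
from typing import Iterable, Optional, Tuple

# Assumption: typical Debian locations for ldconfig.
FALLBACK_DIRS = ("/usr/sbin", "/sbin")


def find_ldconfig(
    search_path: Optional[str] = None,
    fallback_dirs: Iterable[str] = FALLBACK_DIRS,
) -> Tuple[Optional[str], bool]:
    """Returns (location of ldconfig or None, whether it was found on the search path).

    search_path defaults to the current PATH, just like shutil.which.
    """
    found = shutil.which("ldconfig", path=search_path)
    if found:
        return found, True
    # Not on PATH: look in the fallback directories instead.
    return shutil.which("ldconfig", path=os.pathsep.join(fallback_dirs)), False


if __name__ == "__main__":
    location, on_path = find_ldconfig()
    if location and not on_path:
        print(f"ldconfig is at {location} but not on PATH; try:")
        print(f"  export PATH=$PATH:{os.path.dirname(location)}")
    elif location:
        print(f"ldconfig found on PATH: {location}")
    else:
        print("ldconfig not found")
```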

Thermaloo commented 1 year ago

Could you tell me about the environment you use? Thank you very much!

atanasko commented 1 year ago

Hi @MingRuiye,

- Debian GNU/Linux bookworm (from the testing repository)
- Python 3.10.11
- WOD wheel built locally from source

As I mentioned previously, I have just installed waymo-open-dataset-tf-2-11-0==1.5.1 in another test project without problems.

Thermaloo commented 1 year ago


I see, thank you!