Folder structure of scenarios in Waymo Open Dataset

waymo-research / waymo-open-dataset

The scenario folder for the Waymo Motion Dataset looks like this:

For the training and validation sets, the website (https://waymo.com/open/data/motion/) says "These segments are further broken into 9 second windows (1 second of history and 8 seconds of future data) with varying overlap." What is meant by history and future data – and does this distinction matter for training?

Furthermore, what are the testing_interactive and validation_interactive folders? How are they different from the testing and validation folders?

Lastly, I notice there's a training_20s folder. Here, I assume each TFRecord file corresponds to a 20 second segment, as opposed to a 9 second segment for the TFRecords in training and validation. So how come training_20s, training, and validation each have 1000 TFRecords? I would expect training and validation to have a little more than double (20/9) the number of TFRecords, no?

Thanks for the help!

Hi. Regarding history and future data: models are intended to take 1 second of history as input and output 8 seconds of future predictions. Accordingly, each training example is broken into 1 second of history, 1 current time step, and 8 seconds of future data.
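To make the split concrete, here is a minimal sketch in plain Python. It assumes a 10 Hz sampling rate, so one window has 10 history steps, 1 current step, and 80 future steps (91 in total); the rate and the helper name `split_window` are assumptions for illustration, not something stated in this thread.

```python
# Assumed values: 10 Hz sampling, 1 s of history, 8 s of future.
SAMPLE_RATE_HZ = 10
HISTORY_SECONDS = 1
FUTURE_SECONDS = 8


def split_window(steps):
    """Split one window of time steps into (history, current, future)."""
    n_history = HISTORY_SECONDS * SAMPLE_RATE_HZ  # 10 steps
    n_future = FUTURE_SECONDS * SAMPLE_RATE_HZ    # 80 steps
    expected = n_history + 1 + n_future           # 91 steps
    if len(steps) != expected:
        raise ValueError(f"expected {expected} steps, got {len(steps)}")
    history = steps[:n_history]        # model input
    current = steps[n_history]         # the "now" step
    future = steps[n_history + 1:]     # prediction target
    return history, current, future


window = list(range(91))  # stand-in for 91 per-step states
history, current, future = split_window(window)
print(len(history), current, len(future))  # 10 10 80
```

In this sketch the history and current step would be the model input and the future steps the prediction target.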

The interactive dataset splits are for use with the interaction challenge described here.

As for the number of files, each TFRecord file contains many examples (the TFRecord format supports reading examples sequentially from a single file). The data is split into smaller shards so it can be processed in parallel. The training sets consist of 1000 file shards each, while the validation and test sets consist of 150 file shards each. Again, each shard contains many examples; there are hundreds of thousands of examples in total.
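To check how many examples the shards actually hold, one option is to count records directly. The sketch below relies only on the standard TFRecord framing (an 8-byte little-endian length, a 4-byte length CRC, the payload, and a 4-byte payload CRC) and skips CRC verification, so it is a rough counter rather than a validating reader. The function names and the glob pattern in the usage note are hypothetical.

```python
import glob
import struct


def count_records(path):
    """Count records in one TFRecord file without parsing their contents."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # uint64 little-endian payload length
            if len(header) < 8:
                break  # end of file
            (length,) = struct.unpack("<Q", header)
            # Skip the 4-byte length CRC, the payload, and the 4-byte payload CRC.
            # CRCs are not verified, so a truncated final record is still counted.
            f.seek(4 + length + 4, 1)
            count += 1
    return count


def count_examples(pattern):
    """Sum record counts over all shard files matching a glob pattern."""
    return sum(count_records(p) for p in sorted(glob.glob(pattern)))
```

For example, `count_examples("training/*")` would report the total across all shards in a local `training` folder (the path is an assumption about where the data was downloaded). In a TensorFlow environment, iterating over `tf.data.TFRecordDataset` is the more usual way to read the same files.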

Please let me know if you have further questions.

Folder structure of scenarios in Waymo Open Dataset #720