motional / nuplan-devkit

The devkit of the nuPlan dataset.
https://www.nuplan.org

Small dataset partitions #158

Closed: perone closed this issue 1 year ago

perone commented 2 years ago

Hi, thanks for sharing the dataset. Are there any plans to release a smaller version of it? The nuplan-mini split is too small and doesn't have standard train/val/test partitions, while the whole dataset is around ~1TB of data. Having a smaller dataset that is bigger than the mini but ships with standard train/val/test partitions would really help future research work, benchmarks, etc. Thanks!

patk-motional commented 2 years ago

Hi @perone,

The training split is already broken into smaller chunks, so you do not need to download the entire dataset.

[screenshot: download page listing the individual training-split archives]

Is this what you are looking for?
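For context, here is a minimal sketch of what fetching only a few of those training chunks could look like. The base URL and archive names below are placeholders, not real nuPlan download links; substitute the URLs shown on the nuPlan download page.

```python
# Sketch: fetch only a handful of training-split archives instead of the full ~1 TB.
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://example.com/nuplan/v1.1"   # placeholder, not a real endpoint
ARCHIVES = [
    "nuplan-v1.1_train_chunk_00.zip",          # hypothetical chunk names
    "nuplan-v1.1_train_chunk_01.zip",
]
OUT_DIR = Path("~/nuplan/dataset").expanduser()
OUT_DIR.mkdir(parents=True, exist_ok=True)

for name in ARCHIVES:
    target = OUT_DIR / name
    if not target.exists():                    # skip chunks already on disk
        urlretrieve(f"{BASE_URL}/{name}", str(target))
        print(f"downloaded {name}")
```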

perone commented 2 years ago

Thanks @patk-motional. But if someone trains on only a single city, for example, the standard val/test splits will still contain other, unseen cities, which changes the expected generalization. Ideally the smaller dataset would be a resample of the train/val/test splits with a standard partitioning that can be downloaded from nuPlan; otherwise, anyone who builds a custom train/val/test split diverges from the standard benchmark partitions and has to make it available elsewhere as well. I think you could attract many more researchers to nuPlan by providing a smaller dataset with a standard train/val/test split (while keeping the same setup you used for the bigger one).
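As a rough illustration of the kind of resampling being asked for here, the sketch below samples a fixed fraction of log files from each existing split, so the train/val/test structure is preserved in the smaller copy. The directory layout, file extension, and 10% fraction are assumptions, not something stated in this thread.

```python
# Sketch: build a smaller benchmark by copying a random fraction of log files
# from each existing split into a new dataset root. Paths and FRACTION are
# assumptions; adapt them to your local nuPlan layout.
import random
import shutil
from pathlib import Path

DATA_ROOT = Path("~/nuplan/dataset/nuplan-v1.1/splits").expanduser()        # assumed layout
DEST_ROOT = Path("~/nuplan/dataset/nuplan-v1.1-10pct/splits").expanduser()
FRACTION = 0.10
random.seed(0)                                 # fixed seed so the subset is reproducible

for split_dir in DATA_ROOT.iterdir():
    if not split_dir.is_dir():
        continue
    logs = sorted(split_dir.glob("*.db"))      # assuming one sqlite file per log
    if not logs:
        continue
    sample = random.sample(logs, max(1, int(len(logs) * FRACTION)))
    out_dir = DEST_ROOT / split_dir.name
    out_dir.mkdir(parents=True, exist_ok=True)
    for log in sample:
        shutil.copy2(log, out_dir / log.name)  # copy (or symlink) the sampled logs
    print(f"{split_dir.name}: kept {len(sample)} of {len(logs)} logs")
```

Publishing the sampled file list alongside the subset would let others reproduce the same smaller benchmark, which is the standard-partition concern raised above.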

patk-motional commented 2 years ago

Thanks for your feedback, we will look into this. What would be an ideal dataset size? 10% of the full dataset with the same distribution?

perone commented 2 years ago

Thanks @patk-motional, I think 10% to 20% would be fine, but looking at the following table (from the website):

[table from nuplan.org listing the hours of driving data per city]

It is not clear whether the hours listed there include the val/test splits or cover only the training data.

richie-live commented 4 days ago

@perone

> Hi, thanks for sharing the dataset. Are there any plans to release a smaller version of it? The nuplan-mini split is too small and doesn't have standard train/val/test partitions, while the whole dataset is around ~1TB of data. Having a smaller dataset that is bigger than the mini but ships with standard train/val/test partitions would really help future research work, benchmarks, etc. Thanks!

Hi, have you found a way to address this problem?