nutonomy / nuscenes-devkit

The devkit of the nuScenes dataset.
https://www.nuScenes.org
Other

How to create and split NuScenes subsets into trainval and test like v1.0 #1074

Closed: Plumess closed this issue 2 months ago

Plumess commented 2 months ago

Hello,

I am an individual user, and downloading and storing the entire NuScenes dataset is difficult for me because it requires a lot of storage space and network bandwidth.

The mini dataset, with only 10 scenes, is too small for me to meaningfully evaluate model training. I am exploring whether I could instead take the teaser dataset (about 100 scenes) and modify it into splits resembling v1.0-trainval and v1.0-test. My goal is to train models on this subset to see what performance my current hardware can reach and how long training takes on low-end graphics cards.

I have already made some progress by using python-sdk/nuscenes/utils/splits.py to adjust the teaser set. I converted map.json from a single log_token per record to the log_tokens list format and successfully obtained the split results. My objective is to replicate the v1.0 data division on the teaser dataset, splitting it into v0.1-trainval and v0.1-test. Is this a practical approach, or is there a better strategy? Additionally, could I achieve similar results by downloading only the keyframes from the full dataset?
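A minimal sketch of the map.json conversion described above, assuming each teaser map record stores one log_token string while the v1.0 loader expects a log_tokens list. The function names are hypothetical and not part of the devkit:

```python
import json
from typing import List


def convert_map_records(records: List[dict]) -> List[dict]:
    """Replace a single `log_token` field with a one-element `log_tokens` list.

    Assumption: teaser-style records carry `log_token`; records that already
    have `log_tokens` are passed through unchanged.
    """
    converted = []
    for rec in records:
        rec = dict(rec)  # shallow copy so the caller's list is not mutated
        if "log_token" in rec and "log_tokens" not in rec:
            rec["log_tokens"] = [rec.pop("log_token")]
        converted.append(rec)
    return converted


def convert_map_file(src: str, dst: str) -> None:
    """Read map.json from `src`, convert every record, and write it to `dst`."""
    with open(src) as f:
        records = json.load(f)
    with open(dst, "w") as f:
        json.dump(convert_map_records(records), f, indent=2)
```

For example, convert_map_file("v0.1/map.json", "v0.1/map_converted.json"); both paths are placeholders. Records are not merged, so each teaser map entry keeps its own token.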

If the teaser is indeed deprecated, could you provide some guidance on how to create useful subsets from the full dataset?

Thank you for any advice or suggestions you can provide.

whyekit-motional commented 2 months ago

@Plumess I'm assuming you are talking about the detection task - if you are able to specify the keyframes of the 100 scenes you want to evaluate on in a {nusc.dataroot}/{nusc.version}/splits.json, you could pass that into DetectionEval: https://github.com/nutonomy/nuscenes-devkit/blob/4df2701feb3436ae49edaf70128488865a3f6ff9/python-sdk/nuscenes/eval/detection/evaluate.py#L93-L101
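A hedged sketch of preparing such a splits.json. It assumes the file maps each custom split name to a list of scene names, which is my reading of the custom-split helpers in python-sdk/nuscenes/utils/splits.py; please check this against your installed devkit version. The split names my_train and my_val and both helper functions are hypothetical:

```python
import json
import os
import random
from typing import Dict, List


def make_custom_splits(scene_names: List[str], val_fraction: float = 0.2,
                       seed: int = 0) -> Dict[str, List[str]]:
    """Deterministically split scene names into hypothetical 'my_train' / 'my_val'."""
    names = sorted(scene_names)  # sort first so the result ignores input order
    random.Random(seed).shuffle(names)  # seeded shuffle, reproducible across runs
    n_val = max(1, round(len(names) * val_fraction))
    return {"my_train": sorted(names[n_val:]), "my_val": sorted(names[:n_val])}


def write_splits_json(dataroot: str, version: str,
                      splits: Dict[str, List[str]]) -> str:
    """Write the splits to {dataroot}/{version}/splits.json and return the path."""
    path = os.path.join(dataroot, version, "splits.json")
    with open(path, "w") as f:
        json.dump(splits, f, indent=2)
    return path
```

Splitting by whole scenes, rather than by individual keyframes, keeps every scene's samples on one side of the split. If your devkit version accepts custom split names, you would then pass eval_set="my_val" when constructing DetectionEval.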

Plumess commented 2 months ago

Thank you very much for your response, and I apologize if my previous message was unclear. I am currently trying to run UniAD (https://github.com/OpenDriveLab/UniAD, CVPR2023), which uses the NuScenes dataset, and I am still working out how it consumes the NuScenes data. My intention was to keep the dataset structure UniAD expects but fill it with a subset instead of the entire dataset, so that I can try running training with minimal code changes. Here is the directory structure I aimed to replicate with a smaller, self-contained subset:

nuscenes/
│   ├── can_bus/
│   ├── maps/
│   ├── samples/
│   ├── sweeps/
│   ├── v1.0-test/
│   ├── v1.0-trainval/

Ideally, I would mimic v1.0-trainval and v1.0-test with a subset of the teaser (or something similar) to minimise changes to the original UniAD code. If you have any recommendations on how to proceed, or a better approach, I would greatly appreciate your guidance.
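One way to build a self-contained v1.0-style metadata folder from a handful of scenes is to keep only the records reachable from those scenes through foreign keys. The sketch below assumes the v1.0 table layout (for example, every sample_data record has a sample_token, and map records carry log_tokens); lookup tables such as category, attribute, visibility and sensor are copied unchanged. The function names are hypothetical, and extra tables such as lidarseg are not handled:

```python
import json
import os
from typing import Dict, Iterable, List

Tables = Dict[str, List[dict]]  # table name -> list of records


def subset_tables(tables: Tables, scene_names: Iterable[str]) -> Tables:
    """Keep only records reachable from the chosen scenes (v1.0 schema assumed)."""
    wanted = set(scene_names)
    out = dict(tables)  # untouched tables are shared; filtered ones are replaced

    out["scene"] = [s for s in tables["scene"] if s["name"] in wanted]
    scene_tokens = {s["token"] for s in out["scene"]}
    log_tokens = {s["log_token"] for s in out["scene"]}

    out["sample"] = [r for r in tables["sample"] if r["scene_token"] in scene_tokens]
    sample_tokens = {r["token"] for r in out["sample"]}

    # Keyframes and sweeps both point at a sample, so both are kept here.
    out["sample_data"] = [r for r in tables["sample_data"] if r["sample_token"] in sample_tokens]
    out["sample_annotation"] = [r for r in tables["sample_annotation"]
                                if r["sample_token"] in sample_tokens]

    ego_tokens = {r["ego_pose_token"] for r in out["sample_data"]}
    calib_tokens = {r["calibrated_sensor_token"] for r in out["sample_data"]}
    instance_tokens = {r["instance_token"] for r in out["sample_annotation"]}

    out["ego_pose"] = [r for r in tables["ego_pose"] if r["token"] in ego_tokens]
    out["calibrated_sensor"] = [r for r in tables["calibrated_sensor"] if r["token"] in calib_tokens]
    out["instance"] = [r for r in tables["instance"] if r["token"] in instance_tokens]
    out["log"] = [r for r in tables["log"] if r["token"] in log_tokens]
    # Keep a map if any of its logs survives, trimming log_tokens to surviving logs.
    out["map"] = [
        {**m, "log_tokens": [t for t in m["log_tokens"] if t in log_tokens]}
        for m in tables["map"]
        if any(t in log_tokens for t in m["log_tokens"])
    ]
    return out


def load_tables(meta_dir: str) -> Tables:
    """Read every <table>.json in a metadata folder such as v1.0-trainval/."""
    tables: Tables = {}
    for name in os.listdir(meta_dir):
        if name.endswith(".json"):
            with open(os.path.join(meta_dir, name)) as f:
                tables[name[:-len(".json")]] = json.load(f)
    return tables


def save_tables(tables: Tables, meta_dir: str) -> None:
    """Write each table back out as <table>.json under meta_dir."""
    os.makedirs(meta_dir, exist_ok=True)
    for name, records in tables.items():
        with open(os.path.join(meta_dir, name + ".json"), "w") as f:
            json.dump(records, f)
```

For example, save_tables(subset_tables(load_tables(src_dir), chosen_scenes), dst_dir), where the directory names and scene list are placeholders. The images and point clouds referenced by each sample_data "filename" must still exist under samples/ and sweeps/.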

whyekit-motional commented 2 months ago

@Plumess I'm not 100% familiar with the code for UniAD, but I suppose there must be a part of the code which tells the data-loader which sample tokens to load during training / eval.

You could possibly modify that part of the code to consume only the sample tokens which you are interested in.
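A sketch of that idea, assuming UniAD reads a pickled info file shaped like the mmdetection3d-style converters produce: a dict with an "infos" list whose entries carry the sample token under "token". That layout is an assumption; please check UniAD's own data converter. Both function names are hypothetical:

```python
import pickle
from typing import Iterable


def filter_infos(data: dict, keep_sample_tokens: Iterable[str]) -> dict:
    """Return a copy of an info dict keeping only entries whose sample token is wanted."""
    keep = set(keep_sample_tokens)
    out = dict(data)  # other keys (e.g. metadata) are carried over unchanged
    out["infos"] = [info for info in data["infos"] if info["token"] in keep]
    return out


def filter_info_file(src: str, dst: str, keep_sample_tokens: Iterable[str]) -> int:
    """Filter a pickled info file and return how many entries were kept."""
    with open(src, "rb") as f:
        data = pickle.load(f)
    filtered = filter_infos(data, keep_sample_tokens)
    with open(dst, "wb") as f:
        pickle.dump(filtered, f)
    return len(filtered["infos"])
```

With the devkit loaded, the tokens for a set of scene tokens could be collected as [s["token"] for s in nusc.sample if s["scene_token"] in chosen_scene_tokens]. Keeping whole scenes rather than scattered samples matters for a temporal model like UniAD, since its input sequences are built from consecutive samples within a scene.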

Plumess commented 2 months ago

Thanks for the reply, I'll give it a try.