How to create and split NuScenes subsets into trainval and test like v1.0

Plumess commented 7 months ago

Hello,

I am an individual user, and downloading and storing the entire NuScenes dataset is a challenge for me: it demands significant storage space and network bandwidth.

The mini dataset, which contains only 10 scenes, is too small for me to evaluate model training meaningfully. I am exploring whether I could use the teaser dataset, which has about 100 scenes, and modify it to create splits similar to v1.0-trainval and v1.0-test. My goal is to train models on this subset to see what performance my current hardware can achieve and how long training takes on low-end graphics cards.

I have already made some headway by using python-sdk/nuscenes/utils/splits.py to adjust the teaser set. I changed the map.json format from a single log_token per record to a log_tokens list, as v1.0 expects, and successfully obtained the split results. My objective is to divide the teaser dataset into v0.1-trainval and v0.1-test in the same way. Is this a practical approach, or is there a better strategy? Additionally, is it possible to achieve similar results by downloading only the keyframes from the full dataset?
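A minimal sketch of the map.json change described above, assuming each teaser map record holds a single `log_token` string, while the v1.0 devkit reads a `log_tokens` list from every map record when linking logs to maps. The function names `convert_map_records` and `convert_map_json` are hypothetical helpers, not part of the devkit:

```python
import json


def convert_map_records(records):
    """Turn per-record `log_token` strings into v1.0-style `log_tokens` lists.

    Assumption: teaser records look like {"token": ..., "log_token": ..., ...}.
    Records that already carry `log_tokens` are left unchanged.
    """
    converted = []
    for record in records:
        record = dict(record)  # shallow copy, so the caller's records are not mutated
        if 'log_token' in record and 'log_tokens' not in record:
            record['log_tokens'] = [record.pop('log_token')]
        converted.append(record)
    return converted


def convert_map_json(src_path, dst_path):
    """Read a teaser-style map.json and write a v1.0-style copy to dst_path."""
    with open(src_path) as f:
        records = json.load(f)
    with open(dst_path, 'w') as f:
        json.dump(convert_map_records(records), f)
```

Wrapping each token in a one-element list keeps one map record per log; records that share a map image are not merged, which should not matter for the log-to-map lookup itself.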

If the teaser is indeed deprecated, could you provide some guidance on how to create useful subsets from the full dataset?

Thank you for any advice or suggestions you can provide.

whyekit-motional commented 7 months ago

@Plumess I'm assuming you are talking about the detection task - if you are able to specify the keyframes of the 100 scenes you want to evaluate on in a {nusc.dataroot}/{nusc.version}/splits.json, you could pass that into DetectionEval: https://github.com/nutonomy/nuscenes-devkit/blob/4df2701feb3436ae49edaf70128488865a3f6ff9/python-sdk/nuscenes/eval/detection/evaluate.py#L93-L101
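A sketch of writing such a splits.json. It assumes the file is a JSON object mapping each custom split name to a list of scene names (which is what nuscenes.utils.splits reads for custom splits); the helper `write_custom_splits` and the split name `my_val_100` are hypothetical, and the scene names are placeholders:

```python
import json
import os


def write_custom_splits(dataroot, version, splits):
    """Write {dataroot}/{version}/splits.json, overwriting any existing file.

    `splits` maps split names to scene names, e.g.
    {"my_val_100": ["scene-0003", "scene-0012", ...]} (placeholder names).
    """
    for name, scenes in splits.items():
        # A scene listed twice would be counted twice in evaluation.
        if len(scenes) != len(set(scenes)):
            raise ValueError(f'split {name!r} contains duplicate scene names')
    path = os.path.join(dataroot, version, 'splits.json')
    with open(path, 'w') as f:
        json.dump(splits, f, indent=2)
    return path
```

The custom split name would then be passed as the evaluation set, e.g. `eval_set='my_val_100'` when constructing DetectionEval, assuming your devkit version resolves custom split names there. Picking a name that differs from the built-in ones (train, val, mini_val, ...) avoids ambiguity.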

Plumess commented 7 months ago

Thank you very much for your response, and I apologize if my previous message was unclear. I am currently trying to run UniAD (https://github.com/OpenDriveLab/UniAD, CVPR 2023), which uses the NuScenes dataset, and I am still exploring how it consumes the data. My intention was to reproduce the NuScenes directory structure it expects, but with a subset instead of the full dataset, so that I can try training with minimal code changes. Here is the directory structure I aimed to replicate with a smaller, self-contained subset:

    nuscenes/
    ├── can_bus/
    ├── maps/
    ├── samples/
    ├── sweeps/
    ├── v1.0-test/
    └── v1.0-trainval/

Where possible, I'd like to mimic v1.0-test or v1.0-trainval with a subset of the teaser or similar data, to minimise changes to the original UniAD code. If you have any recommendations on how to proceed, or a better approach, I would greatly appreciate your guidance.
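Before pointing UniAD's data converter at a hand-built subset, a quick check that the layout above is complete can save a failed run. This sketch assumes each version folder must contain the 13 JSON tables the v1.0 devkit loads; `check_layout` is a hypothetical helper:

```python
from pathlib import Path

# Top-level folders from the directory tree above.
EXPECTED_DIRS = ('can_bus', 'maps', 'samples', 'sweeps', 'v1.0-test', 'v1.0-trainval')
# JSON tables the v1.0 devkit loads from each version folder.
TABLES = ('attribute', 'calibrated_sensor', 'category', 'ego_pose', 'instance', 'log', 'map',
          'sample', 'sample_annotation', 'sample_data', 'scene', 'sensor', 'visibility')


def check_layout(root, versions=('v1.0-trainval', 'v1.0-test')):
    """Return missing paths relative to `root`; an empty list means complete."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    for version in versions:
        for table in TABLES:
            rel = f'{version}/{table}.json'
            if not (root / rel).is_file():
                missing.append(rel)
    return missing
```

This only checks that the files exist, not that the tokens inside them are consistent with each other.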

whyekit-motional commented 7 months ago

@Plumess I'm not 100% familiar with the UniAD code, but I suppose some part of it tells the data loader which sample tokens to load during training / eval.

You could modify that part of the code to consume only the sample tokens you are interested in.
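One way to get the sample tokens for a chosen set of scenes is to walk each scene's keyframes with the devkit (`nusc.scene`, `nusc.get('sample', token)`, and the `first_sample_token` / `next` fields). The filter step assumes UniAD's data loader reads a list of per-sample info dicts with the sample token under `'token'`, as mmdetection3d-style info files do; that key, and both helper names, are assumptions:

```python
def sample_tokens_of_scenes(nusc, scene_names):
    """Collect keyframe sample tokens for the named scenes, in temporal order.

    `nusc` is a NuScenes instance: scene records carry 'name' and
    'first_sample_token', and each sample points to the next keyframe via
    'next' (an empty string after the last one).
    """
    wanted = set(scene_names)
    tokens = []
    for scene in nusc.scene:
        if scene['name'] not in wanted:
            continue
        token = scene['first_sample_token']
        while token:
            tokens.append(token)
            token = nusc.get('sample', token)['next']
    return tokens


def filter_infos(infos, keep_tokens, key='token'):
    """Keep only the per-sample info dicts whose `key` is in keep_tokens.

    Assumption: the sample token is stored under 'token'; adjust `key` if not.
    """
    keep = set(keep_tokens)
    return [info for info in infos if info[key] in keep]
```

For example, `data['infos'] = filter_infos(data['infos'], sample_tokens_of_scenes(nusc, my_scenes))` before saving the info file, where `data`, its `'infos'` key, and `my_scenes` are hypothetical.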

Plumess commented 7 months ago

Thanks for the reply, I'll give it a try.

fvgrvr commented 2 months ago

Hello, I also want to use a small subset to evaluate UniAD, but the mini version on the official website does not meet my requirements. I want to choose some scenes or .jpg files that interest me from the full dataset. Could you tell me how you solved the problem?

Plumess commented 2 months ago

Hi, @fvgrvr

If you want to select a random subset, edit data_converter/uniad_nuscenes_converter.py in the UniAD project: randomly sample from the scene lists it gets from the NuScenes devkit (nuscenes.utils.splits) and use the sampled lists in place of the originals.

    import random

    from nuscenes.utils import splits

    # `version` is the dataset version passed to the converter function.
    available_vers = ['v1.0-trainval', 'v1.0-test', 'v1.0-mini']
    assert version in available_vers
    rng = random.Random(0)  # fixed seed, so reruns select the same subset
    if version == 'v1.0-trainval':
        train_scenes = splits.train
        val_scenes = splits.val
        # randomly keep 1/10 of the train and val scenes
        train_scenes = rng.sample(train_scenes, len(train_scenes) // 10)
        val_scenes = rng.sample(val_scenes, len(val_scenes) // 10)
    elif version == 'v1.0-test':
        # the converter stores the test scenes in train_scenes
        train_scenes = splits.test
        # randomly keep 1/10 of the test scenes
        train_scenes = rng.sample(train_scenes, len(train_scenes) // 10)
        val_scenes = []
    elif version == 'v1.0-mini':
        train_scenes = splits.mini_train
        val_scenes = splits.mini_val
    else:
        raise ValueError(f'unknown version: {version}')
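To choose specific scenes rather than a random tenth (as asked above), the random-sampling lines can be replaced by an explicit selection. A sketch, where `pick_scenes` and the scene list you pass in are hypothetical:

```python
def pick_scenes(split_scenes, wanted):
    """Return the scenes in `wanted` that belong to `split_scenes`.

    Raises if a requested name is not in the split, which catches typos and
    scenes that belong to a different split (e.g. val instead of train).
    """
    split_set = set(split_scenes)
    unknown = [s for s in wanted if s not in split_set]
    if unknown:
        raise ValueError(f'scenes not in this split: {unknown}')
    wanted_set = set(wanted)
    # Keep the split's original order rather than the order of `wanted`.
    return [s for s in split_scenes if s in wanted_set]
```

In the converter this would read, for example, `train_scenes = pick_scenes(splits.train, MY_TRAIN_SCENES)`, with `MY_TRAIN_SCENES` being your own (hypothetical) list of scene names.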

Alternatively, you can use the get_scenes_of_custom_split function from nuscenes.utils.splits to read your own scene lists from a custom splits.json placed in {nusc.dataroot}/{nusc.version}/.
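If you want to see what that lookup involves, or your installed devkit may predate custom-split support, here is a plain-json sketch of the same idea. It assumes splits.json maps split names to lists of scene names; `read_custom_split` and the split name `my_train` are hypothetical:

```python
import json
import os


def read_custom_split(dataroot, version, split_name):
    """Return the scene names listed under `split_name` in splits.json."""
    path = os.path.join(dataroot, version, 'splits.json')
    with open(path) as f:
        custom_splits = json.load(f)
    if split_name not in custom_splits:
        raise KeyError(f'{split_name!r} not found in {path}; available: {sorted(custom_splits)}')
    return list(custom_splits[split_name])
```

The result can stand in for `splits.train` in the converter, e.g. `train_scenes = read_custom_split(root_path, version, 'my_train')`, where `root_path` is whatever variable holds the dataset root.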

This is based on my notes from a few months ago, so please excuse any discrepancies.

nutonomy / nuscenes-devkit

How to create and split NuScenes subsets into trainval and test like v1.0 #1074