nutonomy / nuscenes-devkit

The devkit of the nuScenes dataset.
https://www.nuScenes.org

Questions about split validation and test set #949

Closed pengzhi1998 closed 1 year ago

pengzhi1998 commented 1 year ago

Hi, thank you so much for providing this great dataset and development kit! However, I have a few questions about making use of the val/test data:

  1. When using create_splits_scenes() to split the data into train, val, and test, the returned lists contain only scene names (e.g. 'scene-0125'). However, when I load the dataset with nusc = NuScenes(version, dataroot, verbose=False), the returned nusc instance contains all the data attributes, and its scene attribute holds all 1000 scenes, each represented as a dictionary. Given this, how can I easily obtain only the data belonging to the validation or test set? Alternatively, is there a dataset version such as v1.0-test or v1.0-val with the same format as v1.0-trainval?
  2. I want to double-check: does the test split also have ground-truth values?
whyekit-motional commented 1 year ago

@pengzhi1998 hope these help you:

  1. If you are trying to get all the samples which belong to a certain split, you can try this:

    from nuscenes.nuscenes import NuScenes
    from nuscenes.utils.splits import create_splits_scenes
    
    nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=False)
    
    splits = create_splits_scenes()
    print(f'These are the splits in nuScenes: {list(splits.keys())}')
    
    split = 'mini_train'
    scenes_in_split = splits[split]
    
    # Collect the samples whose parent scene belongs to the chosen split.
    samples_in_split = []
    for sample in nusc.sample:
        scene_token = sample['scene_token']
        scene = nusc.get('scene', scene_token)
        scene_name = scene['name']
    
        if scene_name in scenes_in_split:
            samples_in_split.append(sample)
    
    print(f'There are {len(samples_in_split)} samples in {split}.')
  2. No, the annotations of the test split are not made public
pengzhi1998 commented 1 year ago

Thank you so much for your quick reply! On the second point, do you mean the test split from v1.0-trainval doesn't have labels? Or is there a separate v1.0-test dataset that has no labels? If the test split in v1.0-trainval has no labels, do we generally use the validation split for testing?

whyekit-motional commented 1 year ago

@pengzhi1998 the test split is contained in v1.0-test, not in v1.0-trainval. The v1.0-test release that is available to the public does not contain any annotations. Users who want to evaluate their models / methods on the test set (i.e. v1.0-test) need to make a submission to the corresponding evaluation server (nuScenes detection challenge, nuScenes lidar segmentation challenge, etc.)
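A quick local check for whether a loaded version carries ground truth is the size of its sample_annotation table; with the real devkit this would be len(nusc.sample_annotation). The sketch below uses mock stand-in tables (not the real dataset), so it only illustrates the idea:

```python
# Mock stand-ins for the devkit's table attributes. With the real devkit,
# `nusc.sample_annotation` is a list of annotation records.
trainval_tables = {'sample_annotation': [{'token': 'ann_1'}, {'token': 'ann_2'}]}
test_tables = {'sample_annotation': []}  # v1.0-test ships without annotations

def has_ground_truth(tables):
    """Return True if the loaded version contains any annotations."""
    return len(tables['sample_annotation']) > 0

print(has_ground_truth(trainval_tables))  # True
print(has_ground_truth(test_tables))      # False
```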

pengzhi1998 commented 1 year ago

Thank you! But what about this file: https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/utils/splits.py. It splits the data into three folds: train, val, and test. What is this test split for? May I also use it for testing?

Thank you so much for your great help. Look forward to your reply!

whyekit-motional commented 1 year ago

@pengzhi1998 yes, the test split is for testing (but, like I mentioned above, the annotations in the test split are not released to the public)

Most users would use the val split for testing locally, and then make a submission to the desired evaluation server to test on the test split
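The workflow above can be sketched with mock data (nothing below comes from the real devkit; score and make_submission are hypothetical stand-ins): metrics are computed locally only on the val split, while test-split predictions are just packaged for upload, since no ground truth is available locally.

```python
# Sketch of the typical workflow: score locally on val, then package
# test-split predictions for the evaluation server. All data is mock.

val_gt = {'sample_a': 'car', 'sample_b': 'pedestrian'}    # local ground truth
val_pred = {'sample_a': 'car', 'sample_b': 'cyclist'}     # model predictions

def score(gt, pred):
    """Fraction of samples whose prediction matches the ground truth."""
    correct = sum(1 for token, label in gt.items() if pred.get(token) == label)
    return correct / len(gt)

def make_submission(test_pred):
    """Bundle test-split predictions for upload; no local scoring possible."""
    return {'meta': {'use_camera': True}, 'results': test_pred}

print(score(val_gt, val_pred))  # 0.5

# For the test split there is no local ground truth, so just package it.
submission = make_submission({'sample_c': 'truck'})
print(sorted(submission.keys()))  # ['meta', 'results']
```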

pengzhi1998 commented 1 year ago

Got it. Thanks a lot!!