@Plumess I'm assuming you are talking about the detection task. If you are able to specify the keyframes of the 100 scenes you want to evaluate on in a {nusc.dataroot}/{nusc.version}/splits.json, you could pass that into DetectionEval: https://github.com/nutonomy/nuscenes-devkit/blob/4df2701feb3436ae49edaf70128488865a3f6ff9/python-sdk/nuscenes/eval/detection/evaluate.py#L93-L101
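For concreteness, here is a minimal sketch of what that could look like, assuming a devkit version recent enough to resolve custom split names via that splits.json. The paths, the split name 'my_100_scenes', and the results file are hypothetical placeholders:

# A minimal sketch, assuming a devkit that resolves custom split names
# via {dataroot}/{version}/splits.json. All paths, the split name
# 'my_100_scenes', and the results file are hypothetical placeholders.
from nuscenes import NuScenes
from nuscenes.eval.common.config import config_factory
from nuscenes.eval.detection.evaluate import DetectionEval

nusc = NuScenes(version='v1.0-trainval', dataroot='/data/nuscenes', verbose=True)

nusc_eval = DetectionEval(
    nusc,
    config=config_factory('detection_cvpr_2019'),
    result_path='/data/results/detections.json',  # your detection submission file
    eval_set='my_100_scenes',  # hypothetical custom split name from splits.json
    output_dir='/data/eval_out',
    verbose=True,
)
metrics_summary = nusc_eval.main(plot_examples=0, render_curves=False)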
Thank you very much for your response, and I apologize if my previous message was unclear. I am currently attempting to run UniAD (https://github.com/OpenDriveLab/UniAD, CVPR 2023), which uses the NuScenes dataset, and I am still exploring how it consumes the data. My intention was to emulate the NuScenes directory structure it expects, but with a subset rather than the entire dataset, so as to minimize code modifications while trying out training. Here is the directory structure I aimed to replicate with a smaller, self-contained subset:
nuscenes/
├── can_bus/
├── maps/
├── samples/
├── sweeps/
├── v1.0-test/
└── v1.0-trainval/
I'd like to mimic v1.0-test or v1.0-trainval with a subset of the teaser data (or similar) where I can, to minimise changes to the original UniAD code. If you have any recommendations on how to proceed, or if there's a better approach to achieving this, I would greatly appreciate your guidance.
@Plumess I'm not 100% familiar with the code for UniAD, but I suppose there must be a part of the code which tells the data loader which sample tokens to load during training / evaluation. You could possibly modify that part of the code to consume only the sample tokens you are interested in; a rough sketch of the idea follows.
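For example (this is not UniAD's actual code, just a sketch of the idea): if the data loader consumes a converter-generated infos .pkl whose entries carry the sample 'token' field, as in mmdet3d-style converters, you could filter that file down to the scenes you care about. The file name and scene names below are hypothetical:

# A rough sketch, not UniAD's actual code. Assumes each info dict in the
# converter-generated .pkl carries the sample 'token' field (as in
# mmdet3d-style converters); file name and scene names are hypothetical.
import pickle

from nuscenes import NuScenes

nusc = NuScenes(version='v1.0-trainval', dataroot='/data/nuscenes')
wanted_scenes = {'scene-0001', 'scene-0002'}  # hypothetical scene names

# Collect all sample tokens belonging to the chosen scenes.
wanted_tokens = set()
for scene in nusc.scene:
    if scene['name'] in wanted_scenes:
        token = scene['first_sample_token']
        while token:  # 'next' is an empty string at the end of a scene
            wanted_tokens.add(token)
            token = nusc.get('sample', token)['next']

with open('nuscenes_infos_temporal_train.pkl', 'rb') as f:  # hypothetical name
    data = pickle.load(f)
data['infos'] = [info for info in data['infos'] if info['token'] in wanted_tokens]

with open('nuscenes_infos_temporal_train_subset.pkl', 'wb') as f:
    pickle.dump(data, f)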
Thanks for the reply, I'll give it a try.
Hello, I also want to use a mini dataset to evaluate UniAD, but the mini version on the official website does not meet my requirements. I want to choose some scenes or .jpg files that interest me from the full dataset. Could you tell me how you solved the problem?
Hi @fvgrvr, if you wish to split the data randomly, you need to take the scene lists obtained from the nuscenes toolkit in the UniAD converter (data_converter/uniad_nuscenes_converter.py), randomly subsample them, and replace the output accordingly:
import random

from nuscenes.utils import splits

available_vers = ['v1.0-trainval', 'v1.0-test', 'v1.0-mini']
assert version in available_vers
if version == 'v1.0-trainval':
    train_scenes = splits.train
    val_scenes = splits.val
    # Randomly keep 1/10 of the train and val scenes.
    num_train_scenes = len(train_scenes)
    train_scenes = random.sample(train_scenes, num_train_scenes // 10)
    num_val_scenes = len(val_scenes)
    val_scenes = random.sample(val_scenes, num_val_scenes // 10)
elif version == 'v1.0-test':
    train_scenes = splits.test
    # Randomly keep 1/10 of the test scenes.
    num_train_scenes = len(train_scenes)
    train_scenes = random.sample(train_scenes, num_train_scenes // 10)
    val_scenes = []
elif version == 'v1.0-mini':
    train_scenes = splits.mini_train
    val_scenes = splits.mini_val
else:
    raise ValueError(f'unknown version: {version}')
Alternatively, you can use the get_scenes_of_custom_split function from nuscenes.utils.splits to read a custom splits.json.
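For instance, something along these lines (a sketch only: the exact signature of get_scenes_of_custom_split may differ between devkit versions, and the splits.json layout shown in the comment is my assumption):

# A sketch only: the exact signature of get_scenes_of_custom_split may
# differ between devkit versions, and the splits.json layout described
# below is an assumption.
from nuscenes import NuScenes
from nuscenes.utils.splits import get_scenes_of_custom_split

nusc = NuScenes(version='v1.0-trainval', dataroot='/data/nuscenes')

# Expects a {nusc.dataroot}/{nusc.version}/splits.json along the lines of
# {"my_subset": ["scene-0001", "scene-0002", ...]}; 'my_subset' is hypothetical.
scene_names = get_scenes_of_custom_split(split_name='my_subset', nusc=nusc)
print(scene_names)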
The above is based on my notes from a few months ago, so please excuse any discrepancies.
Hello,
I am an individual user challenged by the logistics of downloading and storing the entire NuScenes dataset, which demands significant storage space and network bandwidth.
The mini dataset, containing only 10 scenes, does not meet my needs for meaningfully evaluating model training. I am exploring the feasibility of using the teaser dataset, modifying it to create splits similar to v1.0-trainval and v1.0-test with about 100 scenes. My goal is to train models on this subset to gauge what performance can be achieved with my current hardware and to estimate the expected training durations on low-end graphics cards.
I have already made some headway by using python-sdk/nuscenes/utils/splits.py to adjust the teaser set. I modified the map.json format from individual log_token entries to a log_tokens list and successfully obtained the split results. My objective is to replicate the trainval/test division on the teaser dataset, i.e., a v0.1-trainval and v0.1-test. Is this a practical approach, or might there be a better strategy? Additionally, is it possible to achieve similar outcomes by downloading only the keyframes from the full dataset?
If the teaser is indeed deprecated, could you provide some guidance on how to create useful subsets from the full dataset?
Thank you for any advice or suggestions you can provide.