[nuScenes] nuScenes generation extremely slow

BiboyQG commented 3 months ago

Prerequisite

[X] I have searched Issues and Discussions but cannot get the expected help.
[X] I have read the FAQ documentation but cannot get the expected help.
[X] The bug has not been fixed in the latest version (dev-1.x) or latest version (dev-1.0).

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

1.x branch https://github.com/open-mmlab/mmdetection3d/tree/dev-1.x

Environment

Python 3.8 torch==2.0.1

Reproduces the problem - code sample

python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes

Reproduces the problem - command or script

python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes

Reproduces the problem - error message

    Create GT Database of NuScenesDataset
    03/26 23:14:11 - mmengine - INFO - ------------------------------
    03/26 23:14:11 - mmengine - INFO - The length of training dataset: 28130
    03/26 23:14:11 - mmengine - INFO - The number of instances per category in the dataset:
    +----------------------+--------+
    | category             | number |
    +----------------------+--------+
    | car                  | 413318 |
    | truck                | 72815  |
    | trailer              | 20701  |
    | bus                  | 13163  |
    | construction_vehicle | 11993  |
    | bicycle              | 9478   |
    | motorcycle           | 10109  |
    | pedestrian           | 185847 |
    | traffic_cone         | 82362  |
    | barrier              | 125095 |
    +----------------------+--------+
    [> ] 1016/28130, 0.4 task/s, elapsed: 2775s, ETA: 74043s

As you can see, the "Create GT Database of NuScenesDataset" step is extremely slow (ETA: 74043 s, roughly 20.5 hours). Has anyone come across this problem before?
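For reference, the ETA in the progress line can be reproduced from the other numbers it prints. This is a minimal sketch that assumes the progress bar simply extrapolates the average time per finished task; `eta_seconds` is a hypothetical helper, not part of mmengine or mmdetection3d.

```python
def eta_seconds(done, total, elapsed):
    """Estimate remaining seconds, assuming a constant average time per task."""
    if done <= 0:
        raise ValueError("need at least one finished task to estimate a rate")
    seconds_per_task = elapsed / done
    return (total - done) * seconds_per_task


# Numbers taken from the progress line: 1016/28130, elapsed: 2775s
eta = eta_seconds(done=1016, total=28130, elapsed=2775)
print(f"{2775 / 1016:.2f} s per task, ETA {eta / 3600:.1f} h")
```

The result lands within a few seconds of the reported 74043 s, so the bottleneck is about 2.7 seconds per sample rather than a stall.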

Additional information

Perhaps this is because of the large size of the dataset?

TheCodez commented 3 months ago

I was able to speed it up by changing the code here: https://github.com/open-mmlab/mmdetection3d/blob/fe25f7a51d36e3702f961e198894580d83c4387b/tools/create_data.py#L88

    # create_groundtruth_database(dataset_name, root_path, info_prefix,
    #                            f'{info_prefix}_infos_train.pkl')

    GTDatabaseCreater(
        dataset_name,
        root_path,
        info_prefix,
        f'{info_prefix}_infos_train.pkl',
        relative_path=False,
        with_mask=False,
        num_worker=4).create()

Currently, this is only used by Waymo, but it worked for nuScenes too.
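The `num_worker=4` argument suggests the speedup comes from handling several samples at once instead of one after another. The general idea can be sketched as follows. This is only an illustration and not the mmdetection3d implementation: `process_sample` and `run_parallel` are hypothetical names, and a thread pool is used to keep the example short (CPU-bound work would need a process pool instead).

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(index):
    """Hypothetical stand-in for per-sample work (load points, crop boxes, write files)."""
    return index * index


def run_parallel(indices, num_worker=4):
    """Map process_sample over indices with a pool of workers, keeping input order."""
    with ThreadPoolExecutor(max_workers=num_worker) as pool:
        return list(pool.map(process_sample, indices))


results = run_parallel(range(8), num_worker=4)
print(results)
```

With independent samples, the wall-clock time is bounded by the slowest worker's share rather than by the sum of all samples.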

BiboyQG commented 3 months ago

Thank you so much for the help! I'll try this snippet of code once I complete my other stuff!

BiboyQG commented 3 months ago

This method is really effective: the whole dataset conversion finished within 3.5 hours. Thanks for the help!

BiboyQG commented 3 months ago

Closing this now.
