open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Unable to save prediction results when running test.py #2962

Open abadithela opened 2 months ago

abadithela commented 2 months ago

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1: NVIDIA RTX A6000
CUDA_HOME: /home/apurvabadithela/miniconda3/envs/detection
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 10.5.0-1ubuntu1~22.04) 10.5.0
PyTorch: 1.13.1
PyTorch compiling details: PyTorch built with:
TorchVision: 0.14.1
OpenCV: 4.7.0
MMEngine: 0.9.1
MMDetection: 3.2.0
MMDetection3D: 1.4.0+fe25f7a
spconv2.0: True

Reproduces the problem - code sample

I want to save the prediction results from running the project mmdet3d/projects/BEVFusion. The documentation says to add the key pklfile_prefix to the test_evaluator, which I do in the config file by appending the following line:

test_evaluator.update({'pklfile_prefix':'/home/apurvabadithela/nuscenes_dataset/inference_results/bevfusion_model/results.pkl'})

Reproduces the problem - command or script

Then I run the following from the command line:

python tools/test.py projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py     checkpoints/bevfusion_converted.pth  --task 'multi-modality_det'

Reproduces the problem - error message

I get the following error message:

04/22 10:32:27 - mmengine - INFO - ------------------------------
04/22 10:32:27 - mmengine - INFO - The length of test dataset: 6019
04/22 10:32:27 - mmengine - INFO - The number of instances per category in the dataset:
+----------------------+--------+
| category             | number |
+----------------------+--------+
| car                  | 80004  |
| truck                | 15704  |
| construction_vehicle | 2678   |
| bus                  | 3158   |
| trailer              | 4159   |
| barrier              | 26992  |
| motorcycle           | 2508   |
| bicycle              | 2381   |
| pedestrian           | 34347  |
| traffic_cone         | 15597  |
+----------------------+--------+
Traceback (most recent call last):
  File "tools/test.py", line 149, in <module>
    main()
  File "tools/test.py", line 145, in main
    runner.test()
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1816, in test
    self._test_loop = self.build_test_loop(self._test_loop)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1611, in build_test_loop
    loop = TestLoop(
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/loops.py", line 413, in __init__
    self.evaluator = runner.build_evaluator(evaluator)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1318, in build_evaluator
    return Evaluator(evaluator)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/evaluator/evaluator.py", line 25, in __init__
    self.metrics.append(METRICS.build(metric))
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
TypeError: __init__() got an unexpected keyword argument 'pklfile_prefix'

Additional information

  1. I would like to store the prediction results (on the validation set) somewhere other than a /tmp folder. What is the best way to do this? (See the sketch after this list.)
  2. I have tried setting --cfg-options, but the syntax was unclear and it kept erroring out.
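
For reference, the TypeError above suggests that this build of NuScenesMetric does not accept pklfile_prefix at all. Below is a sketch of a possible alternative override, assuming NuScenesMetric exposes format_only and jsonfile_prefix as recent mmdet3d 1.x sources do (verify against the installed version before relying on it):

# Hypothetical override appended to the config file instead of pklfile_prefix.
# format_only / jsonfile_prefix are assumed from recent mmdet3d 1.x
# NuScenesMetric sources; check the installed version first.
test_evaluator.update({
    'format_only': True,  # skip metric computation, just write formatted results
    'jsonfile_prefix': '/home/apurvabadithela/nuscenes_dataset/inference_results/bevfusion_model',
})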
abadithela commented 2 months ago

Running the following command from the command line also fails:

python tools/test.py projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py  checkpoints/bevfusion_converted.pth  --cfg-options "test_evaluator.pklfile_prefix=/home/apurvabadithela/nuscenes_dataset/inference_results/bevfusion_model/results.pkl" --task 'multi-modality_det'

Instead, I get the following error:

04/22 12:07:56 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 2089837194
    GPU 0,1: NVIDIA RTX A6000
    CUDA_HOME: /home/apurvabadithela/miniconda3/envs/detection
    NVCC: Cuda compilation tools, release 11.7, V11.7.99
    GCC: gcc (Ubuntu 10.5.0-1ubuntu1~22.04) 10.5.0
    PyTorch: 1.13.1
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.14.1
    OpenCV: 4.7.0
    MMEngine: 0.9.1

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 2089837194
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

04/22 12:07:56 - mmengine - INFO - Config:
auto_scale_lr = dict(base_batch_size=32, enable=False)
backend_args = None
class_names = [
    'car',
    'truck',
    'construction_vehicle',
    'bus',
    'trailer',
    'barrier',
    'motorcycle',
    'bicycle',
    'pedestrian',
    'traffic_cone',
]
custom_imports = dict(
    allow_failed_imports=False, imports=[
        'projects.BEVFusion.bevfusion',
    ])
data_prefix = dict(
    CAM_BACK='samples/CAM_BACK',
    CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
    CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
    CAM_FRONT='samples/CAM_FRONT',
    CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
    CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
    pts='samples/LIDAR_TOP',
    sweeps='sweeps/LIDAR_TOP')
data_root = 'data/nuscenes/'
dataset_type = 'NuScenesDataset'
db_sampler = dict(
    classes=[
        'car',
        'truck',
        'construction_vehicle',
        'bus',
        'trailer',
        'barrier',
        'motorcycle',
        'bicycle',
        'pedestrian',
        'traffic_cone',
    ],
    data_root='data/nuscenes/',
    info_path='data/nuscenes/nuscenes_dbinfos_train.pkl',
    points_loader=dict(
        backend_args=None,
        coord_type='LIDAR',
        load_dim=5,
        type='LoadPointsFromFile',
        use_dim=[
            0,
            1,
            2,
            3,
            4,
        ]),
    prepare=dict(
        filter_by_difficulty=[
            -1,
        ],
        filter_by_min_points=dict(
            barrier=5,
            bicycle=5,
            bus=5,
            car=5,
            construction_vehicle=5,
            motorcycle=5,
            pedestrian=5,
            traffic_cone=5,
            trailer=5,
            truck=5)),
    rate=1.0,
    sample_groups=dict(
        barrier=2,
        bicycle=6,
        bus=4,
        car=2,
        construction_vehicle=7,
        motorcycle=6,
        pedestrian=2,
        traffic_cone=2,
        trailer=6,
        truck=3))
default_hooks = dict(
    checkpoint=dict(interval=1, type='CheckpointHook'),
    logger=dict(interval=50, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(type='Det3DVisualizationHook'))
default_scope = 'mmdet3d'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
input_modality = dict(use_camera=True, use_lidar=True)
launcher = 'none'
load_from = 'checkpoints/bevfusion_converted.pth'
log_level = 'INFO'
log_processor = dict(by_epoch=True, type='LogProcessor', window_size=50)
lr = 0.0001
metainfo = dict(classes=[
    'car',
    'truck',
    'construction_vehicle',
    'bus',
    'trailer',
    'barrier',
    'motorcycle',
    'bicycle',
    'pedestrian',
    'traffic_cone',
])
model = dict(
    bbox_head=dict(
        auxiliary=True,
        bbox_coder=dict(
            code_size=10,
            out_size_factor=8,
            pc_range=[
                -54.0,
                -54.0,
            ],
            post_center_range=[
                -61.2,
                -61.2,
                -10.0,
                61.2,
                61.2,
                10.0,
            ],
            score_threshold=0.0,
            type='TransFusionBBoxCoder',
            voxel_size=[
                0.075,
                0.075,
            ]),
        bn_momentum=0.1,
        common_heads=dict(
            center=[
                2,
                2,
            ],
            dim=[
                3,
                2,
            ],
            height=[
                1,
                2,
            ],
            rot=[
                2,
                2,
            ],
            vel=[
                2,
                2,
            ]),
        decoder_layer=dict(
            cross_attn_cfg=dict(dropout=0.1, embed_dims=128, num_heads=8),
            ffn_cfg=dict(
                act_cfg=dict(inplace=True, type='ReLU'),
                embed_dims=128,
                feedforward_channels=256,
                ffn_drop=0.1,
                num_fcs=2),
            norm_cfg=dict(type='LN'),
            pos_encoding_cfg=dict(input_channel=2, num_pos_feats=128),
            self_attn_cfg=dict(dropout=0.1, embed_dims=128, num_heads=8),
            type='TransformerDecoderLayer'),
        hidden_channel=128,
        in_channels=512,
        loss_bbox=dict(
            loss_weight=0.25, reduction='mean', type='mmdet.L1Loss'),
        loss_cls=dict(
            alpha=0.25,
            gamma=2.0,
            loss_weight=1.0,
            reduction='mean',
            type='mmdet.FocalLoss',
            use_sigmoid=True),
        loss_heatmap=dict(
            loss_weight=1.0, reduction='mean', type='mmdet.GaussianFocalLoss'),
        nms_kernel_size=3,
        num_classes=10,
        num_decoder_layers=1,
        num_proposals=200,
        test_cfg=dict(
            dataset='nuScenes',
            grid_size=[
                1440,
                1440,
                41,
            ],
            nms_type=None,
            out_size_factor=8,
            pc_range=[
                -54.0,
                -54.0,
            ],
            voxel_size=[
                0.075,
                0.075,
            ]),
        train_cfg=dict(
            assigner=dict(
                cls_cost=dict(
                    alpha=0.25,
                    gamma=2.0,
                    type='mmdet.FocalLossCost',
                    weight=0.15),
                iou_calculator=dict(coordinate='lidar', type='BboxOverlaps3D'),
                iou_cost=dict(type='IoU3DCost', weight=0.25),
                reg_cost=dict(type='BBoxBEVL1Cost', weight=0.25),
                type='HungarianAssigner3D'),
            code_weights=[
                1.0,
                1.0,
                1.0,
                1.0,
                1.0,
                1.0,
                1.0,
                1.0,
                0.2,
                0.2,
            ],
            dataset='nuScenes',
            gaussian_overlap=0.1,
            grid_size=[
                1440,
                1440,
                41,
            ],
            min_radius=2,
            out_size_factor=8,
            point_cloud_range=[
                -54.0,
                -54.0,
                -5.0,
                54.0,
                54.0,
                3.0,
            ],
            pos_weight=-1,
            voxel_size=[
                0.075,
                0.075,
                0.2,
            ]),
        type='TransFusionHead'),
    data_preprocessor=dict(
        bgr_to_rgb=False,
        mean=[
            123.675,
            116.28,
            103.53,
        ],
        pad_size_divisor=32,
        std=[
            58.395,
            57.12,
            57.375,
        ],
        type='Det3DDataPreprocessor',
        voxelize_cfg=dict(
            max_num_points=10,
            max_voxels=[
                120000,
                160000,
            ],
            point_cloud_range=[
                -54.0,
                -54.0,
                -5.0,
                54.0,
                54.0,
                3.0,
            ],
            voxel_size=[
                0.075,
                0.075,
                0.2,
            ],
            voxelize_reduce=True)),
    fusion_layer=dict(
        in_channels=[
            80,
            256,
        ], out_channels=256, type='ConvFuser'),
    img_backbone=dict(
        attn_drop_rate=0.0,
        convert_weights=True,
        depths=[
            2,
            2,
            6,
            2,
        ],
        drop_path_rate=0.2,
        drop_rate=0.0,
        embed_dims=96,
        init_cfg=dict(
            checkpoint=
            'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth',
            type='Pretrained'),
        mlp_ratio=4,
        num_heads=[
            3,
            6,
            12,
            24,
        ],
        out_indices=[
            1,
            2,
            3,
        ],
        patch_norm=True,
        qk_scale=None,
        qkv_bias=True,
        type='mmdet.SwinTransformer',
        window_size=7,
        with_cp=False),
    img_neck=dict(
        act_cfg=dict(inplace=True, type='ReLU'),
        in_channels=[
            192,
            384,
            768,
        ],
        norm_cfg=dict(requires_grad=True, type='BN2d'),
        num_outs=3,
        out_channels=256,
        start_level=0,
        type='GeneralizedLSSFPN',
        upsample_cfg=dict(align_corners=False, mode='bilinear')),
    pts_backbone=dict(
        conv_cfg=dict(bias=False, type='Conv2d'),
        in_channels=256,
        layer_nums=[
            5,
            5,
        ],
        layer_strides=[
            1,
            2,
        ],
        norm_cfg=dict(eps=0.001, momentum=0.01, type='BN'),
        out_channels=[
            128,
            256,
        ],
        type='SECOND'),
    pts_middle_encoder=dict(
        block_type='basicblock',
        encoder_channels=(
            (
                16,
                16,
                32,
            ),
            (
                32,
                32,
                64,
            ),
            (
                64,
                64,
                128,
            ),
            (
                128,
                128,
            ),
        ),
        encoder_paddings=(
            (
                0,
                0,
                1,
            ),
            (
                0,
                0,
                1,
            ),
            (
                0,
                0,
                (
                    1,
                    1,
                    0,
                ),
            ),
            (
                0,
                0,
            ),
        ),
        in_channels=5,
        norm_cfg=dict(eps=0.001, momentum=0.01, type='BN1d'),
        order=(
            'conv',
            'norm',
            'act',
        ),
        sparse_shape=[
            1440,
            1440,
            41,
        ],
        type='BEVFusionSparseEncoder'),
    pts_neck=dict(
        in_channels=[
            128,
            256,
        ],
        norm_cfg=dict(eps=0.001, momentum=0.01, type='BN'),
        out_channels=[
            256,
            256,
        ],
        type='SECONDFPN',
        upsample_cfg=dict(bias=False, type='deconv'),
        upsample_strides=[
            1,
            2,
        ],
        use_conv_for_no_stride=True),
    pts_voxel_encoder=dict(num_features=5, type='HardSimpleVFE'),
    type='BEVFusion',
    view_transform=dict(
        dbound=[
            1.0,
            60.0,
            0.5,
        ],
        downsample=2,
        feature_size=[
            32,
            88,
        ],
        image_size=[
            256,
            704,
        ],
        in_channels=256,
        out_channels=80,
        type='DepthLSSTransform',
        xbound=[
            -54.0,
            54.0,
            0.3,
        ],
        ybound=[
            -54.0,
            54.0,
            0.3,
        ],
        zbound=[
            -10.0,
            10.0,
            20.0,
        ]))
optim_wrapper = dict(
    clip_grad=dict(max_norm=35, norm_type=2),
    optimizer=dict(lr=0.0002, type='AdamW', weight_decay=0.01),
    type='OptimWrapper')
param_scheduler = [
    dict(
        begin=0,
        by_epoch=False,
        end=500,
        start_factor=0.33333333,
        type='LinearLR'),
    dict(
        T_max=6,
        begin=0,
        by_epoch=True,
        convert_to_iter_based=True,
        end=6,
        eta_min_ratio=0.0001,
        type='CosineAnnealingLR'),
    dict(
        begin=0,
        by_epoch=True,
        convert_to_iter_based=True,
        end=2.4,
        eta_min=0.8947368421052632,
        type='CosineAnnealingMomentum'),
    dict(
        begin=2.4,
        by_epoch=True,
        convert_to_iter_based=True,
        end=6,
        eta_min=1,
        type='CosineAnnealingMomentum'),
]
point_cloud_range = [
    -54.0,
    -54.0,
    -5.0,
    54.0,
    54.0,
    3.0,
]
resume = False
test_cfg = dict()
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='nuscenes_infos_val.pkl',
        backend_args=None,
        box_type_3d='LiDAR',
        data_prefix=dict(
            CAM_BACK='samples/CAM_BACK',
            CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
            CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
            CAM_FRONT='samples/CAM_FRONT',
            CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
            CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
            pts='samples/LIDAR_TOP',
            sweeps='sweeps/LIDAR_TOP'),
        data_root='data/nuscenes/',
        metainfo=dict(classes=[
            'car',
            'truck',
            'construction_vehicle',
            'bus',
            'trailer',
            'barrier',
            'motorcycle',
            'bicycle',
            'pedestrian',
            'traffic_cone',
        ]),
        modality=dict(use_camera=True, use_lidar=True),
        pipeline=[
            dict(
                backend_args=None,
                color_type='color',
                to_float32=True,
                type='BEVLoadMultiViewImageFromFiles'),
            dict(
                backend_args=None,
                coord_type='LIDAR',
                load_dim=5,
                type='LoadPointsFromFile',
                use_dim=5),
            dict(
                backend_args=None,
                load_dim=5,
                pad_empty_sweeps=True,
                remove_close=True,
                sweeps_num=9,
                type='LoadPointsFromMultiSweeps',
                use_dim=5),
            dict(
                bot_pct_lim=[
                    0.0,
                    0.0,
                ],
                final_dim=[
                    256,
                    704,
                ],
                is_train=False,
                rand_flip=False,
                resize_lim=[
                    0.48,
                    0.48,
                ],
                rot_lim=[
                    0.0,
                    0.0,
                ],
                type='ImageAug3D'),
            dict(
                point_cloud_range=[
                    -54.0,
                    -54.0,
                    -5.0,
                    54.0,
                    54.0,
                    3.0,
                ],
                type='PointsRangeFilter'),
            dict(
                keys=[
                    'img',
                    'points',
                    'gt_bboxes_3d',
                    'gt_labels_3d',
                ],
                meta_keys=[
                    'cam2img',
                    'ori_cam2img',
                    'lidar2cam',
                    'lidar2img',
                    'cam2lidar',
                    'ori_lidar2img',
                    'img_aug_matrix',
                    'box_type_3d',
                    'sample_idx',
                    'lidar_path',
                    'img_path',
                    'num_pts_feats',
                ],
                type='Pack3DDetInputs'),
        ],
        test_mode=True,
        type='NuScenesDataset'),
    drop_last=False,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
    ann_file='data/nuscenes/nuscenes_infos_val.pkl',
    backend_args=None,
    data_root='data/nuscenes/',
    metric='bbox',
    pklfile_prefix=
    '/home/apurvabadithela/nuscenes_dataset/inference_results/bevfusion_model/results.pkl',
    type='NuScenesMetric')
test_pipeline = [
    dict(
        backend_args=None,
        color_type='color',
        to_float32=True,
        type='BEVLoadMultiViewImageFromFiles'),
    dict(
        backend_args=None,
        coord_type='LIDAR',
        load_dim=5,
        type='LoadPointsFromFile',
        use_dim=5),
    dict(
        backend_args=None,
        load_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        sweeps_num=9,
        type='LoadPointsFromMultiSweeps',
        use_dim=5),
    dict(
        bot_pct_lim=[
            0.0,
            0.0,
        ],
        final_dim=[
            256,
            704,
        ],
        is_train=False,
        rand_flip=False,
        resize_lim=[
            0.48,
            0.48,
        ],
        rot_lim=[
            0.0,
            0.0,
        ],
        type='ImageAug3D'),
    dict(
        point_cloud_range=[
            -54.0,
            -54.0,
            -5.0,
            54.0,
            54.0,
            3.0,
        ],
        type='PointsRangeFilter'),
    dict(
        keys=[
            'img',
            'points',
            'gt_bboxes_3d',
            'gt_labels_3d',
        ],
        meta_keys=[
            'cam2img',
            'ori_cam2img',
            'lidar2cam',
            'lidar2img',
            'cam2lidar',
            'ori_lidar2img',
            'img_aug_matrix',
            'box_type_3d',
            'sample_idx',
            'lidar_path',
            'img_path',
            'num_pts_feats',
        ],
        type='Pack3DDetInputs'),
]
train_cfg = dict(by_epoch=True, max_epochs=6, val_interval=1)
train_dataloader = dict(
    batch_size=4,
    dataset=dict(
        dataset=dict(
            ann_file='nuscenes_infos_train.pkl',
            box_type_3d='LiDAR',
            data_prefix=dict(
                CAM_BACK='samples/CAM_BACK',
                CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
                CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
                CAM_FRONT='samples/CAM_FRONT',
                CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
                CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
                pts='samples/LIDAR_TOP',
                sweeps='sweeps/LIDAR_TOP'),
            data_root='data/nuscenes/',
            metainfo=dict(classes=[
                'car',
                'truck',
                'construction_vehicle',
                'bus',
                'trailer',
                'barrier',
                'motorcycle',
                'bicycle',
                'pedestrian',
                'traffic_cone',
            ]),
            modality=dict(use_camera=True, use_lidar=True),
            pipeline=[
                dict(
                    backend_args=None,
                    color_type='color',
                    to_float32=True,
                    type='BEVLoadMultiViewImageFromFiles'),
                dict(
                    backend_args=None,
                    coord_type='LIDAR',
                    load_dim=5,
                    type='LoadPointsFromFile',
                    use_dim=5),
                dict(
                    backend_args=None,
                    load_dim=5,
                    pad_empty_sweeps=True,
                    remove_close=True,
                    sweeps_num=9,
                    type='LoadPointsFromMultiSweeps',
                    use_dim=5),
                dict(
                    type='LoadAnnotations3D',
                    with_attr_label=False,
                    with_bbox_3d=True,
                    with_label_3d=True),
                dict(
                    bot_pct_lim=[
                        0.0,
                        0.0,
                    ],
                    final_dim=[
                        256,
                        704,
                    ],
                    is_train=True,
                    rand_flip=True,
                    resize_lim=[
                        0.38,
                        0.55,
                    ],
                    rot_lim=[
                        -5.4,
                        5.4,
                    ],
                    type='ImageAug3D'),
                dict(
                    rot_range=[
                        -0.78539816,
                        0.78539816,
                    ],
                    scale_ratio_range=[
                        0.9,
                        1.1,
                    ],
                    translation_std=0.5,
                    type='BEVFusionGlobalRotScaleTrans'),
                dict(type='BEVFusionRandomFlip3D'),
                dict(
                    point_cloud_range=[
                        -54.0,
                        -54.0,
                        -5.0,
                        54.0,
                        54.0,
                        3.0,
                    ],
                    type='PointsRangeFilter'),
                dict(
                    point_cloud_range=[
                        -54.0,
                        -54.0,
                        -5.0,
                        54.0,
                        54.0,
                        3.0,
                    ],
                    type='ObjectRangeFilter'),
                dict(
                    classes=[
                        'car',
                        'truck',
                        'construction_vehicle',
                        'bus',
                        'trailer',
                        'barrier',
                        'motorcycle',
                        'bicycle',
                        'pedestrian',
                        'traffic_cone',
                    ],
                    type='ObjectNameFilter'),
                dict(
                    fixed_prob=True,
                    max_epoch=6,
                    mode=1,
                    offset=False,
                    prob=0.0,
                    ratio=0.5,
                    rotate=1,
                    type='GridMask',
                    use_h=True,
                    use_w=True),
                dict(type='PointShuffle'),
                dict(
                    keys=[
                        'points',
                        'img',
                        'gt_bboxes_3d',
                        'gt_labels_3d',
                        'gt_bboxes',
                        'gt_labels',
                    ],
                    meta_keys=[
                        'cam2img',
                        'ori_cam2img',
                        'lidar2cam',
                        'lidar2img',
                        'cam2lidar',
                        'ori_lidar2img',
                        'img_aug_matrix',
                        'box_type_3d',
                        'sample_idx',
                        'lidar_path',
                        'img_path',
                        'transformation_3d_flow',
                        'pcd_rotation',
                        'pcd_scale_factor',
                        'pcd_trans',
                        'img_aug_matrix',
                        'lidar_aug_matrix',
                        'num_pts_feats',
                    ],
                    type='Pack3DDetInputs'),
            ],
            test_mode=False,
            type='NuScenesDataset',
            use_valid_flag=True),
        type='CBGSDataset'),
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
    dict(
        backend_args=None,
        color_type='color',
        to_float32=True,
        type='BEVLoadMultiViewImageFromFiles'),
    dict(
        backend_args=None,
        coord_type='LIDAR',
        load_dim=5,
        type='LoadPointsFromFile',
        use_dim=5),
    dict(
        backend_args=None,
        load_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        sweeps_num=9,
        type='LoadPointsFromMultiSweeps',
        use_dim=5),
    dict(
        type='LoadAnnotations3D',
        with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True),
    dict(
        bot_pct_lim=[
            0.0,
            0.0,
        ],
        final_dim=[
            256,
            704,
        ],
        is_train=True,
        rand_flip=True,
        resize_lim=[
            0.38,
            0.55,
        ],
        rot_lim=[
            -5.4,
            5.4,
        ],
        type='ImageAug3D'),
    dict(
        rot_range=[
            -0.78539816,
            0.78539816,
        ],
        scale_ratio_range=[
            0.9,
            1.1,
        ],
        translation_std=0.5,
        type='BEVFusionGlobalRotScaleTrans'),
    dict(type='BEVFusionRandomFlip3D'),
    dict(
        point_cloud_range=[
            -54.0,
            -54.0,
            -5.0,
            54.0,
            54.0,
            3.0,
        ],
        type='PointsRangeFilter'),
    dict(
        point_cloud_range=[
            -54.0,
            -54.0,
            -5.0,
            54.0,
            54.0,
            3.0,
        ],
        type='ObjectRangeFilter'),
    dict(
        classes=[
            'car',
            'truck',
            'construction_vehicle',
            'bus',
            'trailer',
            'barrier',
            'motorcycle',
            'bicycle',
            'pedestrian',
            'traffic_cone',
        ],
        type='ObjectNameFilter'),
    dict(
        fixed_prob=True,
        max_epoch=6,
        mode=1,
        offset=False,
        prob=0.0,
        ratio=0.5,
        rotate=1,
        type='GridMask',
        use_h=True,
        use_w=True),
    dict(type='PointShuffle'),
    dict(
        keys=[
            'points',
            'img',
            'gt_bboxes_3d',
            'gt_labels_3d',
            'gt_bboxes',
            'gt_labels',
        ],
        meta_keys=[
            'cam2img',
            'ori_cam2img',
            'lidar2cam',
            'lidar2img',
            'cam2lidar',
            'ori_lidar2img',
            'img_aug_matrix',
            'box_type_3d',
            'sample_idx',
            'lidar_path',
            'img_path',
            'transformation_3d_flow',
            'pcd_rotation',
            'pcd_scale_factor',
            'pcd_trans',
            'img_aug_matrix',
            'lidar_aug_matrix',
            'num_pts_feats',
        ],
        type='Pack3DDetInputs'),
]
val_cfg = dict()
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='nuscenes_infos_val.pkl',
        backend_args=None,
        box_type_3d='LiDAR',
        data_prefix=dict(
            CAM_BACK='samples/CAM_BACK',
            CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
            CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
            CAM_FRONT='samples/CAM_FRONT',
            CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
            CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
            pts='samples/LIDAR_TOP',
            sweeps='sweeps/LIDAR_TOP'),
        data_root='data/nuscenes/',
        metainfo=dict(classes=[
            'car',
            'truck',
            'construction_vehicle',
            'bus',
            'trailer',
            'barrier',
            'motorcycle',
            'bicycle',
            'pedestrian',
            'traffic_cone',
        ]),
        modality=dict(use_camera=True, use_lidar=True),
        pipeline=[
            dict(
                backend_args=None,
                color_type='color',
                to_float32=True,
                type='BEVLoadMultiViewImageFromFiles'),
            dict(
                backend_args=None,
                coord_type='LIDAR',
                load_dim=5,
                type='LoadPointsFromFile',
                use_dim=5),
            dict(
                backend_args=None,
                load_dim=5,
                pad_empty_sweeps=True,
                remove_close=True,
                sweeps_num=9,
                type='LoadPointsFromMultiSweeps',
                use_dim=5),
            dict(
                bot_pct_lim=[
                    0.0,
                    0.0,
                ],
                final_dim=[
                    256,
                    704,
                ],
                is_train=False,
                rand_flip=False,
                resize_lim=[
                    0.48,
                    0.48,
                ],
                rot_lim=[
                    0.0,
                    0.0,
                ],
                type='ImageAug3D'),
            dict(
                point_cloud_range=[
                    -54.0,
                    -54.0,
                    -5.0,
                    54.0,
                    54.0,
                    3.0,
                ],
                type='PointsRangeFilter'),
            dict(
                keys=[
                    'img',
                    'points',
                    'gt_bboxes_3d',
                    'gt_labels_3d',
                ],
                meta_keys=[
                    'cam2img',
                    'ori_cam2img',
                    'lidar2cam',
                    'lidar2img',
                    'cam2lidar',
                    'ori_lidar2img',
                    'img_aug_matrix',
                    'box_type_3d',
                    'sample_idx',
                    'lidar_path',
                    'img_path',
                    'num_pts_feats',
                ],
                type='Pack3DDetInputs'),
        ],
        test_mode=True,
        type='NuScenesDataset'),
    drop_last=False,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
    ann_file='data/nuscenes/nuscenes_infos_val.pkl',
    backend_args=None,
    data_root='data/nuscenes/',
    metric='bbox',
    type='NuScenesMetric')
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    name='visualizer',
    type='Det3DLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ])
voxel_size = [
    0.075,
    0.075,
    0.2,
]
work_dir = './work_dirs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d'

04/22 12:07:58 - mmengine - INFO - Loads checkpoint by http backend from path: https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
04/22 12:08:02 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
04/22 12:08:02 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) IterTimerHook                      
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_val_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_val_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_val_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) Det3DVisualizationHook             
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_test_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_test_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_test_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) Det3DVisualizationHook             
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
04/22 12:08:10 - mmengine - INFO - ------------------------------
04/22 12:08:10 - mmengine - INFO - The length of test dataset: 6019
04/22 12:08:10 - mmengine - INFO - The number of instances per category in the dataset:
+----------------------+--------+
| category             | number |
+----------------------+--------+
| car                  | 80004  |
| truck                | 15704  |
| construction_vehicle | 2678   |
| bus                  | 3158   |
| trailer              | 4159   |
| barrier              | 26992  |
| motorcycle           | 2508   |
| bicycle              | 2381   |
| pedestrian           | 34347  |
| traffic_cone         | 15597  |
+----------------------+--------+
/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmdet/models/task_modules/builder.py:17: UserWarning: ``build_sampler`` would be deprecated soon, please use ``mmdet.registry.TASK_UTILS.build()`` 
  warnings.warn('``build_sampler`` would be deprecated soon, please use '
/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmdet/models/task_modules/builder.py:39: UserWarning: ``build_assigner`` would be deprecated soon, please use ``mmdet.registry.TASK_UTILS.build()`` 
  warnings.warn('``build_assigner`` would be deprecated soon, please use '
/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525541702/work/aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "tools/test.py", line 149, in <module>
    main()
  File "tools/test.py", line 145, in main
    runner.test()
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1816, in test
    self._test_loop = self.build_test_loop(self._test_loop)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1611, in build_test_loop
    loop = TestLoop(
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/loops.py", line 413, in __init__
    self.evaluator = runner.build_evaluator(evaluator)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1318, in build_evaluator
    return Evaluator(evaluator)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/evaluator/evaluator.py", line 25, in __init__
    self.metrics.append(METRICS.build(metric))
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
TypeError: __init__() got an unexpected keyword argument 'pklfile_prefix'
VeeranjaneyuluToka commented 2 months ago

I am wondering what happens if you just try the command below without the .pkl file option. Does it save the results in some format?

python tools/test.py projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py checkpoints/bevfusion_converted.pth --task 'multi-modality_det'

abadithela commented 2 months ago

@VeeranjaneyuluToka Yes, I've tried that, but it does not save the individual predicted boxes; it just creates a .json file with the standard metrics and a data/ folder with visualizations of the bounding boxes. I need the predicted boxes for my analysis.
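
For reference, mmengine ships a DumpResults metric that pickles whatever the test loop produces, without touching NuScenesMetric. A minimal driver sketch under that assumption (run from the mmdetection3d repo root so the custom_imports in the config resolve; the work dir and output path are placeholders):

from mmengine.config import Config
from mmengine.evaluator import DumpResults
from mmengine.runner import Runner

# Build a runner from the BEVFusion config; the custom_imports entry in the
# config registers the project modules when this is run from the repo root.
cfg = Config.fromfile(
    'projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py')
cfg.load_from = 'checkpoints/bevfusion_converted.pth'
cfg.work_dir = './work_dirs/bevfusion_dump'  # placeholder work dir

runner = Runner.from_cfg(cfg)
# DumpResults pickles the raw test outputs (predicted boxes included),
# alongside the regular NuScenesMetric evaluation.
runner.test_evaluator.metrics.append(
    DumpResults(out_file_path='./work_dirs/bevfusion_dump/results.pkl'))
runner.test()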

VeeranjaneyuluToka commented 1 month ago

I have created my own inference runner based on their demo samples (https://github.com/open-mmlab/mmdetection3d/tree/main/demo); there is a way to visualize and dump the predictions. However, I am working on LiDAR-based 3D detection only. I believe it should work in the multi-modality case as well, so I would recommend looking into it.
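
For reference, such a LiDAR-only runner can be quite short with the mmdet3d 1.x inferencer API. A minimal sketch, where the config, checkpoint, and point-cloud paths are placeholders:

import pickle
from mmdet3d.apis import LidarDet3DInferencer

# Placeholder config/checkpoint; any LiDAR-only detector should work here.
inferencer = LidarDet3DInferencer(
    model='path/to/lidar_config.py',
    weights='path/to/checkpoint.pth')

# With the default return type, each prediction is a plain dict holding
# 'bboxes_3d', 'scores_3d' and 'labels_3d', which pickles cleanly.
results = inferencer(dict(points='path/to/sample.bin'))

with open('predictions.pkl', 'wb') as f:
    pickle.dump(results['predictions'], f)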

abadithela commented 1 month ago

Hi @VeeranjaneyuluToka: we did the same for a LiDAR-only 3D detector, but doing this for multi-modality based on the demos did not work. The multi-modality demo requires each point cloud and all associated images for that sample to be in one folder. I'm not sure how to scale this up and run inference over the entire dataset, especially with BEVFusion.
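
To make the constraint concrete, a demo-style call for a single sample looks roughly like the sketch below. The input keys ('points', 'img', 'infos') and the per-sample file layout are assumptions taken from the demo scripts, and looping this over the whole dataset is exactly the open question:

from mmdet3d.apis import MultiModalityDet3DInferencer

inferencer = MultiModalityDet3DInferencer(
    model='projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py',
    weights='checkpoints/bevfusion_converted.pth')

# One sample per folder: point cloud, the six camera views, and a
# per-sample info .pkl with calibration, as the demo appears to expect.
results = inferencer(dict(
    points='sample_dir/lidar.bin',        # placeholder point cloud
    img='sample_dir/',                    # folder holding the camera images
    infos='sample_dir/sample_info.pkl'))  # placeholder per-sample info file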