open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Unable to save prediction results when running test.py #2962

Open abadithela opened 7 months ago

abadithela commented 7 months ago

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1: NVIDIA RTX A6000
CUDA_HOME: /home/apurvabadithela/miniconda3/envs/detection
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 10.5.0-1ubuntu1~22.04) 10.5.0
PyTorch: 1.13.1
PyTorch compiling details: PyTorch built with:

TorchVision: 0.14.1
OpenCV: 4.7.0
MMEngine: 0.9.1
MMDetection: 3.2.0
MMDetection3D: 1.4.0+fe25f7a
spconv2.0: True

Reproduces the problem - code sample

I want to save the prediction results from running the project mmdet3d/projects/BEVFusion. The documentation says to add the key pklfile_prefix to test_evaluator, which I did by adding the following line to the config file: test_evaluator.update({'pklfile_prefix':'/home/apurvabadithela/nuscenes_dataset/inference_results/bevfusion_model/results.pkl'})

Reproduces the problem - command or script

Then I run the following from the command line:

python tools/test.py projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py     checkpoints/bevfusion_converted.pth  --task 'multi-modality_det'

Reproduces the problem - error message

I get the following error message:

04/22 10:32:27 - mmengine - INFO - ------------------------------
04/22 10:32:27 - mmengine - INFO - The length of test dataset: 6019
04/22 10:32:27 - mmengine - INFO - The number of instances per category in the dataset:
+----------------------+--------+
| category             | number |
+----------------------+--------+
| car                  | 80004  |
| truck                | 15704  |
| construction_vehicle | 2678   |
| bus                  | 3158   |
| trailer              | 4159   |
| barrier              | 26992  |
| motorcycle           | 2508   |
| bicycle              | 2381   |
| pedestrian           | 34347  |
| traffic_cone         | 15597  |
+----------------------+--------+
Traceback (most recent call last):
  File "tools/test.py", line 149, in <module>
    main()
  File "tools/test.py", line 145, in main
    runner.test()
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1816, in test
    self._test_loop = self.build_test_loop(self._test_loop)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1611, in build_test_loop
    loop = TestLoop(
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/loops.py", line 413, in __init__
    self.evaluator = runner.build_evaluator(evaluator)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1318, in build_evaluator
    return Evaluator(evaluator)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/evaluator/evaluator.py", line 25, in __init__
    self.metrics.append(METRICS.build(metric))
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
TypeError: __init__() got an unexpected keyword argument 'pklfile_prefix'
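
The last frame of the traceback, `obj_cls(**args)`, shows the evaluator config being unpacked into the metric constructor as keyword arguments, so any key the class does not declare raises this TypeError. The following is a minimal sketch of that mechanism, not MMEngine's actual code: `ToyMetric`, `REGISTRY`, `build_from_cfg` and `unexpected_keys` are hypothetical stand-ins, and the parameter list of `ToyMetric` is an assumption chosen for illustration rather than the real signature of `NuScenesMetric`.

```python
import inspect


class ToyMetric:
    """Hypothetical stand-in for a metric class (not the real NuScenesMetric)."""

    def __init__(self, data_root, ann_file, metric="bbox"):
        self.data_root = data_root
        self.ann_file = ann_file
        self.metric = metric


def build_from_cfg(cfg, registry):
    """Simplified registry build: pop 'type', pass every other key as a kwarg."""
    args = dict(cfg)
    obj_cls = registry[args.pop("type")]
    return obj_cls(**args)


def unexpected_keys(cfg, registry):
    """Return the config keys that the target constructor does not declare."""
    obj_cls = registry[cfg["type"]]
    params = inspect.signature(obj_cls.__init__).parameters
    # A **kwargs parameter would accept any key, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(k for k in cfg if k != "type" and k not in params)


REGISTRY = {"ToyMetric": ToyMetric}

cfg = dict(
    type="ToyMetric",
    data_root="data/nuscenes/",
    ann_file="data/nuscenes/nuscenes_infos_val.pkl",
    metric="bbox",
    pklfile_prefix="results.pkl",  # key the toy constructor does not declare
)

# Check the config against the constructor signature before building.
bad_keys = unexpected_keys(cfg, REGISTRY)
print("keys the constructor does not accept:", bad_keys)

# Building anyway reproduces the same kind of TypeError as in the traceback.
try:
    build_from_cfg(cfg, REGISTRY)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print("build failed with:", error_message)
```

The same `inspect.signature` check can be run against the real metric class in an installed environment to see which keyword arguments that version accepts before editing the config.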

Additional information

  1. I would like to store the prediction results (of the validation set) somewhere other than a /tmp folder. What is the best way to do this?
  2. I have tried setting --cfg-options, but the syntax was not clear and it kept erroring out.
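
On the `--cfg-options` syntax from item 2: the option takes `key=value` pairs in which dots in the key walk into nested config fields. Below is a minimal sketch of that merge on a plain dict. It is an illustration only, not MMEngine's implementation; `parse_override` and `apply_override` are hypothetical helpers and `/path/to/results.pkl` is a placeholder path.

```python
def parse_override(option):
    """Split one 'dotted.key=value' string into (key_path, value)."""
    key, sep, value = option.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {option!r}")
    return key.split("."), value


def apply_override(cfg, option):
    """Set a nested key in cfg (a plain dict) from one 'a.b.c=value' string."""
    path, value = parse_override(option)
    node = cfg
    # Walk (or create) the intermediate dicts named by the dotted path.
    for name in path[:-1]:
        node = node.setdefault(name, {})
    node[path[-1]] = value
    return cfg


cfg = {"test_evaluator": {"type": "NuScenesMetric", "metric": "bbox"}}
apply_override(cfg, "test_evaluator.pklfile_prefix=/path/to/results.pkl")
print(cfg["test_evaluator"])
```

An override of this kind only changes the config that gets built. Whether the evaluator class accepts the new key is decided later, when the config is passed to its constructor.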
abadithela commented 7 months ago

Passing the same option through --cfg-options on the command line does not work either:

python tools/test.py projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py  checkpoints/bevfusion_converted.pth  --cfg-options "test_evaluator.pklfile_prefix=/home/apurvabadithela/nuscenes_dataset/inference_results/bevfusion_model/results.pkl" --task 'multi-modality_det'

Instead, I get the following error:

04/22 12:07:56 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 2089837194
    GPU 0,1: NVIDIA RTX A6000
    CUDA_HOME: /home/apurvabadithela/miniconda3/envs/detection
    NVCC: Cuda compilation tools, release 11.7, V11.7.99
    GCC: gcc (Ubuntu 10.5.0-1ubuntu1~22.04) 10.5.0
    PyTorch: 1.13.1
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.14.1
    OpenCV: 4.7.0
    MMEngine: 0.9.1

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 2089837194
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

04/22 12:07:56 - mmengine - INFO - Config:
auto_scale_lr = dict(base_batch_size=32, enable=False)
backend_args = None
class_names = [
    'car',
    'truck',
    'construction_vehicle',
    'bus',
    'trailer',
    'barrier',
    'motorcycle',
    'bicycle',
    'pedestrian',
    'traffic_cone',
]
custom_imports = dict(
    allow_failed_imports=False, imports=[
        'projects.BEVFusion.bevfusion',
    ])
data_prefix = dict(
    CAM_BACK='samples/CAM_BACK',
    CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
    CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
    CAM_FRONT='samples/CAM_FRONT',
    CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
    CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
    pts='samples/LIDAR_TOP',
    sweeps='sweeps/LIDAR_TOP')
data_root = 'data/nuscenes/'
dataset_type = 'NuScenesDataset'
db_sampler = dict(
    classes=[
        'car',
        'truck',
        'construction_vehicle',
        'bus',
        'trailer',
        'barrier',
        'motorcycle',
        'bicycle',
        'pedestrian',
        'traffic_cone',
    ],
    data_root='data/nuscenes/',
    info_path='data/nuscenes/nuscenes_dbinfos_train.pkl',
    points_loader=dict(
        backend_args=None,
        coord_type='LIDAR',
        load_dim=5,
        type='LoadPointsFromFile',
        use_dim=[
            0,
            1,
            2,
            3,
            4,
        ]),
    prepare=dict(
        filter_by_difficulty=[
            -1,
        ],
        filter_by_min_points=dict(
            barrier=5,
            bicycle=5,
            bus=5,
            car=5,
            construction_vehicle=5,
            motorcycle=5,
            pedestrian=5,
            traffic_cone=5,
            trailer=5,
            truck=5)),
    rate=1.0,
    sample_groups=dict(
        barrier=2,
        bicycle=6,
        bus=4,
        car=2,
        construction_vehicle=7,
        motorcycle=6,
        pedestrian=2,
        traffic_cone=2,
        trailer=6,
        truck=3))
default_hooks = dict(
    checkpoint=dict(interval=1, type='CheckpointHook'),
    logger=dict(interval=50, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(type='Det3DVisualizationHook'))
default_scope = 'mmdet3d'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
input_modality = dict(use_camera=True, use_lidar=True)
launcher = 'none'
load_from = 'checkpoints/bevfusion_converted.pth'
log_level = 'INFO'
log_processor = dict(by_epoch=True, type='LogProcessor', window_size=50)
lr = 0.0001
metainfo = dict(classes=[
    'car',
    'truck',
    'construction_vehicle',
    'bus',
    'trailer',
    'barrier',
    'motorcycle',
    'bicycle',
    'pedestrian',
    'traffic_cone',
])
model = dict(
    bbox_head=dict(
        auxiliary=True,
        bbox_coder=dict(
            code_size=10,
            out_size_factor=8,
            pc_range=[
                -54.0,
                -54.0,
            ],
            post_center_range=[
                -61.2,
                -61.2,
                -10.0,
                61.2,
                61.2,
                10.0,
            ],
            score_threshold=0.0,
            type='TransFusionBBoxCoder',
            voxel_size=[
                0.075,
                0.075,
            ]),
        bn_momentum=0.1,
        common_heads=dict(
            center=[
                2,
                2,
            ],
            dim=[
                3,
                2,
            ],
            height=[
                1,
                2,
            ],
            rot=[
                2,
                2,
            ],
            vel=[
                2,
                2,
            ]),
        decoder_layer=dict(
            cross_attn_cfg=dict(dropout=0.1, embed_dims=128, num_heads=8),
            ffn_cfg=dict(
                act_cfg=dict(inplace=True, type='ReLU'),
                embed_dims=128,
                feedforward_channels=256,
                ffn_drop=0.1,
                num_fcs=2),
            norm_cfg=dict(type='LN'),
            pos_encoding_cfg=dict(input_channel=2, num_pos_feats=128),
            self_attn_cfg=dict(dropout=0.1, embed_dims=128, num_heads=8),
            type='TransformerDecoderLayer'),
        hidden_channel=128,
        in_channels=512,
        loss_bbox=dict(
            loss_weight=0.25, reduction='mean', type='mmdet.L1Loss'),
        loss_cls=dict(
            alpha=0.25,
            gamma=2.0,
            loss_weight=1.0,
            reduction='mean',
            type='mmdet.FocalLoss',
            use_sigmoid=True),
        loss_heatmap=dict(
            loss_weight=1.0, reduction='mean', type='mmdet.GaussianFocalLoss'),
        nms_kernel_size=3,
        num_classes=10,
        num_decoder_layers=1,
        num_proposals=200,
        test_cfg=dict(
            dataset='nuScenes',
            grid_size=[
                1440,
                1440,
                41,
            ],
            nms_type=None,
            out_size_factor=8,
            pc_range=[
                -54.0,
                -54.0,
            ],
            voxel_size=[
                0.075,
                0.075,
            ]),
        train_cfg=dict(
            assigner=dict(
                cls_cost=dict(
                    alpha=0.25,
                    gamma=2.0,
                    type='mmdet.FocalLossCost',
                    weight=0.15),
                iou_calculator=dict(coordinate='lidar', type='BboxOverlaps3D'),
                iou_cost=dict(type='IoU3DCost', weight=0.25),
                reg_cost=dict(type='BBoxBEVL1Cost', weight=0.25),
                type='HungarianAssigner3D'),
            code_weights=[
                1.0,
                1.0,
                1.0,
                1.0,
                1.0,
                1.0,
                1.0,
                1.0,
                0.2,
                0.2,
            ],
            dataset='nuScenes',
            gaussian_overlap=0.1,
            grid_size=[
                1440,
                1440,
                41,
            ],
            min_radius=2,
            out_size_factor=8,
            point_cloud_range=[
                -54.0,
                -54.0,
                -5.0,
                54.0,
                54.0,
                3.0,
            ],
            pos_weight=-1,
            voxel_size=[
                0.075,
                0.075,
                0.2,
            ]),
        type='TransFusionHead'),
    data_preprocessor=dict(
        bgr_to_rgb=False,
        mean=[
            123.675,
            116.28,
            103.53,
        ],
        pad_size_divisor=32,
        std=[
            58.395,
            57.12,
            57.375,
        ],
        type='Det3DDataPreprocessor',
        voxelize_cfg=dict(
            max_num_points=10,
            max_voxels=[
                120000,
                160000,
            ],
            point_cloud_range=[
                -54.0,
                -54.0,
                -5.0,
                54.0,
                54.0,
                3.0,
            ],
            voxel_size=[
                0.075,
                0.075,
                0.2,
            ],
            voxelize_reduce=True)),
    fusion_layer=dict(
        in_channels=[
            80,
            256,
        ], out_channels=256, type='ConvFuser'),
    img_backbone=dict(
        attn_drop_rate=0.0,
        convert_weights=True,
        depths=[
            2,
            2,
            6,
            2,
        ],
        drop_path_rate=0.2,
        drop_rate=0.0,
        embed_dims=96,
        init_cfg=dict(
            checkpoint=
            'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth',
            type='Pretrained'),
        mlp_ratio=4,
        num_heads=[
            3,
            6,
            12,
            24,
        ],
        out_indices=[
            1,
            2,
            3,
        ],
        patch_norm=True,
        qk_scale=None,
        qkv_bias=True,
        type='mmdet.SwinTransformer',
        window_size=7,
        with_cp=False),
    img_neck=dict(
        act_cfg=dict(inplace=True, type='ReLU'),
        in_channels=[
            192,
            384,
            768,
        ],
        norm_cfg=dict(requires_grad=True, type='BN2d'),
        num_outs=3,
        out_channels=256,
        start_level=0,
        type='GeneralizedLSSFPN',
        upsample_cfg=dict(align_corners=False, mode='bilinear')),
    pts_backbone=dict(
        conv_cfg=dict(bias=False, type='Conv2d'),
        in_channels=256,
        layer_nums=[
            5,
            5,
        ],
        layer_strides=[
            1,
            2,
        ],
        norm_cfg=dict(eps=0.001, momentum=0.01, type='BN'),
        out_channels=[
            128,
            256,
        ],
        type='SECOND'),
    pts_middle_encoder=dict(
        block_type='basicblock',
        encoder_channels=(
            (
                16,
                16,
                32,
            ),
            (
                32,
                32,
                64,
            ),
            (
                64,
                64,
                128,
            ),
            (
                128,
                128,
            ),
        ),
        encoder_paddings=(
            (
                0,
                0,
                1,
            ),
            (
                0,
                0,
                1,
            ),
            (
                0,
                0,
                (
                    1,
                    1,
                    0,
                ),
            ),
            (
                0,
                0,
            ),
        ),
        in_channels=5,
        norm_cfg=dict(eps=0.001, momentum=0.01, type='BN1d'),
        order=(
            'conv',
            'norm',
            'act',
        ),
        sparse_shape=[
            1440,
            1440,
            41,
        ],
        type='BEVFusionSparseEncoder'),
    pts_neck=dict(
        in_channels=[
            128,
            256,
        ],
        norm_cfg=dict(eps=0.001, momentum=0.01, type='BN'),
        out_channels=[
            256,
            256,
        ],
        type='SECONDFPN',
        upsample_cfg=dict(bias=False, type='deconv'),
        upsample_strides=[
            1,
            2,
        ],
        use_conv_for_no_stride=True),
    pts_voxel_encoder=dict(num_features=5, type='HardSimpleVFE'),
    type='BEVFusion',
    view_transform=dict(
        dbound=[
            1.0,
            60.0,
            0.5,
        ],
        downsample=2,
        feature_size=[
            32,
            88,
        ],
        image_size=[
            256,
            704,
        ],
        in_channels=256,
        out_channels=80,
        type='DepthLSSTransform',
        xbound=[
            -54.0,
            54.0,
            0.3,
        ],
        ybound=[
            -54.0,
            54.0,
            0.3,
        ],
        zbound=[
            -10.0,
            10.0,
            20.0,
        ]))
optim_wrapper = dict(
    clip_grad=dict(max_norm=35, norm_type=2),
    optimizer=dict(lr=0.0002, type='AdamW', weight_decay=0.01),
    type='OptimWrapper')
param_scheduler = [
    dict(
        begin=0,
        by_epoch=False,
        end=500,
        start_factor=0.33333333,
        type='LinearLR'),
    dict(
        T_max=6,
        begin=0,
        by_epoch=True,
        convert_to_iter_based=True,
        end=6,
        eta_min_ratio=0.0001,
        type='CosineAnnealingLR'),
    dict(
        begin=0,
        by_epoch=True,
        convert_to_iter_based=True,
        end=2.4,
        eta_min=0.8947368421052632,
        type='CosineAnnealingMomentum'),
    dict(
        begin=2.4,
        by_epoch=True,
        convert_to_iter_based=True,
        end=6,
        eta_min=1,
        type='CosineAnnealingMomentum'),
]
point_cloud_range = [
    -54.0,
    -54.0,
    -5.0,
    54.0,
    54.0,
    3.0,
]
resume = False
test_cfg = dict()
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='nuscenes_infos_val.pkl',
        backend_args=None,
        box_type_3d='LiDAR',
        data_prefix=dict(
            CAM_BACK='samples/CAM_BACK',
            CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
            CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
            CAM_FRONT='samples/CAM_FRONT',
            CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
            CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
            pts='samples/LIDAR_TOP',
            sweeps='sweeps/LIDAR_TOP'),
        data_root='data/nuscenes/',
        metainfo=dict(classes=[
            'car',
            'truck',
            'construction_vehicle',
            'bus',
            'trailer',
            'barrier',
            'motorcycle',
            'bicycle',
            'pedestrian',
            'traffic_cone',
        ]),
        modality=dict(use_camera=True, use_lidar=True),
        pipeline=[
            dict(
                backend_args=None,
                color_type='color',
                to_float32=True,
                type='BEVLoadMultiViewImageFromFiles'),
            dict(
                backend_args=None,
                coord_type='LIDAR',
                load_dim=5,
                type='LoadPointsFromFile',
                use_dim=5),
            dict(
                backend_args=None,
                load_dim=5,
                pad_empty_sweeps=True,
                remove_close=True,
                sweeps_num=9,
                type='LoadPointsFromMultiSweeps',
                use_dim=5),
            dict(
                bot_pct_lim=[
                    0.0,
                    0.0,
                ],
                final_dim=[
                    256,
                    704,
                ],
                is_train=False,
                rand_flip=False,
                resize_lim=[
                    0.48,
                    0.48,
                ],
                rot_lim=[
                    0.0,
                    0.0,
                ],
                type='ImageAug3D'),
            dict(
                point_cloud_range=[
                    -54.0,
                    -54.0,
                    -5.0,
                    54.0,
                    54.0,
                    3.0,
                ],
                type='PointsRangeFilter'),
            dict(
                keys=[
                    'img',
                    'points',
                    'gt_bboxes_3d',
                    'gt_labels_3d',
                ],
                meta_keys=[
                    'cam2img',
                    'ori_cam2img',
                    'lidar2cam',
                    'lidar2img',
                    'cam2lidar',
                    'ori_lidar2img',
                    'img_aug_matrix',
                    'box_type_3d',
                    'sample_idx',
                    'lidar_path',
                    'img_path',
                    'num_pts_feats',
                ],
                type='Pack3DDetInputs'),
        ],
        test_mode=True,
        type='NuScenesDataset'),
    drop_last=False,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
    ann_file='data/nuscenes/nuscenes_infos_val.pkl',
    backend_args=None,
    data_root='data/nuscenes/',
    metric='bbox',
    pklfile_prefix=
    '/home/apurvabadithela/nuscenes_dataset/inference_results/bevfusion_model/results.pkl',
    type='NuScenesMetric')
test_pipeline = [
    dict(
        backend_args=None,
        color_type='color',
        to_float32=True,
        type='BEVLoadMultiViewImageFromFiles'),
    dict(
        backend_args=None,
        coord_type='LIDAR',
        load_dim=5,
        type='LoadPointsFromFile',
        use_dim=5),
    dict(
        backend_args=None,
        load_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        sweeps_num=9,
        type='LoadPointsFromMultiSweeps',
        use_dim=5),
    dict(
        bot_pct_lim=[
            0.0,
            0.0,
        ],
        final_dim=[
            256,
            704,
        ],
        is_train=False,
        rand_flip=False,
        resize_lim=[
            0.48,
            0.48,
        ],
        rot_lim=[
            0.0,
            0.0,
        ],
        type='ImageAug3D'),
    dict(
        point_cloud_range=[
            -54.0,
            -54.0,
            -5.0,
            54.0,
            54.0,
            3.0,
        ],
        type='PointsRangeFilter'),
    dict(
        keys=[
            'img',
            'points',
            'gt_bboxes_3d',
            'gt_labels_3d',
        ],
        meta_keys=[
            'cam2img',
            'ori_cam2img',
            'lidar2cam',
            'lidar2img',
            'cam2lidar',
            'ori_lidar2img',
            'img_aug_matrix',
            'box_type_3d',
            'sample_idx',
            'lidar_path',
            'img_path',
            'num_pts_feats',
        ],
        type='Pack3DDetInputs'),
]
train_cfg = dict(by_epoch=True, max_epochs=6, val_interval=1)
train_dataloader = dict(
    batch_size=4,
    dataset=dict(
        dataset=dict(
            ann_file='nuscenes_infos_train.pkl',
            box_type_3d='LiDAR',
            data_prefix=dict(
                CAM_BACK='samples/CAM_BACK',
                CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
                CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
                CAM_FRONT='samples/CAM_FRONT',
                CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
                CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
                pts='samples/LIDAR_TOP',
                sweeps='sweeps/LIDAR_TOP'),
            data_root='data/nuscenes/',
            metainfo=dict(classes=[
                'car',
                'truck',
                'construction_vehicle',
                'bus',
                'trailer',
                'barrier',
                'motorcycle',
                'bicycle',
                'pedestrian',
                'traffic_cone',
            ]),
            modality=dict(use_camera=True, use_lidar=True),
            pipeline=[
                dict(
                    backend_args=None,
                    color_type='color',
                    to_float32=True,
                    type='BEVLoadMultiViewImageFromFiles'),
                dict(
                    backend_args=None,
                    coord_type='LIDAR',
                    load_dim=5,
                    type='LoadPointsFromFile',
                    use_dim=5),
                dict(
                    backend_args=None,
                    load_dim=5,
                    pad_empty_sweeps=True,
                    remove_close=True,
                    sweeps_num=9,
                    type='LoadPointsFromMultiSweeps',
                    use_dim=5),
                dict(
                    type='LoadAnnotations3D',
                    with_attr_label=False,
                    with_bbox_3d=True,
                    with_label_3d=True),
                dict(
                    bot_pct_lim=[
                        0.0,
                        0.0,
                    ],
                    final_dim=[
                        256,
                        704,
                    ],
                    is_train=True,
                    rand_flip=True,
                    resize_lim=[
                        0.38,
                        0.55,
                    ],
                    rot_lim=[
                        -5.4,
                        5.4,
                    ],
                    type='ImageAug3D'),
                dict(
                    rot_range=[
                        -0.78539816,
                        0.78539816,
                    ],
                    scale_ratio_range=[
                        0.9,
                        1.1,
                    ],
                    translation_std=0.5,
                    type='BEVFusionGlobalRotScaleTrans'),
                dict(type='BEVFusionRandomFlip3D'),
                dict(
                    point_cloud_range=[
                        -54.0,
                        -54.0,
                        -5.0,
                        54.0,
                        54.0,
                        3.0,
                    ],
                    type='PointsRangeFilter'),
                dict(
                    point_cloud_range=[
                        -54.0,
                        -54.0,
                        -5.0,
                        54.0,
                        54.0,
                        3.0,
                    ],
                    type='ObjectRangeFilter'),
                dict(
                    classes=[
                        'car',
                        'truck',
                        'construction_vehicle',
                        'bus',
                        'trailer',
                        'barrier',
                        'motorcycle',
                        'bicycle',
                        'pedestrian',
                        'traffic_cone',
                    ],
                    type='ObjectNameFilter'),
                dict(
                    fixed_prob=True,
                    max_epoch=6,
                    mode=1,
                    offset=False,
                    prob=0.0,
                    ratio=0.5,
                    rotate=1,
                    type='GridMask',
                    use_h=True,
                    use_w=True),
                dict(type='PointShuffle'),
                dict(
                    keys=[
                        'points',
                        'img',
                        'gt_bboxes_3d',
                        'gt_labels_3d',
                        'gt_bboxes',
                        'gt_labels',
                    ],
                    meta_keys=[
                        'cam2img',
                        'ori_cam2img',
                        'lidar2cam',
                        'lidar2img',
                        'cam2lidar',
                        'ori_lidar2img',
                        'img_aug_matrix',
                        'box_type_3d',
                        'sample_idx',
                        'lidar_path',
                        'img_path',
                        'transformation_3d_flow',
                        'pcd_rotation',
                        'pcd_scale_factor',
                        'pcd_trans',
                        'img_aug_matrix',
                        'lidar_aug_matrix',
                        'num_pts_feats',
                    ],
                    type='Pack3DDetInputs'),
            ],
            test_mode=False,
            type='NuScenesDataset',
            use_valid_flag=True),
        type='CBGSDataset'),
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
    dict(
        backend_args=None,
        color_type='color',
        to_float32=True,
        type='BEVLoadMultiViewImageFromFiles'),
    dict(
        backend_args=None,
        coord_type='LIDAR',
        load_dim=5,
        type='LoadPointsFromFile',
        use_dim=5),
    dict(
        backend_args=None,
        load_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        sweeps_num=9,
        type='LoadPointsFromMultiSweeps',
        use_dim=5),
    dict(
        type='LoadAnnotations3D',
        with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True),
    dict(
        bot_pct_lim=[
            0.0,
            0.0,
        ],
        final_dim=[
            256,
            704,
        ],
        is_train=True,
        rand_flip=True,
        resize_lim=[
            0.38,
            0.55,
        ],
        rot_lim=[
            -5.4,
            5.4,
        ],
        type='ImageAug3D'),
    dict(
        rot_range=[
            -0.78539816,
            0.78539816,
        ],
        scale_ratio_range=[
            0.9,
            1.1,
        ],
        translation_std=0.5,
        type='BEVFusionGlobalRotScaleTrans'),
    dict(type='BEVFusionRandomFlip3D'),
    dict(
        point_cloud_range=[
            -54.0,
            -54.0,
            -5.0,
            54.0,
            54.0,
            3.0,
        ],
        type='PointsRangeFilter'),
    dict(
        point_cloud_range=[
            -54.0,
            -54.0,
            -5.0,
            54.0,
            54.0,
            3.0,
        ],
        type='ObjectRangeFilter'),
    dict(
        classes=[
            'car',
            'truck',
            'construction_vehicle',
            'bus',
            'trailer',
            'barrier',
            'motorcycle',
            'bicycle',
            'pedestrian',
            'traffic_cone',
        ],
        type='ObjectNameFilter'),
    dict(
        fixed_prob=True,
        max_epoch=6,
        mode=1,
        offset=False,
        prob=0.0,
        ratio=0.5,
        rotate=1,
        type='GridMask',
        use_h=True,
        use_w=True),
    dict(type='PointShuffle'),
    dict(
        keys=[
            'points',
            'img',
            'gt_bboxes_3d',
            'gt_labels_3d',
            'gt_bboxes',
            'gt_labels',
        ],
        meta_keys=[
            'cam2img',
            'ori_cam2img',
            'lidar2cam',
            'lidar2img',
            'cam2lidar',
            'ori_lidar2img',
            'img_aug_matrix',
            'box_type_3d',
            'sample_idx',
            'lidar_path',
            'img_path',
            'transformation_3d_flow',
            'pcd_rotation',
            'pcd_scale_factor',
            'pcd_trans',
            'img_aug_matrix',
            'lidar_aug_matrix',
            'num_pts_feats',
        ],
        type='Pack3DDetInputs'),
]
val_cfg = dict()
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='nuscenes_infos_val.pkl',
        backend_args=None,
        box_type_3d='LiDAR',
        data_prefix=dict(
            CAM_BACK='samples/CAM_BACK',
            CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
            CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
            CAM_FRONT='samples/CAM_FRONT',
            CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
            CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
            pts='samples/LIDAR_TOP',
            sweeps='sweeps/LIDAR_TOP'),
        data_root='data/nuscenes/',
        metainfo=dict(classes=[
            'car',
            'truck',
            'construction_vehicle',
            'bus',
            'trailer',
            'barrier',
            'motorcycle',
            'bicycle',
            'pedestrian',
            'traffic_cone',
        ]),
        modality=dict(use_camera=True, use_lidar=True),
        pipeline=[
            dict(
                backend_args=None,
                color_type='color',
                to_float32=True,
                type='BEVLoadMultiViewImageFromFiles'),
            dict(
                backend_args=None,
                coord_type='LIDAR',
                load_dim=5,
                type='LoadPointsFromFile',
                use_dim=5),
            dict(
                backend_args=None,
                load_dim=5,
                pad_empty_sweeps=True,
                remove_close=True,
                sweeps_num=9,
                type='LoadPointsFromMultiSweeps',
                use_dim=5),
            dict(
                bot_pct_lim=[
                    0.0,
                    0.0,
                ],
                final_dim=[
                    256,
                    704,
                ],
                is_train=False,
                rand_flip=False,
                resize_lim=[
                    0.48,
                    0.48,
                ],
                rot_lim=[
                    0.0,
                    0.0,
                ],
                type='ImageAug3D'),
            dict(
                point_cloud_range=[
                    -54.0,
                    -54.0,
                    -5.0,
                    54.0,
                    54.0,
                    3.0,
                ],
                type='PointsRangeFilter'),
            dict(
                keys=[
                    'img',
                    'points',
                    'gt_bboxes_3d',
                    'gt_labels_3d',
                ],
                meta_keys=[
                    'cam2img',
                    'ori_cam2img',
                    'lidar2cam',
                    'lidar2img',
                    'cam2lidar',
                    'ori_lidar2img',
                    'img_aug_matrix',
                    'box_type_3d',
                    'sample_idx',
                    'lidar_path',
                    'img_path',
                    'num_pts_feats',
                ],
                type='Pack3DDetInputs'),
        ],
        test_mode=True,
        type='NuScenesDataset'),
    drop_last=False,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
    ann_file='data/nuscenes/nuscenes_infos_val.pkl',
    backend_args=None,
    data_root='data/nuscenes/',
    metric='bbox',
    type='NuScenesMetric')
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    name='visualizer',
    type='Det3DLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ])
voxel_size = [
    0.075,
    0.075,
    0.2,
]
work_dir = './work_dirs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d'

04/22 12:07:58 - mmengine - INFO - Loads checkpoint by http backend from path: https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
04/22 12:08:02 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
04/22 12:08:02 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) IterTimerHook                      
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_val_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_val_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_val_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) Det3DVisualizationHook             
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_test_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_test_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_test_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) Det3DVisualizationHook             
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
04/22 12:08:10 - mmengine - INFO - ------------------------------
04/22 12:08:10 - mmengine - INFO - The length of test dataset: 6019
04/22 12:08:10 - mmengine - INFO - The number of instances per category in the dataset:
+----------------------+--------+
| category             | number |
+----------------------+--------+
| car                  | 80004  |
| truck                | 15704  |
| construction_vehicle | 2678   |
| bus                  | 3158   |
| trailer              | 4159   |
| barrier              | 26992  |
| motorcycle           | 2508   |
| bicycle              | 2381   |
| pedestrian           | 34347  |
| traffic_cone         | 15597  |
+----------------------+--------+
/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmdet/models/task_modules/builder.py:17: UserWarning: ``build_sampler`` would be deprecated soon, please use ``mmdet.registry.TASK_UTILS.build()`` 
  warnings.warn('``build_sampler`` would be deprecated soon, please use '
/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmdet/models/task_modules/builder.py:39: UserWarning: ``build_assigner`` would be deprecated soon, please use ``mmdet.registry.TASK_UTILS.build()`` 
  warnings.warn('``build_assigner`` would be deprecated soon, please use '
/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525541702/work/aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "tools/test.py", line 149, in <module>
    main()
  File "tools/test.py", line 145, in main
    runner.test()
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1816, in test
    self._test_loop = self.build_test_loop(self._test_loop)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1611, in build_test_loop
    loop = TestLoop(
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/loops.py", line 413, in __init__
    self.evaluator = runner.build_evaluator(evaluator)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1318, in build_evaluator
    return Evaluator(evaluator)  # type: ignore
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/evaluator/evaluator.py", line 25, in __init__
    self.metrics.append(METRICS.build(metric))
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/apurvabadithela/miniconda3/envs/detection/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
TypeError: __init__() got an unexpected keyword argument 'pklfile_prefix'
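
The last two frames show the mechanism: `build_from_cfg` pops `type` from the config dict and calls `obj_cls(**args)`, so any key the metric's `__init__` does not declare raises this `TypeError`. The sketch below reproduces that with a hypothetical stand-in class (`StandInMetric` and `unexpected_keys` are illustrative names, not part of MMEngine or MMDetection3D) and shows one way to list the offending keys before launching a run.

```python
import inspect


class StandInMetric:
    """Hypothetical stand-in for a metric class; NOT the real NuScenesMetric."""

    def __init__(self, data_root, ann_file, metric='bbox', jsonfile_prefix=None):
        self.data_root = data_root
        self.ann_file = ann_file
        self.metric = metric
        self.jsonfile_prefix = jsonfile_prefix


def unexpected_keys(cls, cfg):
    """Return config keys that cls.__init__ would reject as keyword arguments."""
    params = inspect.signature(cls.__init__).parameters
    # A constructor that declares **kwargs accepts every key.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    # 'type' is consumed by the registry and never reaches __init__.
    return sorted(k for k in cfg if k != 'type' and k not in params)


cfg = dict(
    type='StandInMetric',
    data_root='data/nuscenes/',
    ann_file='data/nuscenes/nuscenes_infos_val.pkl',
    metric='bbox',
    pklfile_prefix='results',  # the key that triggers the error
)

print(unexpected_keys(StandInMetric, cfg))  # → ['pklfile_prefix']
```

Running the same check against the real metric class (imported from the installed package) should show which output-related arguments it actually accepts.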
VeeranjaneyuluToka commented 7 months ago

I wonder what happens if you run the command below without specifying a .pkl file. Does it save the results in some format?

python tools/test.py projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py checkpoints/bevfusion_converted.pth --task 'multi-modality_det'

abadithela commented 7 months ago

@VeeranjaneyuluToka Yes, I've tried that, but it does not save individual prediction boxes; it just creates a .json file with the standard metrics and a data/ folder with visualizations of the bounding boxes. I need the predicted boxes for my analysis.
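
One possible way to get the raw predictions, sketched here as a config fragment and not verified against BEVFusion: make `test_evaluator` a list and append MMEngine's `DumpResults` metric, which pickles the collected results to `out_file_path` (the path must end in `.pkl`). The first dict mirrors the `val_evaluator` shown in the dumped config above. Treat `jsonfile_prefix` as an assumption about `NuScenesMetric` to check against the installed version, and both output paths as hypothetical.

```python
# Evaluator settings copied from the dumped config in this issue.
nuscenes_metric = dict(
    type='NuScenesMetric',
    data_root='data/nuscenes/',
    ann_file='data/nuscenes/nuscenes_infos_val.pkl',
    metric='bbox',
    backend_args=None,
    # Assumption: NuScenesMetric takes `jsonfile_prefix`, not `pklfile_prefix`.
    jsonfile_prefix='work_dirs/bevfusion_nus/results',  # hypothetical path
)

# Assumption: 'DumpResults' resolves through the parent MMEngine METRICS registry.
dump_results = dict(
    type='DumpResults',
    out_file_path='work_dirs/bevfusion_nus/predictions.pkl',  # hypothetical path
)

# A list of metric configs is evaluated in order over the same test outputs.
test_evaluator = [nuscenes_metric, dump_results]
```

With this in the config file, the `tools/test.py` command from the issue would stay unchanged.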

VeeranjaneyuluToka commented 7 months ago

I created my own inference runner based on their demo samples (https://github.com/open-mmlab/mmdetection3d/tree/main/demo). It offers a way to visualize and dump the predictions. I am working on LiDAR-based 3D detection only, but I believe it should also work in the multi-modality case, so I would recommend looking into it.
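
However the predictions get dumped, a pickle file can be inspected with the standard library alone. The sketch below assumes the file holds a list with one dict per sample and that 3D predictions sit under a `pred_instances_3d` key; both are assumptions about the dump format, and `load_predictions` and `count_with_key` are hypothetical helper names. Unpickling real results may also require `mmdet3d` to be importable, since box objects are stored as class instances.

```python
import pickle
from pathlib import Path


def load_predictions(pkl_path):
    """Load a pickled list of per-sample results (format is an assumption)."""
    with Path(pkl_path).open('rb') as f:
        results = pickle.load(f)
    if not isinstance(results, list):
        raise TypeError(f'expected a list of samples, got {type(results).__name__}')
    return results


def count_with_key(results, key='pred_instances_3d'):
    """Count samples that are dicts and carry the given prediction key."""
    return sum(1 for r in results if isinstance(r, dict) and key in r)
```

A first sanity check is to compare `len(load_predictions(path))` with the dataset length reported in the log (6019 for the nuScenes validation split here).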

abadithela commented 6 months ago

Hi @VeeranjaneyuluToka, we did the same for a LiDAR-only 3D detector, but following the demos did not work for multi-modality. The multi-modality demo requires each point cloud and all associated images for that sample to be in one folder. I'm not sure how to scale this up and run inference for the entire dataset, especially with BEVFusion.

gorkemguzeler commented 1 month ago

Hey @abadithela, did you come up with any ideas or solutions for this issue?