open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0
5k stars, 1.49k forks

[Bug] Saving results using a pretrained PETR model #2968

Open Ortega00 opened 1 month ago

Ortega00 commented 1 month ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: n/a
PyTorch: 2.1.0
PyTorch compiling details: PyTorch built with:

TorchVision: 0.16.0
OpenCV: 4.9.0
MMEngine: 0.10.4
MMDetection: 3.3.0
MMDetection3D: 1.4.0+962f093
spconv2.0: False

Reproduces the problem - code sample

Link to the code that produces the error

Reproduces the problem - command or script

python tools/test.py projects/PETR/configs/petr_vovnet_gridmask_p4_800x320.py checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth --show --show-dir results --task multi-view_det

Reproduces the problem - error message

python tools/test.py projects/PETR/configs/petr_vovnet_gridmask_p4_800x320.py checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth --show --show-dir results --task multi-view_det
/bin/sh: 1: gcc: not found
05/01 01:05:09 - mmengine - INFO -

System environment:
sys.platform: linux
Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 1
GCC: n/a
PyTorch: 2.1.0
PyTorch compiling details: PyTorch built with:

Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 1
deterministic: False
diff_rank_seed: False
Distributed launcher: none
Distributed training: False
GPU number: 1

05/01 01:05:10 - mmengine - INFO - Config: auto_scale_lr = dict(base_batch_size=32, enable=False) backbone_norm_cfg = dict(requires_grad=True, type='LN') backend_args = None class_names = [ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ] custom_imports = dict(imports=[ 'projects.PETR.petr', ]) data_prefix = dict(img='', pts='samples/LIDAR_TOP', sweeps='sweeps/LIDAR_TOP') data_root = 'data/nuscenes/' dataset_type = 'NuScenesDataset' db_sampler = dict( backend_args=None, classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ], data_root='data/nuscenes/', info_path='data/nuscenes/nuscenes_dbinfos_train.pkl', points_loader=dict( backend_args=None, coord_type='LIDAR', load_dim=5, type='LoadPointsFromFile', use_dim=[ 0, 1, 2, 3, 4, ]), prepare=dict( filter_by_difficulty=[ -1, ], filter_by_min_points=dict( barrier=5, bicycle=5, bus=5, car=5, construction_vehicle=5, motorcycle=5, pedestrian=5, traffic_cone=5, trailer=5, truck=5)), rate=1.0, sample_groups=dict( barrier=2, bicycle=6, bus=4, car=2, construction_vehicle=7, motorcycle=6, pedestrian=2, traffic_cone=2, trailer=6, truck=3)) default_hooks = dict( checkpoint=dict(interval=-1, type='CheckpointHook'), logger=dict(interval=50, type='LoggerHook'), param_scheduler=dict(type='ParamSchedulerHook'), sampler_seed=dict(type='DistSamplerSeedHook'), timer=dict(type='IterTimerHook'), visualization=dict( draw=True, score_thr=0.1, show=True, test_out_dir='results', type='Det3DVisualizationHook', vis_task='multi-view_det', wait_time=2)) default_scope = 'mmdet3d' env_cfg = dict( cudnn_benchmark=False, dist_cfg=dict(backend='nccl'), mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0)) eval_pipeline = [ dict( backend_args=None, coord_type='LIDAR', load_dim=5, type='LoadPointsFromFile', use_dim=5), dict( backend_args=None, sweeps_num=10, test_mode=True, 
type='LoadPointsFromMultiSweeps'), dict(keys=[ 'points', ], type='Pack3DDetInputs'), ] find_unused_parameters = False ida_aug_conf = dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )) img_norm_cfg = dict( mean=[ 103.53, 116.28, 123.675, ], std=[ 57.375, 57.12, 58.395, ], to_rgb=False) input_modality = dict(use_camera=True, use_lidar=True) launcher = 'none' load_from = 'checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth' log_level = 'INFO' log_processor = dict(by_epoch=True, type='LogProcessor', window_size=50) lr = 0.0001 metainfo = dict(classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ]) model = dict( data_preprocessor=dict( bgr_to_rgb=False, mean=[ 103.53, 116.28, 123.675, ], pad_size_divisor=32, std=[ 57.375, 57.12, 58.395, ], type='Det3DDataPreprocessor'), img_backbone=dict( frozen_stages=-1, input_ch=3, norm_eval=True, out_features=( 'stage4', 'stage5', ), spec_name='V-99-eSE', type='VoVNetCP'), img_neck=dict( in_channels=[ 768, 1024, ], num_outs=2, out_channels=256, type='CPFPN'), pts_bbox_head=dict( LID=True, bbox_coder=dict( max_num=300, num_classes=10, pc_range=[ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ], post_center_range=[ -61.2, -61.2, -10.0, 61.2, 61.2, 10.0, ], type='NMSFreeCoder', voxel_size=[ 0.2, 0.2, 8, ]), in_channels=256, loss_bbox=dict(loss_weight=0.25, type='mmdet.L1Loss'), loss_cls=dict( alpha=0.25, gamma=2.0, loss_weight=2.0, type='mmdet.FocalLoss', use_sigmoid=True), loss_iou=dict(loss_weight=0.0, type='mmdet.GIoULoss'), normedlinear=False, num_classes=10, num_query=900, position_range=[ -61.2, -61.2, -10.0, 61.2, 61.2, 10.0, ], positional_encoding=dict( normalize=True, num_feats=128, type='SinePositionalEncoding3D'), transformer=dict( decoder=dict( num_layers=6, return_intermediate=True, transformerlayers=dict( attn_cfgs=[ dict( attn_drop=0.1, 
dropout_layer=dict(drop_prob=0.1, type='Dropout'), embed_dims=256, num_heads=8, type='MultiheadAttention'), dict( attn_drop=0.1, dropout_layer=dict(drop_prob=0.1, type='Dropout'), embed_dims=256, num_heads=8, type='PETRMultiheadAttention'), ], feedforward_channels=2048, ffn_dropout=0.1, operation_order=( 'self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm', ), type='PETRTransformerDecoderLayer'), type='PETRTransformerDecoder'), type='PETRTransformer'), type='PETRHead', with_multiview=True, with_position=True), train_cfg=dict( pts=dict( assigner=dict( cls_cost=dict(type='FocalLossCost', weight=2.0), iou_cost=dict(type='IoUCost', weight=0.0), pc_range=[ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ], reg_cost=dict(type='BBox3DL1Cost', weight=0.25), type='HungarianAssigner3D'), grid_size=[ 512, 512, 1, ], out_size_factor=4, point_cloud_range=[ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ], voxel_size=[ 0.2, 0.2, 8, ])), type='PETR', use_grid_mask=True) num_epochs = 24 optim_wrapper = dict( clip_grad=dict(max_norm=35, norm_type=2), optimizer=dict(lr=0.0002, type='AdamW', weight_decay=0.01), paramwise_cfg=dict(custom_keys=dict(img_backbone=dict(lr_mult=0.1))), type='OptimWrapper') param_scheduler = [ dict( begin=0, by_epoch=False, end=500, start_factor=0.3333333333333333, type='LinearLR'), dict(T_max=24, by_epoch=True, type='CosineAnnealingLR'), ] point_cloud_range = [ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ] randomness = dict(deterministic=False, diff_rank_seed=False, seed=1) resume = False test_cfg = dict() test_dataloader = dict( batch_size=1, dataset=dict( ann_file='nuscenes_infos_val.pkl', backend_args=None, box_type_3d='LiDAR', data_prefix=dict( CAM_BACK='samples/CAM_BACK', CAM_BACK_LEFT='samples/CAM_BACK_LEFT', CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT', CAM_FRONT='samples/CAM_FRONT', CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT', CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT', img='', pts='samples/LIDAR_TOP', sweeps='sweeps/LIDAR_TOP'), data_root='data/nuscenes/', metainfo=dict(classes=[ 
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ]), modality=dict(use_camera=True, use_lidar=True), pipeline=[ dict( backend_args=None, to_float32=True, type='LoadMultiViewImageFromFiles'), dict( data_aug_conf=dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )), training=False, type='ResizeCropFlipImage'), dict(keys=[ 'img', ], type='Pack3DDetInputs'), ], test_mode=True, type='NuScenesDataset', use_valid_flag=True), drop_last=False, num_workers=1, persistent_workers=True, sampler=dict(shuffle=False, type='DefaultSampler')) test_evaluator = dict( ann_file='data/nuscenes/nuscenes_infos_val.pkl', backend_args=None, data_root='data/nuscenes/', metric='bbox', type='NuScenesMetric') test_pipeline = [ dict( backend_args=None, to_float32=True, type='LoadMultiViewImageFromFiles'), dict( data_aug_conf=dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )), training=False, type='ResizeCropFlipImage'), dict(keys=[ 'img', ], type='Pack3DDetInputs'), ] train_cfg = dict(by_epoch=True, max_epochs=24, val_interval=24) train_dataloader = dict( batch_size=1, dataset=dict( ann_file='nuscenes_infos_train.pkl', backend_args=None, box_type_3d='LiDAR', data_prefix=dict( CAM_BACK='samples/CAM_BACK', CAM_BACK_LEFT='samples/CAM_BACK_LEFT', CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT', CAM_FRONT='samples/CAM_FRONT', CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT', CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT', img='', pts='samples/LIDAR_TOP', sweeps='sweeps/LIDAR_TOP'), data_root='data/nuscenes/', metainfo=dict(classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ]), modality=dict(use_camera=True, use_lidar=True), pipeline=[ dict( backend_args=None, to_float32=True, 
type='LoadMultiViewImageFromFiles'), dict( type='LoadAnnotations3D', with_attr_label=False, with_bbox_3d=True, with_label_3d=True), dict( point_cloud_range=[ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ], type='ObjectRangeFilter'), dict( classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ], type='ObjectNameFilter'), dict( data_aug_conf=dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )), training=True, type='ResizeCropFlipImage'), dict( reverse_angle=False, rot_range=[ -0.3925, 0.3925, ], scale_ratio_range=[ 0.95, 1.05, ], training=True, translation_std=[ 0, 0, 0, ], type='GlobalRotScaleTransImage'), dict( keys=[ 'img', 'gt_bboxes', 'gt_bboxes_labels', 'attr_labels', 'gt_bboxes_3d', 'gt_labels_3d', 'centers_2d', 'depths', ], type='Pack3DDetInputs'), ], test_mode=False, type='NuScenesDataset', use_valid_flag=True), num_workers=4, persistent_workers=True, sampler=dict(shuffle=True, type='DefaultSampler')) train_pipeline = [ dict( backend_args=None, to_float32=True, type='LoadMultiViewImageFromFiles'), dict( type='LoadAnnotations3D', with_attr_label=False, with_bbox_3d=True, with_label_3d=True), dict( point_cloud_range=[ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ], type='ObjectRangeFilter'), dict( classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ], type='ObjectNameFilter'), dict( data_aug_conf=dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )), training=True, type='ResizeCropFlipImage'), dict( reverse_angle=False, rot_range=[ -0.3925, 0.3925, ], scale_ratio_range=[ 0.95, 1.05, ], training=True, translation_std=[ 0, 0, 0, ], type='GlobalRotScaleTransImage'), dict( keys=[ 'img', 'gt_bboxes', 'gt_bboxes_labels', 'attr_labels', 'gt_bboxes_3d', 
'gt_labels_3d', 'centers_2d', 'depths', ], type='Pack3DDetInputs'), ] val_cfg = dict() val_dataloader = dict( batch_size=1, dataset=dict( ann_file='nuscenes_infos_val.pkl', backend_args=None, box_type_3d='LiDAR', data_prefix=dict( CAM_BACK='samples/CAM_BACK', CAM_BACK_LEFT='samples/CAM_BACK_LEFT', CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT', CAM_FRONT='samples/CAM_FRONT', CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT', CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT', img='', pts='samples/LIDAR_TOP', sweeps='sweeps/LIDAR_TOP'), data_root='data/nuscenes/', metainfo=dict(classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ]), modality=dict(use_camera=True, use_lidar=True), pipeline=[ dict( backend_args=None, to_float32=True, type='LoadMultiViewImageFromFiles'), dict( data_aug_conf=dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )), training=False, type='ResizeCropFlipImage'), dict(keys=[ 'img', ], type='Pack3DDetInputs'), ], test_mode=True, type='NuScenesDataset', use_valid_flag=True), drop_last=False, num_workers=1, persistent_workers=True, sampler=dict(shuffle=False, type='DefaultSampler')) val_evaluator = dict( ann_file='data/nuscenes/nuscenes_infos_val.pkl', backend_args=None, data_root='data/nuscenes/', metric='bbox', type='NuScenesMetric') vis_backends = [ dict(type='LocalVisBackend'), ] visualizer = dict( name='visualizer', type='Det3DLocalVisualizer', vis_backends=[ dict(type='LocalVisBackend'), ]) voxel_size = [ 0.2, 0.2, 8, ] work_dir = './work_dirs/petr_vovnet_gridmask_p4_800x320'

05/01 01:05:13 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
/home/user/mmdetection3d/mmdet3d/engine/hooks/visualization_hook.py:75: UserWarning: The show is True, it means that only the prediction results are visualized without storing data, so vis_backends needs to be excluded.
  warnings.warn('The show is True, it means that only '
05/01 01:05:13 - mmengine - INFO - Autoplay mode, press [SPACE] to pause.
05/01 01:05:13 - mmengine - INFO - Hooks will be executed in the following order:
before_run: (VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook


before_train: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(VERY_LOW ) CheckpointHook


before_train_epoch: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook


before_train_iter: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook


after_train_iter: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook


after_train_epoch: (NORMAL ) IterTimerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook


before_val: (VERY_HIGH ) RuntimeInfoHook


before_val_epoch: (NORMAL ) IterTimerHook


before_val_iter: (NORMAL ) IterTimerHook


after_val_iter: (NORMAL ) IterTimerHook
(NORMAL ) Det3DVisualizationHook
(BELOW_NORMAL) LoggerHook


after_val_epoch: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook


after_val: (VERY_HIGH ) RuntimeInfoHook


after_train: (VERY_HIGH ) RuntimeInfoHook
(VERY_LOW ) CheckpointHook


before_test: (VERY_HIGH ) RuntimeInfoHook


before_test_epoch: (NORMAL ) IterTimerHook


before_test_iter: (NORMAL ) IterTimerHook


after_test_iter: (NORMAL ) IterTimerHook
(NORMAL ) Det3DVisualizationHook
(BELOW_NORMAL) LoggerHook


after_test_epoch: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook


after_test: (VERY_HIGH ) RuntimeInfoHook


after_run: (BELOW_NORMAL) LoggerHook


05/01 01:05:27 - mmengine - INFO - ------------------------------
05/01 01:05:27 - mmengine - INFO - The length of test dataset: 6019
05/01 01:05:27 - mmengine - INFO - The number of instances per category in the dataset:
+----------------------+--------+
| category             | number |
+----------------------+--------+
| car                  | 80004  |
| truck                | 15704  |
| construction_vehicle | 2678   |
| bus                  | 3158   |
| trailer              | 4159   |
| barrier              | 26992  |
| motorcycle           | 2508   |
| bicycle              | 2381   |
| pedestrian           | 34347  |
| traffic_cone         | 15597  |
+----------------------+--------+
/home/user/mmdetection3d/mmdet3d/evaluation/functional/kitti_utils/eval.py:10: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def get_thresholds(scores: np.ndarray, num_gt, num_sample_pts=41):
Loads checkpoint by local backend from path: checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth
05/01 01:05:28 - mmengine - INFO - Load checkpoint from checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth
/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1695391896527/work/aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "tools/test.py", line 149, in <module>
    main()
  File "tools/test.py", line 145, in main
    runner.test()
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1823, in test
    metrics = self.test_loop.run()  # type: ignore
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/runner/loops.py", line 445, in run
    self.run_iter(idx, data_batch)
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/runner/loops.py", line 466, in run_iter
    self.runner.call_hook(
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1839, in call_hook
    getattr(hook, fn_name)(self, **kwargs)
  File "/home/user/mmdetection3d/mmdet3d/engine/hooks/visualization_hook.py", line 228, in after_test_iter
    self._visualizer.add_datasample(
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/dist/utils.py", line 427, in wrapper
    return func(*args, **kwargs)
  File "/home/user/mmdetection3d/mmdet3d/visualization/local_visualizer.py", line 1034, in add_datasample
    pred_instances_3d = pred_instances_3d[
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/structures/instance_data.py", line 201, in __getitem__
    assert len(item) == len(self), 'The shape of the ' \
AssertionError: The shape of the input(BoolTensor) 300 does not match the shape of the indexed tensor in results_field 0 at first dimension.
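One way to read the assertion: a boolean mask with 300 entries is applied to a container that reports length 0. The sketch below reproduces only that mismatch with a toy class. `ToyInstances` and `filter_by_score` are hypothetical stand-ins written for illustration; they are not mmengine's `InstanceData` or the visualizer code, and the sketch does not explain why the container is empty in the real run.

```python
class ToyInstances:
    """Hypothetical stand-in for a container of per-instance fields."""

    def __init__(self, **fields):
        self.fields = fields

    def __len__(self):
        # Length comes from the first stored field; an empty container has length 0.
        for values in self.fields.values():
            return len(values)
        return 0

    def __getitem__(self, mask):
        # Same kind of check as in the traceback: mask and container must agree in length.
        assert len(mask) == len(self), (
            f'mask of length {len(mask)} does not match '
            f'container of length {len(self)}')
        kept = {
            name: [v for v, keep in zip(values, mask) if keep]
            for name, values in self.fields.items()
        }
        return ToyInstances(**kept)


def filter_by_score(instances, scores, thr):
    """Keep only the instances whose score exceeds thr."""
    mask = [s > thr for s in scores]
    return instances[mask]


if __name__ == '__main__':
    empty = ToyInstances()          # no fields stored, so len() == 0
    scores = [0.5] * 300            # 300 scores, as in the error message
    try:
        filter_by_score(empty, scores, thr=0.1)
    except AssertionError as err:
        print('AssertionError:', err)
```

If this reading is right, the thing to inspect in the real run would be which fields `pred_instances_3d` actually holds at the point where the score mask is applied.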

Additional information

I tried to use a pretrained PETR model for testing/inference to produce bounding boxes for the nuScenes dataset. Because I don't have a GUI available, I want to save them to a file. Using the default PETR config and the checkpoint available here, I was unable to save the results: the command above fails with the AssertionError shown. The test run itself works fine and prints the metrics at the end when I don't try to save the results (i.e. without --show-dir). I also tried the Inference API to create bounding boxes for a sample image, but the API doesn't seem to support multi-view 3D detection.

Any recommendations on how to properly save bounding boxes/results from a pretrained PETR model?
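For context on what "saved bounding boxes" could look like without a GUI, here is a minimal sketch that reads a results file in the nuScenes detection submission layout (a top-level `results` dict keyed by sample token, each entry a list of boxes with `translation`, `size`, `rotation`, `detection_name` and `detection_score`). The function name `load_boxes` and the file path are assumptions, and the 0.1 default threshold simply mirrors `score_thr=0.1` from the config dump above.

```python
import json


def load_boxes(path, score_thr=0.1):
    """Read a nuScenes-style detection JSON and return boxes above score_thr."""
    with open(path) as f:
        data = json.load(f)

    boxes = []
    for sample_token, detections in data['results'].items():
        for det in detections:
            if det['detection_score'] < score_thr:
                continue  # drop low-confidence boxes
            boxes.append({
                'sample_token': sample_token,
                'name': det['detection_name'],
                'score': det['detection_score'],
                'center': det['translation'],   # x, y, z in the global frame
                'size': det['size'],            # width, length, height
                'rotation': det['rotation'],    # quaternion (w, x, y, z)
            })
    return boxes


if __name__ == '__main__':
    # The path is a placeholder; point it at wherever the results file was written.
    for box in load_boxes('results_nusc.json')[:5]:
        print(box['sample_token'], box['name'], round(box['score'], 3))
```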