Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
sys.platform: linux
Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: n/a
PyTorch: 2.1.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.16.0
OpenCV: 4.9.0
MMEngine: 0.10.4
MMDetection: 3.3.0
MMDetection3D: 1.4.0+962f093
spconv2.0: False
Reproduces the problem - code sample
Link to the code that produces the error
Reproduces the problem - command or script
python tools/test.py projects/PETR/configs/petr_vovnet_gridmask_p4_800x320.py checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth --show --show-dir results --task multi-view_det
Reproduces the problem - error message
python tools/test.py projects/PETR/configs/petr_vovnet_gridmask_p4_800x320.py checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth --show --show-dir results --task multi-view_det
/bin/sh: 1: gcc: not found
05/01 01:05:09 - mmengine - INFO -
System environment:
    sys.platform: linux
    Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
    CUDA available: False
    MUSA available: False
    numpy_random_seed: 1
    GCC: n/a
    PyTorch: 2.1.0
    PyTorch compiling details: PyTorch built with:
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=0, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
    TorchVision: 0.16.0
    OpenCV: 4.9.0
    MMEngine: 0.10.4
Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 1
    deterministic: False
    diff_rank_seed: False
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
05/01 01:05:10 - mmengine - INFO - Config:
auto_scale_lr = dict(base_batch_size=32, enable=False)
backbone_norm_cfg = dict(requires_grad=True, type='LN')
backend_args = None
class_names = [ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ]
custom_imports = dict(imports=[ 'projects.PETR.petr', ])
data_prefix = dict(img='', pts='samples/LIDAR_TOP', sweeps='sweeps/LIDAR_TOP')
data_root = 'data/nuscenes/'
dataset_type = 'NuScenesDataset'
db_sampler = dict( backend_args=None, classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ], data_root='data/nuscenes/', info_path='data/nuscenes/nuscenes_dbinfos_train.pkl', points_loader=dict( backend_args=None, coord_type='LIDAR', load_dim=5, type='LoadPointsFromFile', use_dim=[ 0, 1, 2, 3, 4, ]), prepare=dict( filter_by_difficulty=[ -1, ], filter_by_min_points=dict( barrier=5, bicycle=5, bus=5, car=5, construction_vehicle=5, motorcycle=5, pedestrian=5, traffic_cone=5, trailer=5, truck=5)), rate=1.0, sample_groups=dict( barrier=2, bicycle=6, bus=4, car=2, construction_vehicle=7, motorcycle=6, pedestrian=2, traffic_cone=2, trailer=6, truck=3))
default_hooks = dict( checkpoint=dict(interval=-1, type='CheckpointHook'), logger=dict(interval=50, type='LoggerHook'), param_scheduler=dict(type='ParamSchedulerHook'), sampler_seed=dict(type='DistSamplerSeedHook'), timer=dict(type='IterTimerHook'), visualization=dict( draw=True, score_thr=0.1, show=True, test_out_dir='results', type='Det3DVisualizationHook', vis_task='multi-view_det', wait_time=2))
default_scope = 'mmdet3d'
env_cfg = dict( cudnn_benchmark=False, dist_cfg=dict(backend='nccl'), mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
eval_pipeline = [ dict( backend_args=None, coord_type='LIDAR', load_dim=5, type='LoadPointsFromFile', use_dim=5), dict( backend_args=None, sweeps_num=10, test_mode=True, type='LoadPointsFromMultiSweeps'), dict(keys=[ 'points', ], type='Pack3DDetInputs'), ]
find_unused_parameters = False
ida_aug_conf = dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, ))
img_norm_cfg = dict( mean=[ 103.53, 116.28, 123.675, ], std=[ 57.375, 57.12, 58.395, ], to_rgb=False)
input_modality = dict(use_camera=True, use_lidar=True)
launcher = 'none'
load_from = 'checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth'
log_level = 'INFO'
log_processor = dict(by_epoch=True, type='LogProcessor', window_size=50)
lr = 0.0001
metainfo = dict(classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ])
model = dict( data_preprocessor=dict( bgr_to_rgb=False, mean=[ 103.53, 116.28, 123.675, ], pad_size_divisor=32, std=[ 57.375, 57.12, 58.395, ], type='Det3DDataPreprocessor'), img_backbone=dict( frozen_stages=-1, input_ch=3, norm_eval=True, out_features=( 'stage4', 'stage5', ), spec_name='V-99-eSE', type='VoVNetCP'), img_neck=dict( in_channels=[ 768, 1024, ], num_outs=2, out_channels=256, type='CPFPN'), pts_bbox_head=dict( LID=True, bbox_coder=dict( max_num=300, num_classes=10, pc_range=[ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ], post_center_range=[ -61.2, -61.2, -10.0, 61.2, 61.2, 10.0, ], type='NMSFreeCoder', voxel_size=[ 0.2, 0.2, 8, ]), in_channels=256, loss_bbox=dict(loss_weight=0.25, type='mmdet.L1Loss'), loss_cls=dict( alpha=0.25, gamma=2.0, loss_weight=2.0, type='mmdet.FocalLoss', use_sigmoid=True), loss_iou=dict(loss_weight=0.0, type='mmdet.GIoULoss'), normedlinear=False, num_classes=10, num_query=900, position_range=[ -61.2, -61.2, -10.0, 61.2, 61.2, 10.0, ], positional_encoding=dict( normalize=True, num_feats=128, type='SinePositionalEncoding3D'), transformer=dict( decoder=dict( num_layers=6, return_intermediate=True, transformerlayers=dict( attn_cfgs=[ dict( attn_drop=0.1, dropout_layer=dict(drop_prob=0.1, type='Dropout'), embed_dims=256, num_heads=8, type='MultiheadAttention'), dict( attn_drop=0.1, dropout_layer=dict(drop_prob=0.1, type='Dropout'), embed_dims=256, num_heads=8, type='PETRMultiheadAttention'), ], feedforward_channels=2048, ffn_dropout=0.1, operation_order=( 'self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm', ), type='PETRTransformerDecoderLayer'), type='PETRTransformerDecoder'), type='PETRTransformer'), type='PETRHead', with_multiview=True, with_position=True), train_cfg=dict( pts=dict( assigner=dict( cls_cost=dict(type='FocalLossCost', weight=2.0), iou_cost=dict(type='IoUCost', weight=0.0), pc_range=[ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ], reg_cost=dict(type='BBox3DL1Cost', weight=0.25), type='HungarianAssigner3D'), grid_size=[ 512, 512, 1, ], out_size_factor=4, point_cloud_range=[ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ], voxel_size=[ 0.2, 0.2, 8, ])), type='PETR', use_grid_mask=True)
num_epochs = 24
optim_wrapper = dict( clip_grad=dict(max_norm=35, norm_type=2), optimizer=dict(lr=0.0002, type='AdamW', weight_decay=0.01), paramwise_cfg=dict(custom_keys=dict(img_backbone=dict(lr_mult=0.1))), type='OptimWrapper')
param_scheduler = [ dict( begin=0, by_epoch=False, end=500, start_factor=0.3333333333333333, type='LinearLR'), dict(T_max=24, by_epoch=True, type='CosineAnnealingLR'), ]
point_cloud_range = [ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ]
randomness = dict(deterministic=False, diff_rank_seed=False, seed=1)
resume = False
test_cfg = dict()
test_dataloader = dict( batch_size=1, dataset=dict( ann_file='nuscenes_infos_val.pkl', backend_args=None, box_type_3d='LiDAR', data_prefix=dict( CAM_BACK='samples/CAM_BACK', CAM_BACK_LEFT='samples/CAM_BACK_LEFT', CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT', CAM_FRONT='samples/CAM_FRONT', CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT', CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT', img='', pts='samples/LIDAR_TOP', sweeps='sweeps/LIDAR_TOP'), data_root='data/nuscenes/', metainfo=dict(classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ]), modality=dict(use_camera=True, use_lidar=True), pipeline=[ dict( backend_args=None, to_float32=True, type='LoadMultiViewImageFromFiles'), dict( data_aug_conf=dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )), training=False, type='ResizeCropFlipImage'), dict(keys=[ 'img', ], type='Pack3DDetInputs'), ], test_mode=True, type='NuScenesDataset', use_valid_flag=True), drop_last=False, num_workers=1, persistent_workers=True, sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict( ann_file='data/nuscenes/nuscenes_infos_val.pkl', backend_args=None, data_root='data/nuscenes/', metric='bbox', type='NuScenesMetric')
test_pipeline = [ dict( backend_args=None, to_float32=True, type='LoadMultiViewImageFromFiles'), dict( data_aug_conf=dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )), training=False, type='ResizeCropFlipImage'), dict(keys=[ 'img', ], type='Pack3DDetInputs'), ]
train_cfg = dict(by_epoch=True, max_epochs=24, val_interval=24)
train_dataloader = dict( batch_size=1, dataset=dict( ann_file='nuscenes_infos_train.pkl', backend_args=None, box_type_3d='LiDAR', data_prefix=dict( CAM_BACK='samples/CAM_BACK', CAM_BACK_LEFT='samples/CAM_BACK_LEFT', CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT', CAM_FRONT='samples/CAM_FRONT', CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT', CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT', img='', pts='samples/LIDAR_TOP', sweeps='sweeps/LIDAR_TOP'), data_root='data/nuscenes/', metainfo=dict(classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ]), modality=dict(use_camera=True, use_lidar=True), pipeline=[ dict( backend_args=None, to_float32=True, type='LoadMultiViewImageFromFiles'), dict( type='LoadAnnotations3D', with_attr_label=False, with_bbox_3d=True, with_label_3d=True), dict( point_cloud_range=[ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ], type='ObjectRangeFilter'), dict( classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ], type='ObjectNameFilter'), dict( data_aug_conf=dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )), training=True, type='ResizeCropFlipImage'), dict( reverse_angle=False, rot_range=[ -0.3925, 0.3925, ], scale_ratio_range=[ 0.95, 1.05, ], training=True, translation_std=[ 0, 0, 0, ], type='GlobalRotScaleTransImage'), dict( keys=[ 'img', 'gt_bboxes', 'gt_bboxes_labels', 'attr_labels', 'gt_bboxes_3d', 'gt_labels_3d', 'centers_2d', 'depths', ], type='Pack3DDetInputs'), ], test_mode=False, type='NuScenesDataset', use_valid_flag=True), num_workers=4, persistent_workers=True, sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [ dict( backend_args=None, to_float32=True, type='LoadMultiViewImageFromFiles'), dict( type='LoadAnnotations3D', with_attr_label=False, with_bbox_3d=True, with_label_3d=True), dict( point_cloud_range=[ -51.2, -51.2, -5.0, 51.2, 51.2, 3.0, ], type='ObjectRangeFilter'), dict( classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ], type='ObjectNameFilter'), dict( data_aug_conf=dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )), training=True, type='ResizeCropFlipImage'), dict( reverse_angle=False, rot_range=[ -0.3925, 0.3925, ], scale_ratio_range=[ 0.95, 1.05, ], training=True, translation_std=[ 0, 0, 0, ], type='GlobalRotScaleTransImage'), dict( keys=[ 'img', 'gt_bboxes', 'gt_bboxes_labels', 'attr_labels', 'gt_bboxes_3d', 'gt_labels_3d', 'centers_2d', 'depths', ], type='Pack3DDetInputs'), ]
val_cfg = dict()
val_dataloader = dict( batch_size=1, dataset=dict( ann_file='nuscenes_infos_val.pkl', backend_args=None, box_type_3d='LiDAR', data_prefix=dict( CAM_BACK='samples/CAM_BACK', CAM_BACK_LEFT='samples/CAM_BACK_LEFT', CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT', CAM_FRONT='samples/CAM_FRONT', CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT', CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT', img='', pts='samples/LIDAR_TOP', sweeps='sweeps/LIDAR_TOP'), data_root='data/nuscenes/', metainfo=dict(classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', ]), modality=dict(use_camera=True, use_lidar=True), pipeline=[ dict( backend_args=None, to_float32=True, type='LoadMultiViewImageFromFiles'), dict( data_aug_conf=dict( H=900, W=1600, bot_pct_lim=( 0.0, 0.0, ), final_dim=( 320, 800, ), rand_flip=True, resize_lim=( 0.47, 0.625, ), rot_lim=( 0.0, 0.0, )), training=False, type='ResizeCropFlipImage'), dict(keys=[ 'img', ], type='Pack3DDetInputs'), ], test_mode=True, type='NuScenesDataset', use_valid_flag=True), drop_last=False, num_workers=1, persistent_workers=True, sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict( ann_file='data/nuscenes/nuscenes_infos_val.pkl', backend_args=None, data_root='data/nuscenes/', metric='bbox', type='NuScenesMetric')
vis_backends = [ dict(type='LocalVisBackend'), ]
visualizer = dict( name='visualizer', type='Det3DLocalVisualizer', vis_backends=[ dict(type='LocalVisBackend'), ])
voxel_size = [ 0.2, 0.2, 8, ]
work_dir = './work_dirs/petr_vovnet_gridmask_p4_800x320'
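For readability, this is the part of the dump above that drives the failing visualization step, with my own annotations on where each field appears to come from. The mapping to the CLI flags is my reading of tools/test.py, not something stated in the log, and visualization_hook is just a name for the excerpt:

# Excerpt of the default_hooks.visualization entry from the dump above.
# Comments are my interpretation, not part of the log.
visualization_hook = dict(
    type='Det3DVisualizationHook',
    draw=True,                  # drawing enabled because --show/--show-dir were passed
    show=True,                  # from --show; takes the GUI display path
    test_out_dir='results',     # from --show-dir results
    vis_task='multi-view_det',  # from --task multi-view_det
    score_thr=0.1,              # score threshold used to filter predictions
    wait_time=2)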
05/01 01:05:13 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
/home/user/mmdetection3d/mmdet3d/engine/hooks/visualization_hook.py:75: UserWarning: The show is True, it means that only the prediction results are visualized without storing data, so vis_backends needs to be excluded.
  warnings.warn('The show is True, it means that only '
05/01 01:05:13 - mmengine - INFO - Autoplay mode, press [SPACE] to pause.
05/01 01:05:13 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
before_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(VERY_LOW ) CheckpointHook
before_train_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook
before_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
after_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
after_train_epoch:
(NORMAL ) IterTimerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
before_val:
(VERY_HIGH ) RuntimeInfoHook
before_val_epoch:
(NORMAL ) IterTimerHook
before_val_iter:
(NORMAL ) IterTimerHook
after_val_iter:
(NORMAL ) IterTimerHook
(NORMAL ) Det3DVisualizationHook
(BELOW_NORMAL) LoggerHook
after_val_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
after_val:
(VERY_HIGH ) RuntimeInfoHook
after_train:
(VERY_HIGH ) RuntimeInfoHook
(VERY_LOW ) CheckpointHook
before_test:
(VERY_HIGH ) RuntimeInfoHook
before_test_epoch:
(NORMAL ) IterTimerHook
before_test_iter:
(NORMAL ) IterTimerHook
after_test_iter:
(NORMAL ) IterTimerHook
(NORMAL ) Det3DVisualizationHook
(BELOW_NORMAL) LoggerHook
after_test_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
after_test:
(VERY_HIGH ) RuntimeInfoHook
after_run:
(BELOW_NORMAL) LoggerHook
05/01 01:05:27 - mmengine - INFO - ------------------------------
05/01 01:05:27 - mmengine - INFO - The length of test dataset: 6019
05/01 01:05:27 - mmengine - INFO - The number of instances per category in the dataset:
+----------------------+--------+
| category             | number |
+----------------------+--------+
| car                  | 80004  |
| truck                | 15704  |
| construction_vehicle | 2678   |
| bus                  | 3158   |
| trailer              | 4159   |
| barrier              | 26992  |
| motorcycle           | 2508   |
| bicycle              | 2381   |
| pedestrian           | 34347  |
| traffic_cone         | 15597  |
+----------------------+--------+
/home/user/mmdetection3d/mmdet3d/evaluation/functional/kitti_utils/eval.py:10: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def get_thresholds(scores: np.ndarray, num_gt, num_sample_pts=41):
Loads checkpoint by local backend from path: checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth
05/01 01:05:28 - mmengine - INFO - Load checkpoint from checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth
/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1695391896527/work/aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "tools/test.py", line 149, in <module>
    main()
  File "tools/test.py", line 145, in main
    runner.test()
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1823, in test
    metrics = self.test_loop.run()  # type: ignore
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/runner/loops.py", line 445, in run
    self.run_iter(idx, data_batch)
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/runner/loops.py", line 466, in run_iter
    self.runner.call_hook(
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1839, in call_hook
    getattr(hook, fn_name)(self, **kwargs)
  File "/home/user/mmdetection3d/mmdet3d/engine/hooks/visualization_hook.py", line 228, in after_test_iter
    self._visualizer.add_datasample(
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/dist/utils.py", line 427, in wrapper
    return func(*args, **kwargs)
  File "/home/user/mmdetection3d/mmdet3d/visualization/local_visualizer.py", line 1034, in add_datasample
    pred_instances_3d = pred_instances_3d[
  File "/home/user/miniconda3/envs/venv/lib/python3.8/site-packages/mmengine/structures/instance_data.py", line 201, in __getitem__
    assert len(item) == len(self), 'The shape of the ' \
AssertionError: The shape of the input(BoolTensor) 300 does not match the shape of the indexed tensor in results_field 0 at first dimension.
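For context, the assertion comes from mmengine's InstanceData indexing and can be reproduced in isolation. A minimal sketch (the sizes 300 and 0 are taken from the message above; the field name is illustrative):

import torch
from mmengine.structures import InstanceData

# An InstanceData holding 0 instances, indexed with a 300-element boolean
# mask (e.g. a score-threshold mask over 300 decoder queries), raises the
# same AssertionError as in the traceback.
pred_instances_3d = InstanceData()
pred_instances_3d.scores_3d = torch.empty(0)   # a results field with 0 entries
keep = torch.zeros(300, dtype=torch.bool)      # mask built from 300 predictions
filtered = pred_instances_3d[keep]             # AssertionError: 300 vs 0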
Additional information
I tried to use a pretrained PETR model for testing/inference to produce bounding boxes for the NuScenes dataset. Because I don't have a GUI available, I want to save them to a file. Using the default PETR config and the checkpoint available here, I was unable to save the results. The testing process itself worked fine and printed the metrics at the end, but only when I did not try to save the results, i.e. without --show-dir. I also tried to use the Inference API to create bounding boxes for a sample image; however, the API doesn't seem to support multi-view 3D detection.
Any recommendations on how to properly save bounding boxes/results from a pretrained PETR model?
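To make the goal concrete, this is roughly the workflow I am after: run the test loop headless and dump the raw boxes to disk, bypassing the visualization hook entirely. This is a sketch only; the hook override, the work_dir, and the output file name are my own choices, and I have not verified it end to end:

import pickle

import torch
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile('projects/PETR/configs/petr_vovnet_gridmask_p4_800x320.py')
cfg.load_from = 'checkpoints/petr_vovnet_gridmask_p4_800x320-e2191752.pth'
cfg.work_dir = './work_dirs/petr_save_preds'    # arbitrary output dir
cfg.default_hooks.visualization.draw = False    # skip the hook that crashes above

runner = Runner.from_cfg(cfg)
runner.load_or_resume()                         # loads cfg.load_from
runner.model.eval()

results = []
with torch.no_grad():
    for data_batch in runner.test_loop.dataloader:
        for sample in runner.model.test_step(data_batch):
            pred = sample.pred_instances_3d
            results.append(dict(
                sample_idx=sample.metainfo.get('sample_idx'),
                bboxes_3d=pred.bboxes_3d.tensor.cpu().numpy(),
                scores_3d=pred.scores_3d.cpu().numpy(),
                labels_3d=pred.labels_3d.cpu().numpy()))

with open('petr_val_predictions.pkl', 'wb') as f:   # hypothetical file name
    pickle.dump(results, f)

If rendered images were enough, dropping --show and keeping only --show-dir should at least avoid opening a window, though I don't know whether that also avoids the assertion above.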