open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Problem when training PV-RCNN with Waymo database: Error during evaluation after first epoch #3027

Open JuanMisas26 opened 3 months ago

JuanMisas26 commented 3 months ago

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.2, V12.2.140
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.3.1
PyTorch compiling details: PyTorch built with:

TorchVision: 0.18.1
OpenCV: 4.10.0
MMEngine: 0.10.4
MMDetection: 3.3.0
MMDetection3D: 1.4.0+962f093
spconv2.0: False

Reproduces the problem - code sample

_base_ = [
    '../_base_/datasets/waymoD5-3d-3class.py',
    '../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
]

voxel_size = [0.05, 0.05, 0.1]
point_cloud_range = [0, -40, -3, 70.4, 40, 1]

data_root = 'data/waymo/kitti_format/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
metainfo = dict(CLASSES=class_names)
backend_args = None
db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'waymo_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
    classes=class_names,
    sample_groups=dict(Car=15, Pedestrian=10, Cyclist=10),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=6,
        use_dim=4,
        backend_args=backend_args),
    backend_args=backend_args)

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=6,
        use_dim=4,
        backend_args=backend_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='ObjectSample', db_sampler=db_sampler, use_ground_plane=False),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(
        type='Pack3DDetInputs',
        keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=6,
        use_dim=4,
        backend_args=backend_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range)
        ]),
    dict(type='Pack3DDetInputs', keys=['points'])
]

model = dict(
    type='PointVoxelRCNN',
    data_preprocessor=dict(
        type='Det3DDataPreprocessor',
        voxel=True,
        voxel_layer=dict(
            max_num_points=5,  # max_points_per_voxel
            point_cloud_range=point_cloud_range,
            voxel_size=voxel_size,
            max_voxels=(16000, 40000))),
    voxel_encoder=dict(type='HardSimpleVFE'),
    middle_encoder=dict(
        type='SparseEncoder',
        in_channels=4,
        sparse_shape=[41, 1600, 1408],
        order=('conv', 'norm', 'act'),
        encoder_paddings=((0, 0, 0), ((1, 1, 1), 0, 0), ((1, 1, 1), 0, 0),
                          ((0, 1, 1), 0, 0)),
        return_middle_feats=True),
    points_encoder=dict(
        type='VoxelSetAbstraction',
        num_keypoints=2048,
        fused_out_channel=128,
        voxel_size=voxel_size,
        point_cloud_range=point_cloud_range,
        voxel_sa_cfgs_list=[
            dict(
                type='StackedSAModuleMSG',
                in_channels=16,
                scale_factor=1,
                radius=(0.4, 0.8),
                sample_nums=(16, 16),
                mlp_channels=((16, 16), (16, 16)),
                use_xyz=True),
            dict(
                type='StackedSAModuleMSG',
                in_channels=32,
                scale_factor=2,
                radius=(0.8, 1.2),
                sample_nums=(16, 32),
                mlp_channels=((32, 32), (32, 32)),
                use_xyz=True),
            dict(
                type='StackedSAModuleMSG',
                in_channels=64,
                scale_factor=4,
                radius=(1.2, 2.4),
                sample_nums=(16, 32),
                mlp_channels=((64, 64), (64, 64)),
                use_xyz=True),
            dict(
                type='StackedSAModuleMSG',
                in_channels=64,
                scale_factor=8,
                radius=(2.4, 4.8),
                sample_nums=(16, 32),
                mlp_channels=((64, 64), (64, 64)),
                use_xyz=True)
        ],
        rawpoints_sa_cfgs=dict(
            type='StackedSAModuleMSG',
            in_channels=1,
            radius=(0.4, 0.8),
            sample_nums=(16, 16),
            mlp_channels=((16, 16), (16, 16)),
            use_xyz=True),
        bev_feat_channel=256,
        bev_scale_factor=8),
    backbone=dict(
        type='SECOND',
        in_channels=256,
        layer_nums=[5, 5],
        layer_strides=[1, 2],
        out_channels=[128, 256]),
    neck=dict(
        type='SECONDFPN',
        in_channels=[128, 256],
        upsample_strides=[1, 2],
        out_channels=[256, 256]),
    rpn_head=dict(
        type='PartA2RPNHead',
        num_classes=3,
        in_channels=512,
        feat_channels=512,
        use_direction_classifier=True,
        dir_offset=0.78539,
        anchor_generator=dict(
            type='Anchor3DRangeGenerator',
            ranges=[[0, -40.0, -0.6, 70.4, 40.0, -0.6],
                    [0, -40.0, -0.6, 70.4, 40.0, -0.6],
                    [0, -40.0, -1.78, 70.4, 40.0, -1.78]],
            sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
            rotations=[0, 1.57],
            reshape_out=False),
        diff_rad_by_sin=True,
        assigner_per_size=True,
        assign_per_class=True,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
        loss_cls=dict(
            type='mmdet.FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(
            type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
        loss_dir=dict(
            type='mmdet.CrossEntropyLoss', use_sigmoid=False,
            loss_weight=0.2)),
    roi_head=dict(
        type='PVRCNNRoiHead',
        num_classes=3,
        semantic_head=dict(
            type='ForegroundSegmentationHead',
            in_channels=640,
            extra_width=0.1,
            loss_seg=dict(
                type='mmdet.FocalLoss',
                use_sigmoid=True,
                reduction='sum',
                gamma=2.0,
                alpha=0.25,
                activated=True,
                loss_weight=1.0)),
        bbox_roi_extractor=dict(
            type='Batch3DRoIGridExtractor',
            grid_size=6,
            roi_layer=dict(
                type='StackedSAModuleMSG',
                in_channels=128,
                radius=(0.8, 1.6),
                sample_nums=(16, 16),
                mlp_channels=((64, 64), (64, 64)),
                use_xyz=True,
                pool_mod='max'),
        ),
        bbox_head=dict(
            type='PVRCNNBBoxHead',
            in_channels=128,
            grid_size=6,
            num_classes=3,
            class_agnostic=True,
            shared_fc_channels=(256, 256),
            reg_channels=(256, 256),
            cls_channels=(256, 256),
            dropout_ratio=0.3,
            with_corner_loss=True,
            bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
            loss_bbox=dict(
                type='mmdet.SmoothL1Loss',
                beta=1.0 / 9.0,
                reduction='sum',
                loss_weight=1.0),
            loss_cls=dict(
                type='mmdet.CrossEntropyLoss',
                use_sigmoid=True,
                reduction='sum',
                loss_weight=1.0))),
# model training and testing settings

train_cfg=dict(
    rpn=dict(
        assigner=[
            dict(  # for Pedestrian
                type='Max3DIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.5,
                neg_iou_thr=0.35,
                min_pos_iou=0.35,
                ignore_iof_thr=-1),
            dict(  # for Cyclist
                type='Max3DIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.5,
                neg_iou_thr=0.35,
                min_pos_iou=0.35,
                ignore_iof_thr=-1),
            dict(  # for Car
                type='Max3DIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.6,
                neg_iou_thr=0.45,
                min_pos_iou=0.45,
                ignore_iof_thr=-1)
        ],
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_pre=9000,
        nms_post=512,
        max_num=512,
        nms_thr=0.8,
        score_thr=0,
        use_rotate_nms=True),
    rcnn=dict(
        assigner=[
            dict(  # for Pedestrian
                type='Max3DIoUAssigner',
                iou_calculator=dict(
                    type='BboxOverlaps3D', coordinate='lidar'),
                pos_iou_thr=0.55,
                neg_iou_thr=0.55,
                min_pos_iou=0.55,
                ignore_iof_thr=-1),
            dict(  # for Cyclist
                type='Max3DIoUAssigner',
                iou_calculator=dict(
                    type='BboxOverlaps3D', coordinate='lidar'),
                pos_iou_thr=0.55,
                neg_iou_thr=0.55,
                min_pos_iou=0.55,
                ignore_iof_thr=-1),
            dict(  # for Car
                type='Max3DIoUAssigner',
                iou_calculator=dict(
                    type='BboxOverlaps3D', coordinate='lidar'),
                pos_iou_thr=0.55,
                neg_iou_thr=0.55,
                min_pos_iou=0.55,
                ignore_iof_thr=-1)
        ],
        sampler=dict(
            type='IoUNegPiecewiseSampler',
            num=128,
            pos_fraction=0.5,
            neg_piece_fractions=[0.8, 0.2],
            neg_iou_piece_thrs=[0.55, 0.1],
            neg_pos_ub=-1,
            add_gt_as_proposals=False,
            return_iou=True),
        cls_pos_thr=0.75,
        cls_neg_thr=0.25)),
test_cfg=dict(
    rpn=dict(
        nms_pre=1024,
        nms_post=100,
        max_num=100,
        nms_thr=0.7,
        score_thr=0,
        use_rotate_nms=True),
    rcnn=dict(
        use_rotate_nms=True,
        use_raw_score=True,
        nms_thr=0.1,
        score_thr=0.1)))

train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    dataset=dict(dataset=dict(pipeline=train_pipeline, metainfo=metainfo)))
test_dataloader = dict(dataset=dict(pipeline=test_pipeline, metainfo=metainfo))
eval_dataloader = dict(dataset=dict(pipeline=test_pipeline, metainfo=metainfo))
lr = 0.001
optim_wrapper = dict(optimizer=dict(lr=lr))
param_scheduler = [
# learning rate scheduler

# During the first 16 epochs, learning rate increases from 0 to lr * 10
# during the next 24 epochs, learning rate decreases from lr * 10 to
# lr * 1e-4
dict(
    type='CosineAnnealingLR',
    T_max=15,
    eta_min=lr * 10,
    begin=0,
    end=15,
    by_epoch=True,
    convert_to_iter_based=True),
dict(
    type='CosineAnnealingLR',
    T_max=25,
    eta_min=lr * 1e-4,
    begin=15,
    end=40,
    by_epoch=True,
    convert_to_iter_based=True),
# momentum scheduler
# During the first 16 epochs, momentum increases from 0 to 0.85 / 0.95
# during the next 24 epochs, momentum increases from 0.85 / 0.95 to 1
dict(
    type='CosineAnnealingMomentum',
    T_max=15,
    eta_min=0.85 / 0.95,
    begin=0,
    end=15,
    by_epoch=True,
    convert_to_iter_based=True),
dict(
    type='CosineAnnealingMomentum',
    T_max=25,
    eta_min=1,
    begin=15,
    end=40,
    by_epoch=True,
    convert_to_iter_based=True)

]
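
The first CosineAnnealingLR entry can be sanity-checked against the training log with the closed-form cosine annealing formula. The following is only a minimal sketch: it assumes 953 iterations per epoch (read off the "[1][ 50/953]" log lines) and assumes the closed form is equivalent to what MMEngine computes once convert_to_iter_based=True is applied. The helper name cosine_lr is hypothetical and not an MMEngine function.

```python
import math


def cosine_lr(step, total_steps, base_lr, eta_min):
    """Closed-form cosine annealing: base_lr at step 0, eta_min at total_steps."""
    cos_term = (1 + math.cos(math.pi * step / total_steps)) / 2
    return eta_min + (base_lr - eta_min) * cos_term


lr = 0.001
iters_per_epoch = 953  # assumption: taken from the training log, not from the config
warmup_steps = 15 * iters_per_epoch  # first scheduler covers epochs 0..15

# Print the learning rate at a few iterations of the first phase.
for step in (0, 50, warmup_steps):
    print(step, f"{cosine_lr(step, warmup_steps, lr, lr * 10):.4e}")
```

Under these assumptions the schedule starts at lr itself (0.001) rather than at 0 and rises to lr * 10, and the value at iteration 50 rounds to the 1.0003e-03 reported in the log, which suggests the scheduler part of the config behaves as intended.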

Reproduces the problem - command or script

python tools/train.py configs/pv_rcnn/pv_rcnn_8xb2-80e_waymoD5-3d-3class.py

Reproduces the problem - error message

2024/08/26 16:36:38 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
2024/08/26 16:36:38 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
2024/08/26 16:36:38 - mmengine - INFO - Checkpoints will be saved to /home/p3d/mmdetection3d/work_dirs/pv_rcnn_8xb2-80e_kitti-3d-3class.
2024/08/26 16:37:07 - mmengine - INFO - Epoch(train)  [1][ 50/953]  lr: 1.0003e-03  eta: 6:10:06  time: 0.5833  data_time: 0.0577  memory: 5471  grad_norm: 64.3213  loss: 5.1095  loss_rpn_cls: 1.0152  loss_rpn_bbox: 2.5055  loss_rpn_dir: 0.1402  loss_semantic: 0.3904  loss_cls: 0.2100  loss_bbox: 0.3699  loss_corner: 0.4783
2024/08/26 16:37:32 - mmengine - INFO - Epoch(train)  [1][100/953]  lr: 1.0011e-03  eta: 5:47:37  time: 0.5139  data_time: 0.0025  memory: 5475  grad_norm: 117.8042  loss: 6.0721  loss_rpn_cls: 0.8844  loss_rpn_bbox: 2.0811  loss_rpn_dir: 0.1425  loss_semantic: 0.2730  loss_cls: 0.0548  loss_bbox: 1.1511  loss_corner: 1.4852
2024/08/26 16:37:58 - mmengine - INFO - Epoch(train)  [1][150/953]  lr: 1.0024e-03  eta: 5:38:08  time: 0.5058  data_time: 0.0025  memory: 5475  grad_norm: 85.3810  loss: 5.6782  loss_rpn_cls: 0.8819  loss_rpn_bbox: 1.7748  loss_rpn_dir: 0.1345  loss_semantic: 0.2720  loss_cls: 0.0479  loss_bbox: 1.0660  loss_corner: 1.5011
2024/08/26 16:38:23 - mmengine - INFO - Epoch(train)  [1][200/953]  lr: 1.0043e-03  eta: 5:34:07  time: 0.5117  data_time: 0.0025  memory: 5483  grad_norm: 227.8793  loss: 4.6860  loss_rpn_cls: 0.8696  loss_rpn_bbox: 1.7896  loss_rpn_dir: 0.1349  loss_semantic: 0.2764  loss_cls: 0.0526  loss_bbox: 0.6550  loss_corner: 0.9080
2024/08/26 16:38:49 - mmengine - INFO - Epoch(train)  [1][250/953]  lr: 1.0067e-03  eta: 5:32:01  time: 0.5156  data_time: 0.0025  memory: 5472  grad_norm: 22.9655  loss: 4.0009  loss_rpn_cls: 0.8758  loss_rpn_bbox: 1.8573  loss_rpn_dir: 0.1375  loss_semantic: 0.2765  loss_cls: 0.0396  loss_bbox: 0.3447  loss_corner: 0.4695
2024/08/26 16:39:15 - mmengine - INFO - Epoch(train)  [1][300/953]  lr: 1.0097e-03  eta: 5:30:12  time: 0.5128  data_time: 0.0025  memory: 5485  grad_norm: 10.5042  loss: 3.3381  loss_rpn_cls: 0.8577  loss_rpn_bbox: 1.7389  loss_rpn_dir: 0.1357  loss_semantic: 0.2493  loss_cls: 0.0614  loss_bbox: 0.1394  loss_corner: 0.1556
2024/08/26 16:39:40 - mmengine - INFO - Epoch(train)  [1][350/953]  lr: 1.0132e-03  eta: 5:28:30  time: 0.5098  data_time: 0.0025  memory: 5471  grad_norm: 104.3925  loss: 4.5675  loss_rpn_cls: 0.8565  loss_rpn_bbox: 2.0448  loss_rpn_dir: 0.1451  loss_semantic: 0.2206  loss_cls: 0.0539  loss_bbox: 0.5301  loss_corner: 0.7165
2024/08/26 16:40:06 - mmengine - INFO - Epoch(train)  [1][400/953]  lr: 1.0173e-03  eta: 5:27:23  time: 0.5132  data_time: 0.0025  memory: 5493  grad_norm: 17.8042  loss: 3.8322  loss_rpn_cls: 0.8452  loss_rpn_bbox: 1.8344  loss_rpn_dir: 0.1438  loss_semantic: 0.2314  loss_cls: 0.1056  loss_bbox: 0.3353  loss_corner: 0.3366
2024/08/26 16:40:31 - mmengine - INFO - Epoch(train)  [1][450/953]  lr: 1.0219e-03  eta: 5:26:07  time: 0.5090  data_time: 0.0025  memory: 5461  grad_norm: 19.3698  loss: 3.8768  loss_rpn_cls: 0.8376  loss_rpn_bbox: 1.9742  loss_rpn_dir: 0.1435  loss_semantic: 0.2579  loss_cls: 0.1180  loss_bbox: 0.2715  loss_corner: 0.2741
2024/08/26 16:40:57 - mmengine - INFO - Epoch(train)  [1][500/953]  lr: 1.0270e-03  eta: 5:25:25  time: 0.5151  data_time: 0.0025  memory: 5442  grad_norm: 10.3454  loss: 3.5557  loss_rpn_cls: 0.8173  loss_rpn_bbox: 1.6799  loss_rpn_dir: 0.1292  loss_semantic: 0.2385  loss_cls: 0.1369  loss_bbox: 0.3103  loss_corner: 0.2436
2024/08/26 16:41:23 - mmengine - INFO - Epoch(train)  [1][550/953]  lr: 1.0327e-03  eta: 5:24:40  time: 0.5134  data_time: 0.0025  memory: 5494  grad_norm: 10.6435  loss: 3.4171  loss_rpn_cls: 0.8220  loss_rpn_bbox: 1.7232  loss_rpn_dir: 0.1350  loss_semantic: 0.2404  loss_cls: 0.1270  loss_bbox: 0.2380  loss_corner: 0.1315
2024/08/26 16:41:49 - mmengine - INFO - Epoch(train)  [1][600/953]  lr: 1.0389e-03  eta: 5:24:01  time: 0.5145  data_time: 0.0025  memory: 5493  grad_norm: 11.2470  loss: 3.5431  loss_rpn_cls: 0.8181  loss_rpn_bbox: 1.6875  loss_rpn_dir: 0.1313  loss_semantic: 0.2321  loss_cls: 0.1481  loss_bbox: 0.3157  loss_corner: 0.2103
2024/08/26 16:42:15 - mmengine - INFO - Epoch(train)  [1][650/953]  lr: 1.0457e-03  eta: 5:23:46  time: 0.5220  data_time: 0.0025  memory: 5472  grad_norm: 12.1980  loss: 3.5502  loss_rpn_cls: 0.8014  loss_rpn_bbox: 1.5605  loss_rpn_dir: 0.1317  loss_semantic: 0.2484  loss_cls: 0.1579  loss_bbox: 0.3557  loss_corner: 0.2947
2024/08/26 16:42:40 - mmengine - INFO - Epoch(train)  [1][700/953]  lr: 1.0530e-03  eta: 5:23:13  time: 0.5156  data_time: 0.0025  memory: 5461  grad_norm: 16.2135  loss: 4.0591  loss_rpn_cls: 0.7855  loss_rpn_bbox: 1.5652  loss_rpn_dir: 0.1321  loss_semantic: 0.2060  loss_cls: 0.2168  loss_bbox: 0.7062  loss_corner: 0.4473
2024/08/26 16:43:06 - mmengine - INFO - Epoch(train)  [1][750/953]  lr: 1.0608e-03  eta: 5:22:53  time: 0.5208  data_time: 0.0024  memory: 5462  grad_norm: 6.7702  loss: 3.3603  loss_rpn_cls: 0.7859  loss_rpn_bbox: 1.5113  loss_rpn_dir: 0.1329  loss_semantic: 0.2066  loss_cls: 0.2034  loss_bbox: 0.3778  loss_corner: 0.1424
2024/08/26 16:43:33 - mmengine - INFO - Epoch(train)  [1][800/953]  lr: 1.0692e-03  eta: 5:22:41  time: 0.5244  data_time: 0.0025  memory: 5500  grad_norm: 58.0955  loss: 4.6469  loss_rpn_cls: 0.7640  loss_rpn_bbox: 1.6237  loss_rpn_dir: 0.1338  loss_semantic: 0.2091  loss_cls: 0.2211  loss_bbox: 0.9878  loss_corner: 0.7074
2024/08/26 16:43:59 - mmengine - INFO - Epoch(train)  [1][850/953]  lr: 1.0781e-03  eta: 5:22:26  time: 0.5236  data_time: 0.0025  memory: 5496  grad_norm: 5.8650  loss: 3.6367  loss_rpn_cls: 0.7820  loss_rpn_bbox: 1.5547  loss_rpn_dir: 0.1351  loss_semantic: 0.2049  loss_cls: 0.2154  loss_bbox: 0.5663  loss_corner: 0.1784
2024/08/26 16:44:25 - mmengine - INFO - Epoch(train)  [1][900/953]  lr: 1.0875e-03  eta: 5:21:56  time: 0.5173  data_time: 0.0025  memory: 5479  grad_norm: 11.3599  loss: 3.7419  loss_rpn_cls: 0.7610  loss_rpn_bbox: 1.5552  loss_rpn_dir: 0.1324  loss_semantic: 0.2033  loss_cls: 0.2735  loss_bbox: 0.5545  loss_corner: 0.2621
2024/08/26 16:44:51 - mmengine - INFO - Epoch(train)  [1][950/953]  lr: 1.0975e-03  eta: 5:21:29  time: 0.5183  data_time: 0.0025  memory: 5450  grad_norm: 5.7513  loss: 3.6732  loss_rpn_cls: 0.7806  loss_rpn_bbox: 1.5331  loss_rpn_dir: 0.1329  loss_semantic: 0.1994  loss_cls: 0.2319  loss_bbox: 0.5843  loss_corner: 0.2111
2024/08/26 16:44:52 - mmengine - INFO - Exp name: pv_rcnn_8xb2-80e_kitti-3d-3class_20240826_163621
2024/08/26 16:45:04 - mmengine - INFO - Epoch(val)  [1][  50/4931]    eta: 0:19:30  time: 0.2399  data_time: 0.0021  memory: 5456  
2024/08/26 16:45:16 - mmengine - INFO - Epoch(val)  [1][ 100/4931]    eta: 0:19:18  time: 0.2397  data_time: 0.0010  memory: 993  
2024/08/26 16:45:28 - mmengine - INFO - Epoch(val)  [1][ 150/4931]    eta: 0:19:12  time: 0.2433  data_time: 0.0010  memory: 988  
2024/08/26 16:45:40 - mmengine - INFO - Epoch(val)  [1][ 200/4931]    eta: 0:18:59  time: 0.2409  data_time: 0.0010  memory: 984  
2024/08/26 16:45:54 - mmengine - INFO - Epoch(val)  [1][ 250/4931]    eta: 0:19:12  time: 0.2673  data_time: 0.0011  memory: 952  
2024/08/26 16:46:07 - mmengine - INFO - Epoch(val)  [1][ 300/4931]    eta: 0:19:14  time: 0.2650  data_time: 0.0011  memory: 954  
2024/08/26 16:46:20 - mmengine - INFO - Epoch(val)  [1][ 350/4931]    eta: 0:19:08  time: 0.2583  data_time: 0.0011  memory: 962  
2024/08/26 16:46:33 - mmengine - INFO - Epoch(val)  [1][ 400/4931]    eta: 0:18:58  time: 0.2550  data_time: 0.0011  memory: 956  
2024/08/26 16:46:46 - mmengine - INFO - Epoch(val)  [1][ 450/4931]    eta: 0:18:55  time: 0.2706  data_time: 0.0011  memory: 924  
2024/08/26 16:47:00 - mmengine - INFO - Epoch(val)  [1][ 500/4931]    eta: 0:18:53  time: 0.2786  data_time: 0.0011  memory: 941  
2024/08/26 16:47:14 - mmengine - INFO - Epoch(val)  [1][ 550/4931]    eta: 0:18:48  time: 0.2754  data_time: 0.0011  memory: 928  
2024/08/26 16:47:27 - mmengine - INFO - Epoch(val)  [1][ 600/4931]    eta: 0:18:38  time: 0.2654  data_time: 0.0011  memory: 947  
2024/08/26 16:47:41 - mmengine - INFO - Epoch(val)  [1][ 650/4931]    eta: 0:18:30  time: 0.2727  data_time: 0.0011  memory: 946  
2024/08/26 16:47:54 - mmengine - INFO - Epoch(val)  [1][ 700/4931]    eta: 0:18:21  time: 0.2741  data_time: 0.0011  memory: 961  
2024/08/26 16:48:08 - mmengine - INFO - Epoch(val)  [1][ 750/4931]    eta: 0:18:11  time: 0.2689  data_time: 0.0011  memory: 980  
2024/08/26 16:48:21 - mmengine - INFO - Epoch(val)  [1][ 800/4931]    eta: 0:17:58  time: 0.2609  data_time: 0.0011  memory: 981  
2024/08/26 16:48:36 - mmengine - INFO - Epoch(val)  [1][ 850/4931]    eta: 0:17:53  time: 0.2971  data_time: 0.0011  memory: 988  
2024/08/26 16:48:50 - mmengine - INFO - Epoch(val)  [1][ 900/4931]    eta: 0:17:45  time: 0.2863  data_time: 0.0011  memory: 985  
2024/08/26 16:49:05 - mmengine - INFO - Epoch(val)  [1][ 950/4931]    eta: 0:17:37  time: 0.2898  data_time: 0.0011  memory: 983  
2024/08/26 16:49:18 - mmengine - INFO - Epoch(val)  [1][1000/4931]    eta: 0:17:26  time: 0.2762  data_time: 0.0011  memory: 963  
2024/08/26 16:49:32 - mmengine - INFO - Epoch(val)  [1][1050/4931]    eta: 0:17:13  time: 0.2668  data_time: 0.0011  memory: 935  
2024/08/26 16:49:45 - mmengine - INFO - Epoch(val)  [1][1100/4931]    eta: 0:16:59  time: 0.2640  data_time: 0.0011  memory: 937  
2024/08/26 16:49:58 - mmengine - INFO - Epoch(val)  [1][1150/4931]    eta: 0:16:44  time: 0.2531  data_time: 0.0011  memory: 927  
2024/08/26 16:50:10 - mmengine - INFO - Epoch(val)  [1][1200/4931]    eta: 0:16:27  time: 0.2461  data_time: 0.0011  memory: 930  
2024/08/26 16:50:21 - mmengine - INFO - Epoch(val)  [1][1250/4931]    eta: 0:16:08  time: 0.2196  data_time: 0.0011  memory: 910  
2024/08/26 16:50:32 - mmengine - INFO - Epoch(val)  [1][1300/4931]    eta: 0:15:49  time: 0.2236  data_time: 0.0011  memory: 911  
2024/08/26 16:50:43 - mmengine - INFO - Epoch(val)  [1][1350/4931]    eta: 0:15:31  time: 0.2224  data_time: 0.0011  memory: 913  
2024/08/26 16:50:57 - mmengine - INFO - Epoch(val)  [1][1400/4931]    eta: 0:15:18  time: 0.2662  data_time: 0.0011  memory: 966  
2024/08/26 16:51:11 - mmengine - INFO - Epoch(val)  [1][1450/4931]    eta: 0:15:09  time: 0.2937  data_time: 0.0012  memory: 968  
2024/08/26 16:51:26 - mmengine - INFO - Epoch(val)  [1][1500/4931]    eta: 0:15:00  time: 0.2920  data_time: 0.0012  memory: 958  
2024/08/26 16:51:40 - mmengine - INFO - Epoch(val)  [1][1550/4931]    eta: 0:14:50  time: 0.2928  data_time: 0.0012  memory: 959  
2024/08/26 16:51:55 - mmengine - INFO - Epoch(val)  [1][1600/4931]    eta: 0:14:39  time: 0.2869  data_time: 0.0011  memory: 972  
2024/08/26 16:52:09 - mmengine - INFO - Epoch(val)  [1][1650/4931]    eta: 0:14:28  time: 0.2845  data_time: 0.0012  memory: 963  
2024/08/26 16:52:23 - mmengine - INFO - Epoch(val)  [1][1700/4931]    eta: 0:14:17  time: 0.2830  data_time: 0.0012  memory: 963  
2024/08/26 16:52:37 - mmengine - INFO - Epoch(val)  [1][1750/4931]    eta: 0:14:05  time: 0.2812  data_time: 0.0012  memory: 963  
2024/08/26 16:52:51 - mmengine - INFO - Epoch(val)  [1][1800/4931]    eta: 0:13:53  time: 0.2778  data_time: 0.0012  memory: 978  
2024/08/26 16:53:05 - mmengine - INFO - Epoch(val)  [1][1850/4931]    eta: 0:13:40  time: 0.2738  data_time: 0.0012  memory: 983  
2024/08/26 16:53:18 - mmengine - INFO - Epoch(val)  [1][1900/4931]    eta: 0:13:26  time: 0.2577  data_time: 0.0011  memory: 972  
2024/08/26 16:53:31 - mmengine - INFO - Epoch(val)  [1][1950/4931]    eta: 0:13:12  time: 0.2569  data_time: 0.0011  memory: 981  
2024/08/26 16:53:44 - mmengine - INFO - Epoch(val)  [1][2000/4931]    eta: 0:12:59  time: 0.2666  data_time: 0.0012  memory: 974  
2024/08/26 16:53:57 - mmengine - INFO - Epoch(val)  [1][2050/4931]    eta: 0:12:45  time: 0.2662  data_time: 0.0012  memory: 940  
2024/08/26 16:54:10 - mmengine - INFO - Epoch(val)  [1][2100/4931]    eta: 0:12:32  time: 0.2628  data_time: 0.0012  memory: 940  
2024/08/26 16:54:23 - mmengine - INFO - Epoch(val)  [1][2150/4931]    eta: 0:12:18  time: 0.2615  data_time: 0.0011  memory: 937  
2024/08/26 16:54:38 - mmengine - INFO - Epoch(val)  [1][2200/4931]    eta: 0:12:06  time: 0.2864  data_time: 0.0012  memory: 949  
2024/08/26 16:54:52 - mmengine - INFO - Epoch(val)  [1][2250/4931]    eta: 0:11:54  time: 0.2901  data_time: 0.0012  memory: 948  
2024/08/26 16:55:07 - mmengine - INFO - Epoch(val)  [1][2300/4931]    eta: 0:11:42  time: 0.2879  data_time: 0.0012  memory: 954  
2024/08/26 16:55:21 - mmengine - INFO - Epoch(val)  [1][2350/4931]    eta: 0:11:30  time: 0.2890  data_time: 0.0012  memory: 953  
2024/08/26 16:55:34 - mmengine - INFO - Epoch(val)  [1][2400/4931]    eta: 0:11:17  time: 0.2661  data_time: 0.0011  memory: 967  
2024/08/26 16:55:48 - mmengine - INFO - Epoch(val)  [1][2450/4931]    eta: 0:11:03  time: 0.2654  data_time: 0.0011  memory: 977  
2024/08/26 16:56:01 - mmengine - INFO - Epoch(val)  [1][2500/4931]    eta: 0:10:50  time: 0.2616  data_time: 0.0011  memory: 971  
2024/08/26 16:56:14 - mmengine - INFO - Epoch(val)  [1][2550/4931]    eta: 0:10:36  time: 0.2564  data_time: 0.0011  memory: 977  
2024/08/26 16:56:26 - mmengine - INFO - Epoch(val)  [1][2600/4931]    eta: 0:10:22  time: 0.2542  data_time: 0.0011  memory: 970  
2024/08/26 16:56:39 - mmengine - INFO - Epoch(val)  [1][2650/4931]    eta: 0:10:08  time: 0.2515  data_time: 0.0011  memory: 940  
2024/08/26 16:56:52 - mmengine - INFO - Epoch(val)  [1][2700/4931]    eta: 0:09:54  time: 0.2532  data_time: 0.0011  memory: 945  
2024/08/26 16:57:05 - mmengine - INFO - Epoch(val)  [1][2750/4931]    eta: 0:09:40  time: 0.2591  data_time: 0.0011  memory: 946  
2024/08/26 16:57:16 - mmengine - INFO - Epoch(val)  [1][2800/4931]    eta: 0:09:26  time: 0.2365  data_time: 0.0011  memory: 965  
2024/08/26 16:57:28 - mmengine - INFO - Epoch(val)  [1][2850/4931]    eta: 0:09:11  time: 0.2324  data_time: 0.0011  memory: 964  
2024/08/26 16:57:40 - mmengine - INFO - Epoch(val)  [1][2900/4931]    eta: 0:08:57  time: 0.2340  data_time: 0.0010  memory: 972  
2024/08/26 16:57:51 - mmengine - INFO - Epoch(val)  [1][2950/4931]    eta: 0:08:43  time: 0.2337  data_time: 0.0010  memory: 973  
2024/08/26 16:58:05 - mmengine - INFO - Epoch(val)  [1][3000/4931]    eta: 0:08:30  time: 0.2830  data_time: 0.0011  memory: 955  
2024/08/26 16:58:19 - mmengine - INFO - Epoch(val)  [1][3050/4931]    eta: 0:08:17  time: 0.2783  data_time: 0.0011  memory: 957  
2024/08/26 16:58:33 - mmengine - INFO - Epoch(val)  [1][3100/4931]    eta: 0:08:04  time: 0.2764  data_time: 0.0011  memory: 958  
2024/08/26 16:58:47 - mmengine - INFO - Epoch(val)  [1][3150/4931]    eta: 0:07:51  time: 0.2748  data_time: 0.0011  memory: 958  
2024/08/26 16:59:02 - mmengine - INFO - Epoch(val)  [1][3200/4931]    eta: 0:07:39  time: 0.2967  data_time: 0.0012  memory: 958  
2024/08/26 16:59:17 - mmengine - INFO - Epoch(val)  [1][3250/4931]    eta: 0:07:27  time: 0.2996  data_time: 0.0012  memory: 962  
2024/08/26 16:59:32 - mmengine - INFO - Epoch(val)  [1][3300/4931]    eta: 0:07:14  time: 0.2980  data_time: 0.0011  memory: 962  
2024/08/26 16:59:46 - mmengine - INFO - Epoch(val)  [1][3350/4931]    eta: 0:07:02  time: 0.2956  data_time: 0.0011  memory: 995  
2024/08/26 17:00:00 - mmengine - INFO - Epoch(val)  [1][3400/4931]    eta: 0:06:48  time: 0.2656  data_time: 0.0011  memory: 1005  
2024/08/26 17:00:12 - mmengine - INFO - Epoch(val)  [1][3450/4931]    eta: 0:06:34  time: 0.2498  data_time: 0.0011  memory: 995  
2024/08/26 17:00:24 - mmengine - INFO - Epoch(val)  [1][3500/4931]    eta: 0:06:20  time: 0.2378  data_time: 0.0010  memory: 988  
2024/08/26 17:00:36 - mmengine - INFO - Epoch(val)  [1][3550/4931]    eta: 0:06:07  time: 0.2445  data_time: 0.0011  memory: 990  
2024/08/26 17:00:50 - mmengine - INFO - Epoch(val)  [1][3600/4931]    eta: 0:05:54  time: 0.2686  data_time: 0.0011  memory: 979  
2024/08/26 17:01:03 - mmengine - INFO - Epoch(val)  [1][3650/4931]    eta: 0:05:40  time: 0.2631  data_time: 0.0011  memory: 974  
2024/08/26 17:01:16 - mmengine - INFO - Epoch(val)  [1][3700/4931]    eta: 0:05:27  time: 0.2572  data_time: 0.0011  memory: 964  
2024/08/26 17:01:28 - mmengine - INFO - Epoch(val)  [1][3750/4931]    eta: 0:05:13  time: 0.2532  data_time: 0.0011  memory: 970  
2024/08/26 17:01:41 - mmengine - INFO - Epoch(val)  [1][3800/4931]    eta: 0:05:00  time: 0.2600  data_time: 0.0011  memory: 957  
2024/08/26 17:01:55 - mmengine - INFO - Epoch(val)  [1][3850/4931]    eta: 0:04:47  time: 0.2663  data_time: 0.0011  memory: 962  
2024/08/26 17:02:09 - mmengine - INFO - Epoch(val)  [1][3900/4931]    eta: 0:04:34  time: 0.2820  data_time: 0.0011  memory: 975  
2024/08/26 17:02:23 - mmengine - INFO - Epoch(val)  [1][3950/4931]    eta: 0:04:20  time: 0.2791  data_time: 0.0011  memory: 972  
2024/08/26 17:02:37 - mmengine - INFO - Epoch(val)  [1][4000/4931]    eta: 0:04:07  time: 0.2821  data_time: 0.0011  memory: 962  
2024/08/26 17:02:51 - mmengine - INFO - Epoch(val)  [1][4050/4931]    eta: 0:03:54  time: 0.2746  data_time: 0.0011  memory: 941  
2024/08/26 17:03:05 - mmengine - INFO - Epoch(val)  [1][4100/4931]    eta: 0:03:41  time: 0.2758  data_time: 0.0011  memory: 942  
2024/08/26 17:03:18 - mmengine - INFO - Epoch(val)  [1][4150/4931]    eta: 0:03:28  time: 0.2770  data_time: 0.0011  memory: 947  
2024/08/26 17:03:31 - mmengine - INFO - Epoch(val)  [1][4200/4931]    eta: 0:03:14  time: 0.2572  data_time: 0.0011  memory: 951  
2024/08/26 17:03:44 - mmengine - INFO - Epoch(val)  [1][4250/4931]    eta: 0:03:01  time: 0.2477  data_time: 0.0011  memory: 933  
2024/08/26 17:03:56 - mmengine - INFO - Epoch(val)  [1][4300/4931]    eta: 0:02:47  time: 0.2508  data_time: 0.0011  memory: 954  
2024/08/26 17:04:09 - mmengine - INFO - Epoch(val)  [1][4350/4931]    eta: 0:02:34  time: 0.2592  data_time: 0.0011  memory: 948  
2024/08/26 17:04:23 - mmengine - INFO - Epoch(val)  [1][4400/4931]    eta: 0:02:21  time: 0.2776  data_time: 0.0011  memory: 948  
2024/08/26 17:04:37 - mmengine - INFO - Epoch(val)  [1][4450/4931]    eta: 0:02:08  time: 0.2832  data_time: 0.0011  memory: 948  
2024/08/26 17:04:52 - mmengine - INFO - Epoch(val)  [1][4500/4931]    eta: 0:01:54  time: 0.2898  data_time: 0.0011  memory: 951  
2024/08/26 17:05:06 - mmengine - INFO - Epoch(val)  [1][4550/4931]    eta: 0:01:41  time: 0.2819  data_time: 0.0011  memory: 952  
2024/08/26 17:05:20 - mmengine - INFO - Epoch(val)  [1][4600/4931]    eta: 0:01:28  time: 0.2760  data_time: 0.0011  memory: 948  
2024/08/26 17:05:33 - mmengine - INFO - Epoch(val)  [1][4650/4931]    eta: 0:01:14  time: 0.2759  data_time: 0.0011  memory: 947  
2024/08/26 17:05:47 - mmengine - INFO - Epoch(val)  [1][4700/4931]    eta: 0:01:01  time: 0.2746  data_time: 0.0011  memory: 948  
2024/08/26 17:06:01 - mmengine - INFO - Epoch(val)  [1][4750/4931]    eta: 0:00:48  time: 0.2731  data_time: 0.0012  memory: 947  
2024/08/26 17:06:14 - mmengine - INFO - Epoch(val)  [1][4800/4931]    eta: 0:00:34  time: 0.2678  data_time: 0.0011  memory: 918  
2024/08/26 17:06:27 - mmengine - INFO - Epoch(val)  [1][4850/4931]    eta: 0:00:21  time: 0.2615  data_time: 0.0011  memory: 915  
2024/08/26 17:06:40 - mmengine - INFO - Epoch(val)  [1][4900/4931]    eta: 0:00:08  time: 0.2574  data_time: 0.0011  memory: 942  
2024/08/26 17:06:48 - mmengine - INFO - Start converting ...
[...]
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   ] 4678/4931, 30691.8 task/s, elapsed: 0s, ETA:     0s4678 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   ] 4693/4931, 30655.1 task/s, elapsed: 0s, ETA:     0s4693 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   ] 4717/4931, 30595.1 task/s, elapsed: 0s, ETA:     0s4717 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   ] 4730/4931, 30575.3 task/s, elapsed: 0s, ETA:     0s4730 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4734/4931, 30578.8 task/s, elapsed: 0s, ETA:     0s4734 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4735/4931, 30581.8 task/s, elapsed: 0s, ETA:     0s4735 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4737/4931, 30586.4 task/s, elapsed: 0s, ETA:     0s4737 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4738/4931, 30589.7 task/s, elapsed: 0s, ETA:     0s4738 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4740/4931, 30594.6 task/s, elapsed: 0s, ETA:     0s4740 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4743/4931, 30596.2 task/s, elapsed: 0s, ETA:     0s4743 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4744/4931, 30599.3 task/s, elapsed: 0s, ETA:     0s4744 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4774/4931, 30497.8 task/s, elapsed: 0s, ETA:     0s4774 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4804/4931, 30406.6 task/s, elapsed: 0s, ETA:     0s4804 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4812/4931, 30390.7 task/s, elapsed: 0s, ETA:     0s4812 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  ] 4817/4931, 30383.2 task/s, elapsed: 0s, ETA:     0s4817 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 4855/4931, 30280.9 task/s, elapsed: 0s, ETA:     0s4855 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 4856/4931, 30283.9 task/s, elapsed: 0s, ETA:     0s4856 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 4857/4931, 30287.8 task/s, elapsed: 0s, ETA:     0s4857 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 4860/4931, 30292.0 task/s, elapsed: 0s, ETA:     0s4860 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 4877/4931, 30265.7 task/s, elapsed: 0s, ETA:     0s4877 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 4920/4931, 30139.9 task/s, elapsed: 0s, ETA:     0s4920 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 4921/4931, 30142.8 task/s, elapsed: 0s, ETA:     0s4921 not found.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4931/4931, 30133.9 task/s, elapsed: 0s, ETA:     0s
mmdet3d/evaluation/functional/waymo_utils/compute_detection_metrics_main /tmp/tmpy5pttt62/results.bin ./data/waymo/waymo_format/gt.bin
/bin/sh: mmdet3d/evaluation/functional/waymo_utils/compute_detection_metrics_main: No such file or directory
Traceback (most recent call last):
  File "tools/train.py", line 145, in <module>
    main()
  File "tools/train.py", line 141, in main
    runner.train()
  File "/home/p3d/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/p3d/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/mmengine/runner/loops.py", line 103, in run
    self.runner.val_loop.run()
  File "/home/p3d/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/mmengine/runner/loops.py", line 376, in run
    metrics = self.evaluator.evaluate(len(self.dataloader.dataset))
  File "/home/p3d/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/mmengine/evaluator/evaluator.py", line 79, in evaluate
    _results = metric.evaluate(size)
  File "/home/p3d/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/mmengine/evaluator/metric.py", line 133, in evaluate
    _metrics = self.compute_metrics(results)  # type: ignore
  File "/home/p3d/mmdetection3d/mmdet3d/evaluation/metrics/waymo_metric.py", line 168, in compute_metrics
    ap_dict = self.waymo_evaluate(
  File "/home/p3d/mmdetection3d/mmdet3d/evaluation/metrics/waymo_metric.py", line 200, in waymo_evaluate
    ret_bytes = subprocess.check_output(eval_str, shell=True)
  File "/home/p3d/miniconda3/envs/openmmlab2/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/p3d/miniconda3/envs/openmmlab2/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'mmdet3d/evaluation/functional/waymo_utils/compute_detection_metrics_main /tmp/tmpy5pttt62/results.bin ./data/waymo/waymo_format/gt.bin' returned non-zero exit status 127.

Additional information

I am trying to train the PV-RCNN model on the Waymo dataset. I processed the dataset as described in the documentation, which produced the kitti_format and waymo_format folders. I modified the config file to use the Waymo base dataset file and changed the input channels. Training starts correctly and completes the first epoch without problems. However, when the evaluation runs at the end of the first epoch, it fails with the error above.

Is the error caused by the data conversion, or by the model configuration not supporting this dataset?
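
For context on the traceback: exit status 127 from /bin/sh means the command could not be found, which agrees with the shell message about compute_detection_metrics_main. A first check is therefore whether that binary exists at the relative path that waymo_metric.py passes to the shell. Below is a minimal sketch of such a check, assuming it is run from the repository root (the path in the log is relative). The helper name check_eval_binary is hypothetical and not part of MMDetection3D.

```python
import os

# Relative path copied from the failing command in the traceback.
EVAL_BINARY = (
    "mmdet3d/evaluation/functional/waymo_utils/compute_detection_metrics_main"
)


def check_eval_binary(path=EVAL_BINARY):
    """Return 'missing', 'not executable' or 'ok' for the file at `path`."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Run from the repository root, where the relative path is resolved.
    print(f"{EVAL_BINARY}: {check_eval_binary()}")
```

If this reports "missing", the traceback points at the evaluation tooling rather than at the model config or the converted data, at least as far as this log shows.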