open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0
5.35k stars 1.55k forks source link

[Bug] Kitti 3D Class is not training on PointPillars #2087

Open BalazsSzekeres opened 1 year ago

BalazsSzekeres commented 1 year ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: win32 Python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 05:59:00) [MSC v.1929 64 bit (AMD64)] CUDA available: True GPU 0: NVIDIA GeForce RTX 3080 CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7 NVCC: Cuda compilation tools, release 11.7, V11.7.64 MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.32.31332 for x64 GCC: n/a PyTorch: 1.13.0 PyTorch compiling details: PyTorch built with:

TorchVision: 0.14.0 OpenCV: 4.5.5 MMCV: 1.6.2 MMCV Compiler: MSVC 193231332 MMCV CUDA Compiler: 11.6 MMDetection: 2.25.3 MMSegmentation: 0.29.1 MMDetection3D: 1.0.0rc5+fcb4545 spconv2.0: False

Reproduces the problem - code sample

 python .\tools\train.py .\configs\pointpillars\hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py --gpu-id=0

Reproduces the problem - command or script

 python .\tools\train.py .\configs\pointpillars\hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py --gpu-id=0

Reproduces the problem - error message

C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\mmdet\utils\setup_env.py:38: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  warnings.warn(
C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\mmdet\utils\setup_env.py:48: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  warnings.warn(
2022-12-02 13:41:17,189 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: win32
Python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 05:59:00) [MSC v.1929 64 bit (AMD64)]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3080
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7
NVCC: Cuda compilation tools, release 11.7, V11.7.64
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.32.31332 for x64
GCC: n/a
PyTorch: 1.13.0
PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192829337
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=C:/cb/pytorch_1000000000000/work/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL
 -openmp:experimental -IC:/cb/pytorch_1000000000000/work/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1
, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.14.0
OpenCV: 4.5.5
MMCV: 1.6.2
MMCV Compiler: MSVC 193231332
MMCV CUDA Compiler: 11.6
MMDetection: 2.25.3
MMSegmentation: 0.29.1
MMDetection3D: 1.0.0rc5+fcb4545
spconv2.0: False
------------------------------------------------------------

2022-12-02 13:41:17,974 - mmdet - INFO - Distributed training: False
2022-12-02 13:41:18,764 - mmdet - INFO - Config:
voxel_size = [0.16, 0.16, 4]
model = dict(
    type='VoxelNet',
    voxel_layer=dict(
        max_num_points=32,
        point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1],
        voxel_size=[0.16, 0.16, 4],
        max_voxels=(16000, 40000)),
    voxel_encoder=dict(
        type='PillarFeatureNet',
        in_channels=4,
        feat_channels=[64],
        with_distance=False,
        voxel_size=[0.16, 0.16, 4],
        point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
    middle_encoder=dict(
        type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
    backbone=dict(
        type='SECOND',
        in_channels=64,
        layer_nums=[3, 5, 5],
        layer_strides=[2, 2, 2],
        out_channels=[64, 128, 256]),
    neck=dict(
        type='SECONDFPN',
        in_channels=[64, 128, 256],
        upsample_strides=[1, 2, 4],
        out_channels=[128, 128, 128]),
    bbox_head=dict(
        type='Anchor3DHead',
        num_classes=3,
        in_channels=384,
        feat_channels=384,
        use_direction_classifier=True,
        assign_per_class=True,
        anchor_generator=dict(
            type='AlignedAnchor3DRangeGenerator',
            ranges=[[0, -39.68, -0.6, 69.12, 39.68, -0.6],
                    [0, -39.68, -0.6, 69.12, 39.68, -0.6],
                    [0, -39.68, -1.78, 69.12, 39.68, -1.78]],
            sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
            rotations=[0, 1.57],
            reshape_out=False),
        diff_rad_by_sin=True,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=2.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)),
    train_cfg=dict(
        assigner=[
            dict(
                type='MaxIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.5,
                neg_iou_thr=0.35,
                min_pos_iou=0.35,
                ignore_iof_thr=-1),
            dict(
                type='MaxIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.5,
                neg_iou_thr=0.35,
                min_pos_iou=0.35,
                ignore_iof_thr=-1),
            dict(
                type='MaxIoUAssigner',
                iou_calculator=dict(type='BboxOverlapsNearest3D'),
                pos_iou_thr=0.6,
                neg_iou_thr=0.45,
                min_pos_iou=0.45,
                ignore_iof_thr=-1)
        ],
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        use_rotate_nms=True,
        nms_across_levels=False,
        nms_thr=0.01,
        score_thr=0.1,
        min_bbox_size=0,
        nms_pre=100,
        max_num=50))
dataset_type = 'KittiDataset'
data_root = 'data/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]
input_modality = dict(use_lidar=True, use_camera=False)
file_client_args = dict(backend='disk')
db_sampler = dict(
    data_root='data/kitti/',
    info_path='data/kitti/kitti_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
    classes=['Pedestrian', 'Cyclist', 'Car'],
    sample_groups=dict(Car=15, Pedestrian=15, Cyclist=15),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=dict(backend='disk')),
    file_client_args=dict(backend='disk'))
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=dict(backend='disk')),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        file_client_args=dict(backend='disk')),
    dict(
        type='ObjectSample',
        db_sampler=dict(
            data_root='data/kitti/',
            info_path='data/kitti/kitti_dbinfos_train.pkl',
            rate=1.0,
            prepare=dict(
                filter_by_difficulty=[-1],
                filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
            classes=['Pedestrian', 'Cyclist', 'Car'],
            sample_groups=dict(Car=15, Pedestrian=15, Cyclist=15),
            points_loader=dict(
                type='LoadPointsFromFile',
                coord_type='LIDAR',
                load_dim=4,
                use_dim=4,
                file_client_args=dict(backend='disk')),
            file_client_args=dict(backend='disk')),
        use_ground_plane=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(
        type='PointsRangeFilter',
        point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
    dict(
        type='ObjectRangeFilter',
        point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
    dict(type='PointShuffle'),
    dict(
        type='DefaultFormatBundle3D',
        class_names=['Pedestrian', 'Cyclist', 'Car']),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=dict(backend='disk')),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1.0, 1.0],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter',
                point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
            dict(
                type='DefaultFormatBundle3D',
                class_names=['Pedestrian', 'Cyclist', 'Car'],
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=dict(backend='disk')),
    dict(
        type='DefaultFormatBundle3D',
        class_names=['Pedestrian', 'Cyclist', 'Car'],
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]
data = dict(
    samples_per_gpu=6,
    workers_per_gpu=4,
    train=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type='KittiDataset',
            data_root='data/kitti/',
            ann_file='data/kitti/kitti_infos_train.pkl',
            split='training',
            pts_prefix='velodyne_reduced',
            pipeline=[
                dict(
                    type='LoadPointsFromFile',
                    coord_type='LIDAR',
                    load_dim=4,
                    use_dim=4,
                    file_client_args=dict(backend='disk')),
                dict(
                    type='LoadAnnotations3D',
                    with_bbox_3d=True,
                    with_label_3d=True,
                    file_client_args=dict(backend='disk')),
                dict(
                    type='ObjectSample',
                    db_sampler=dict(
                        data_root='data/kitti/',
                        info_path='data/kitti/kitti_dbinfos_train.pkl',
                        rate=1.0,
                        prepare=dict(
                            filter_by_difficulty=[-1],
                            filter_by_min_points=dict(
                                Car=5, Pedestrian=5, Cyclist=5)),
                        classes=['Pedestrian', 'Cyclist', 'Car'],
                        sample_groups=dict(Car=15, Pedestrian=15, Cyclist=15),
                        points_loader=dict(
                            type='LoadPointsFromFile',
                            coord_type='LIDAR',
                            load_dim=4,
                            use_dim=4,
                            file_client_args=dict(backend='disk')),
                        file_client_args=dict(backend='disk')),
                    use_ground_plane=True),
                dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
                dict(
                    type='GlobalRotScaleTrans',
                    rot_range=[-0.78539816, 0.78539816],
                    scale_ratio_range=[0.95, 1.05]),
                dict(
                    type='PointsRangeFilter',
                    point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
                dict(
                    type='ObjectRangeFilter',
                    point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
                dict(type='PointShuffle'),
                dict(
                    type='DefaultFormatBundle3D',
                    class_names=['Pedestrian', 'Cyclist', 'Car']),
                dict(
                    type='Collect3D',
                    keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
            ],
            modality=dict(use_lidar=True, use_camera=False),
            classes=['Pedestrian', 'Cyclist', 'Car'],
            test_mode=False,
            box_type_3d='LiDAR',
            file_client_args=dict(backend='disk'))),
    val=dict(
        type='KittiDataset',
        data_root='data/kitti/',
        ann_file='data/kitti/kitti_infos_val.pkl',
        split='training',
        pts_prefix='velodyne_reduced',
        pipeline=[
            dict(
                type='LoadPointsFromFile',
                coord_type='LIDAR',
                load_dim=4,
                use_dim=4,
                file_client_args=dict(backend='disk')),
            dict(
                type='MultiScaleFlipAug3D',
                img_scale=(1333, 800),
                pts_scale_ratio=1,
                flip=False,
                transforms=[
                    dict(
                        type='GlobalRotScaleTrans',
                        rot_range=[0, 0],
                        scale_ratio_range=[1.0, 1.0],
                        translation_std=[0, 0, 0]),
                    dict(type='RandomFlip3D'),
                    dict(
                        type='PointsRangeFilter',
                        point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
                    dict(
                        type='DefaultFormatBundle3D',
                        class_names=['Pedestrian', 'Cyclist', 'Car'],
                        with_label=False),
                    dict(type='Collect3D', keys=['points'])
                ])
        ],
        modality=dict(use_lidar=True, use_camera=False),
        classes=['Pedestrian', 'Cyclist', 'Car'],
        test_mode=True,
        box_type_3d='LiDAR',
        file_client_args=dict(backend='disk')),
    test=dict(
        type='KittiDataset',
        data_root='data/kitti/',
        ann_file='data/kitti/kitti_infos_val.pkl',
        split='training',
        pts_prefix='velodyne_reduced',
        pipeline=[
            dict(
                type='LoadPointsFromFile',
                coord_type='LIDAR',
                load_dim=4,
                use_dim=4,
                file_client_args=dict(backend='disk')),
            dict(
                type='MultiScaleFlipAug3D',
                img_scale=(1333, 800),
                pts_scale_ratio=1,
                flip=False,
                transforms=[
                    dict(
                        type='GlobalRotScaleTrans',
                        rot_range=[0, 0],
                        scale_ratio_range=[1.0, 1.0],
                        translation_std=[0, 0, 0]),
                    dict(type='RandomFlip3D'),
                    dict(
                        type='PointsRangeFilter',
                        point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
                    dict(
                        type='DefaultFormatBundle3D',
                        class_names=['Pedestrian', 'Cyclist', 'Car'],
                        with_label=False),
                    dict(type='Collect3D', keys=['points'])
                ])
        ],
        modality=dict(use_lidar=True, use_camera=False),
        classes=['Pedestrian', 'Cyclist', 'Car'],
        test_mode=True,
        box_type_3d='LiDAR',
        file_client_args=dict(backend='disk')))
evaluation = dict(
    interval=2,
    pipeline=[
        dict(
            type='LoadPointsFromFile',
            coord_type='LIDAR',
            load_dim=4,
            use_dim=4,
            file_client_args=dict(backend='disk')),
        dict(
            type='DefaultFormatBundle3D',
            class_names=['Pedestrian', 'Cyclist', 'Car'],
            with_label=False),
        dict(type='Collect3D', keys=['points'])
    ])
lr = 0.001
optimizer = dict(type='AdamW', lr=0.001, betas=(0.95, 0.99), weight_decay=0.01)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 0.0001),
    cyclic_times=1,
    step_ratio_up=0.4)
momentum_config = dict(
    policy='cyclic',
    target_ratio=(0.8947368421052632, 1),
    cyclic_times=1,
    step_ratio_up=0.4)
runner = dict(type='EpochBasedRunner', max_epochs=80)
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs\hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
gpu_ids = [0]

2022-12-02 13:41:18,773 - mmdet - INFO - Set random seed to 0, deterministic: False
e:\repos\tudelft\mmdetection3d\mmdet3d\models\dense_heads\anchor3d_head.py:84: UserWarning: dir_offset and dir_limit_offset will be depressed and be incorporated into box coder in the future
  warnings.warn(
2022-12-02 13:41:18,805 - mmdet - INFO - initialize SECOND with init_cfg {'type': 'Kaiming', 'layer': 'Conv2d'}
2022-12-02 13:41:18,829 - mmdet - INFO - initialize SECONDFPN with init_cfg [{'type': 'Kaiming', 'layer': 'ConvTranspose2d'}, {'type': 'Constant', 'layer': 'NaiveSyncBatchNorm2d', 'val': 1.0}]
2022-12-02 13:41:18,833 - mmdet - INFO - initialize Anchor3DHead with init_cfg {'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01, 'override': {'type': 'Normal', 'name': 'conv_cls', 'std': 0.01, 'bias_prob': 0.01}}
2022-12-02 13:41:18,835 - mmdet - INFO - Model:
VoxelNet(
  (backbone): SECOND(
    (blocks): ModuleList(
      (0): Sequential(
        (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (7): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (8): ReLU(inplace=True)
        (9): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (10): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (11): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (7): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (8): ReLU(inplace=True)
        (9): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (10): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (11): ReLU(inplace=True)
        (12): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (13): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (14): ReLU(inplace=True)
        (15): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (16): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (17): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (7): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (8): ReLU(inplace=True)
        (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (10): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (11): ReLU(inplace=True)
        (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (13): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (14): ReLU(inplace=True)
        (15): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (16): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (17): ReLU(inplace=True)
      )
    )
  )
  init_cfg={'type': 'Kaiming', 'layer': 'Conv2d'}
  (neck): SECONDFPN(
    (deblocks): ModuleList(
      (0): Sequential(
        (0): ConvTranspose2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): ConvTranspose2d(128, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(4, 4), bias=False)
        (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
  )
  init_cfg=[{'type': 'Kaiming', 'layer': 'ConvTranspose2d'}, {'type': 'Constant', 'layer': 'NaiveSyncBatchNorm2d', 'val': 1.0}]
  (bbox_head): Anchor3DHead(
    (loss_cls): FocalLoss()
    (loss_bbox): SmoothL1Loss()
    (loss_dir): CrossEntropyLoss(avg_non_ignore=False)
    (conv_cls): Conv2d(384, 18, kernel_size=(1, 1), stride=(1, 1))
    (conv_reg): Conv2d(384, 42, kernel_size=(1, 1), stride=(1, 1))
    (conv_dir_cls): Conv2d(384, 12, kernel_size=(1, 1), stride=(1, 1))
  )
  init_cfg={'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01, 'override': {'type': 'Normal', 'name': 'conv_cls', 'std': 0.01, 'bias_prob': 0.01}}
  (voxel_layer): Voxelization(voxel_size=[0.16, 0.16, 4], point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1], max_num_points=32, max_voxels=(16000, 40000), deterministic=True)
  (voxel_encoder): PillarFeatureNet(
    (pfn_layers): ModuleList(
      (0): PFNLayer(
        (norm): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (linear): Linear(in_features=10, out_features=64, bias=False)
      )
    )
  )
  (middle_encoder): PointPillarsScatter()
)
2022-12-02 13:41:20,134 - mmdet - INFO - Start running, host: balaz@DESKTOP-5EDBPBO, work_dir: E:\repos\TUDelft\mmdetection3d\work_dirs\hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class
2022-12-02 13:41:20,135 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) CyclicLrUpdaterHook
(HIGH        ) CyclicMomentumUpdaterHook
(NORMAL      ) CheckpointHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) CyclicLrUpdaterHook
(HIGH        ) CyclicMomentumUpdaterHook
(LOW         ) IterTimerHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_train_iter:
(VERY_HIGH   ) CyclicLrUpdaterHook
(HIGH        ) CyclicMomentumUpdaterHook
(LOW         ) IterTimerHook
(LOW         ) EvalHook
 --------------------
after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL      ) CheckpointHook
(LOW         ) IterTimerHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
after_train_epoch:
(NORMAL      ) CheckpointHook
(LOW         ) EvalHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_val_epoch:
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_val_iter:
(LOW         ) IterTimerHook
 --------------------
after_val_iter:
(LOW         ) IterTimerHook
 --------------------
after_val_epoch:
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
after_run:
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
2022-12-02 13:41:20,136 - mmdet - INFO - workflow: [('train', 1)], max: 80 epochs
2022-12-02 13:41:20,136 - mmdet - INFO - Checkpoints will be saved to E:\repos\TUDelft\mmdetection3d\work_dirs\hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class by HardDiskBackend.
C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3191.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File ".\tools\train.py", line 263, in <module>
    main()
  File ".\tools\train.py", line 252, in main
    train_model(
  File "e:\repos\tudelft\mmdetection3d\mmdet3d\apis\train.py", line 344, in train_model
    train_detector(
  File "e:\repos\tudelft\mmdetection3d\mmdet3d\apis\train.py", line 319, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 53, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 31, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\mmcv\parallel\data_parallel.py", line 77, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\mmdet\models\detectors\base.py", line 248, in train_step
    losses = self(**data)
  File "C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\mmcv\runner\fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
  File "e:\repos\tudelft\mmdetection3d\mmdet3d\models\detectors\base.py", line 60, in forward
    return self.forward_train(**kwargs)
  File "e:\repos\tudelft\mmdetection3d\mmdet3d\models\detectors\voxelnet.py", line 94, in forward_train
    losses = self.bbox_head.loss(
  File "C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\mmcv\runner\fp16_utils.py", line 205, in new_func
    return old_func(*args, **kwargs)
  File "e:\repos\tudelft\mmdetection3d\mmdet3d\models\dense_heads\anchor3d_head.py", line 342, in loss
    cls_reg_targets = self.anchor_target_3d(
  File "e:\repos\tudelft\mmdetection3d\mmdet3d\models\dense_heads\train_mixins.py", line 71, in anchor_target_3d
    neg_inds_list) = multi_apply(
  File "C:\Users\balaz\anaconda3\envs\mmlab\lib\site-packages\mmdet\core\utils\misc.py", line 30, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "e:\repos\tudelft\mmdetection3d\mmdet3d\models\dense_heads\train_mixins.py", line 142, in anchor_target_3d_single
    assigner, current_anchors, gt_bboxes[gt_per_cls, :],
  File "e:\repos\tudelft\mmdetection3d\mmdet3d\core\bbox\structures\base_box3d.py", line 327, in __getitem__
    b = self.tensor[item]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Additional information

Interestingly, when running Kitti 3D Car PointPillars, the training works as expected. I have checked the differences between the two config files, and the only difference is the model, so the issue should be somewhere there.

srimannarayanabaratam commented 1 year ago

I face the exact same issue and can confirm that training works with single-class configs (car).

My environment setup as collected from mmdet3d/utils/collect_env.py:

sys.platform: linux
Python: 3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-32GB
CUDA_HOME: /scratch/sbaratam/envs/openmmlab
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.13.0
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.14.0
OpenCV: 4.6.0
MMCV: 1.7.0
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.6
MMDetection: 2.26.0
MMSegmentation: 0.29.1
MMDetection3D: 1.0.0rc5+7b493dc
spconv2.0: False
BalazsSzekeres commented 1 year ago

What I noticed is that both me and @TarunKumar1995-glitch use PyTorch version 1.13. I have downgraded to 1.9.1 with conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=11.3 -c pytorch -c conda-forge, which solves the issue. It seems that latest PyTorch version is incompatible with the current package. What is the latest PyTorch version that is compatible?

VVsssssk commented 1 year ago

Hi, thanks for your issue. We will test and make the package compatible with PyTorch 1.13.

ted8201 commented 1 year ago

I met the same problem. Would pytorch 1.12 work?

KexianShen commented 1 year ago

don't need to downgrade pytorch.

my solution is editing mmdetection3d/mmdet3d/models/dense_heads/train_mixins.py:140

                if self.assign_per_class:
                    gt_per_cls = (gt_labels == i).cpu()    # line 140
WuYanXingege commented 1 year ago

don't need to downgrade pytorch.

my solution is editing mmdetection3d/mmdet3d/models/dense_heads/train_mixins.py:140

                if self.assign_per_class:
                    gt_per_cls = (gt_labels == i).cpu()    # line 140

thank you,so great,how do you find this?it is useful