open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

When I want to train the MVXNet, there was an error in matrix inverse() #2542

Closed zhangtingyu11 closed 10 months ago

zhangtingyu11 commented 1 year ago

I want to train MVXNet on the KITTI dataset. When I use the following command to train the model, the error occurs.

python tools/train.py configs/mvxnet/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class.py

And the log is:

05/19 17:01:52 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.8.16 (default, Mar  2 2023, 03:21:46) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 1243525305
    GPU 0: NVIDIA GeForce RTX 4090
    CUDA_HOME: /usr/local/cuda-11.1
    NVCC: Cuda compilation tools, release 11.1, V11.1.105
    GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    PyTorch: 1.8.0
    PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

    TorchVision: 0.9.0
    OpenCV: 4.7.0
    MMEngine: 0.7.3

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: None
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

05/19 17:01:52 - mmengine - INFO - Config:
lr = 0.003
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='AdamW', lr=0.003, weight_decay=0.01, betas=(0.95, 0.99)),
    clip_grad=dict(max_norm=35, norm_type=2))
param_scheduler = [
    dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000),
    dict(
        type='CosineAnnealingLR',
        begin=0,
        T_max=40,
        end=40,
        by_epoch=True,
        eta_min=1e-05)
]
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=40, val_interval=1)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
auto_scale_lr = dict(enable=False, base_batch_size=16)
default_scope = 'mmdet3d'
default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=-1),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='Det3DVisualizationHook'))
env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'))
log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)
log_level = 'INFO'
load_from = 'https://download.openmmlab.com/mmdetection3d/pretrain_models/mvx_faster_rcnn_detectron2-caffe_20e_coco-pretrain_gt-sample_kitti-3-class_moderate-79.3_20200207-a4a6a3c7.pth'
resume = False
voxel_size = [0.05, 0.05, 0.1]
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
model = dict(
    type='DynamicMVXFasterRCNN',
    data_preprocessor=dict(
        type='Det3DDataPreprocessor',
        voxel=True,
        voxel_type='dynamic',
        voxel_layer=dict(
            max_num_points=-1,
            point_cloud_range=[0, -40, -3, 70.4, 40, 1],
            voxel_size=[0.05, 0.05, 0.1],
            max_voxels=(-1, -1)),
        mean=[102.9801, 115.9465, 122.7717],
        std=[1.0, 1.0, 1.0],
        bgr_to_rgb=False,
        pad_size_divisor=32),
    img_backbone=dict(
        type='mmdet.ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    img_neck=dict(
        type='mmdet.FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    pts_voxel_encoder=dict(
        type='DynamicVFE',
        in_channels=4,
        feat_channels=[64, 64],
        with_distance=False,
        voxel_size=[0.05, 0.05, 0.1],
        with_cluster_center=True,
        with_voxel_center=True,
        point_cloud_range=[0, -40, -3, 70.4, 40, 1],
        fusion_layer=dict(
            type='PointFusion',
            img_channels=256,
            pts_channels=64,
            mid_channels=128,
            out_channels=128,
            img_levels=[0, 1, 2, 3, 4],
            align_corners=False,
            activate_out=True,
            fuse_out=False)),
    pts_middle_encoder=dict(
        type='SparseEncoder',
        in_channels=128,
        sparse_shape=[41, 1600, 1408],
        order=('conv', 'norm', 'act')),
    pts_backbone=dict(
        type='SECOND',
        in_channels=256,
        layer_nums=[5, 5],
        layer_strides=[1, 2],
        out_channels=[128, 256]),
    pts_neck=dict(
        type='SECONDFPN',
        in_channels=[128, 256],
        upsample_strides=[1, 2],
        out_channels=[256, 256]),
    pts_bbox_head=dict(
        type='Anchor3DHead',
        num_classes=3,
        in_channels=512,
        feat_channels=512,
        use_direction_classifier=True,
        anchor_generator=dict(
            type='Anchor3DRangeGenerator',
            ranges=[[0, -40.0, -0.6, 70.4, 40.0, -0.6],
                    [0, -40.0, -0.6, 70.4, 40.0, -0.6],
                    [0, -40.0, -1.78, 70.4, 40.0, -1.78]],
            sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
            rotations=[0, 1.57],
            reshape_out=False),
        assigner_per_size=True,
        diff_rad_by_sin=True,
        assign_per_class=True,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
        loss_cls=dict(
            type='mmdet.FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(
            type='mmdet.SmoothL1Loss',
            beta=0.1111111111111111,
            loss_weight=2.0),
        loss_dir=dict(
            type='mmdet.CrossEntropyLoss', use_sigmoid=False,
            loss_weight=0.2)),
    train_cfg=dict(
        pts=dict(
            assigner=[
                dict(
                    type='Max3DIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.35,
                    neg_iou_thr=0.2,
                    min_pos_iou=0.2,
                    ignore_iof_thr=-1),
                dict(
                    type='Max3DIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.35,
                    neg_iou_thr=0.2,
                    min_pos_iou=0.2,
                    ignore_iof_thr=-1),
                dict(
                    type='Max3DIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.45,
                    min_pos_iou=0.45,
                    ignore_iof_thr=-1)
            ],
            allowed_border=0,
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        pts=dict(
            use_rotate_nms=True,
            nms_across_levels=False,
            nms_thr=0.01,
            score_thr=0.1,
            min_bbox_size=0,
            nms_pre=100,
            max_num=50)))
dataset_type = 'KittiDataset'
data_root = 'data/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
metainfo = dict(classes=['Pedestrian', 'Cyclist', 'Car'])
input_modality = dict(use_lidar=True, use_camera=True)
backend_args = None
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        backend_args=None),
    dict(type='LoadImageFromFile', backend_args=None),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(
        type='RandomResize', scale=[(640, 192), (2560, 768)], keep_ratio=True),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05],
        translation_std=[0.2, 0.2, 0.2]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='PointsRangeFilter', point_cloud_range=[0, -40, -3, 70.4, 40, 1]),
    dict(
        type='ObjectRangeFilter', point_cloud_range=[0, -40, -3, 70.4, 40, 1]),
    dict(type='PointShuffle'),
    dict(
        type='Pack3DDetInputs',
        keys=[
            'points', 'img', 'gt_bboxes_3d', 'gt_labels_3d', 'gt_bboxes',
            'gt_labels'
        ])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        backend_args=None),
    dict(type='LoadImageFromFile', backend_args=None),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1280, 384),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(type='Resize', scale=0, keep_ratio=True),
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1.0, 1.0],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter',
                point_cloud_range=[0, -40, -3, 70.4, 40, 1])
        ]),
    dict(type='Pack3DDetInputs', keys=['points', 'img'])
]
modality = dict(use_lidar=True, use_camera=True)
train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type='KittiDataset',
            data_root='data/kitti/',
            modality=dict(use_lidar=True, use_camera=True),
            ann_file='kitti_infos_train.pkl',
            data_prefix=dict(
                pts='training/velodyne_reduced', img='training/image_2'),
            pipeline=[
                dict(
                    type='LoadPointsFromFile',
                    coord_type='LIDAR',
                    load_dim=4,
                    use_dim=4,
                    backend_args=None),
                dict(type='LoadImageFromFile', backend_args=None),
                dict(
                    type='LoadAnnotations3D',
                    with_bbox_3d=True,
                    with_label_3d=True),
                dict(
                    type='RandomResize',
                    scale=[(640, 192), (2560, 768)],
                    keep_ratio=True),
                dict(
                    type='GlobalRotScaleTrans',
                    rot_range=[-0.78539816, 0.78539816],
                    scale_ratio_range=[0.95, 1.05],
                    translation_std=[0.2, 0.2, 0.2]),
                dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
                dict(
                    type='PointsRangeFilter',
                    point_cloud_range=[0, -40, -3, 70.4, 40, 1]),
                dict(
                    type='ObjectRangeFilter',
                    point_cloud_range=[0, -40, -3, 70.4, 40, 1]),
                dict(type='PointShuffle'),
                dict(
                    type='Pack3DDetInputs',
                    keys=[
                        'points', 'img', 'gt_bboxes_3d', 'gt_labels_3d',
                        'gt_bboxes', 'gt_labels'
                    ])
            ],
            filter_empty_gt=False,
            metainfo=dict(classes=['Pedestrian', 'Cyclist', 'Car']),
            box_type_3d='LiDAR',
            backend_args=None)))
val_dataloader = dict(
    batch_size=1,
    num_workers=1,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='KittiDataset',
        data_root='data/kitti/',
        modality=dict(use_lidar=True, use_camera=True),
        ann_file='kitti_infos_val.pkl',
        data_prefix=dict(
            pts='training/velodyne_reduced', img='training/image_2'),
        pipeline=[
            dict(
                type='LoadPointsFromFile',
                coord_type='LIDAR',
                load_dim=4,
                use_dim=4,
                backend_args=None),
            dict(type='LoadImageFromFile', backend_args=None),
            dict(
                type='MultiScaleFlipAug3D',
                img_scale=(1280, 384),
                pts_scale_ratio=1,
                flip=False,
                transforms=[
                    dict(type='Resize', scale=0, keep_ratio=True),
                    dict(
                        type='GlobalRotScaleTrans',
                        rot_range=[0, 0],
                        scale_ratio_range=[1.0, 1.0],
                        translation_std=[0, 0, 0]),
                    dict(type='RandomFlip3D'),
                    dict(
                        type='PointsRangeFilter',
                        point_cloud_range=[0, -40, -3, 70.4, 40, 1])
                ]),
            dict(type='Pack3DDetInputs', keys=['points', 'img'])
        ],
        metainfo=dict(classes=['Pedestrian', 'Cyclist', 'Car']),
        test_mode=True,
        box_type_3d='LiDAR',
        backend_args=None))
test_dataloader = dict(
    batch_size=1,
    num_workers=1,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='KittiDataset',
        data_root='data/kitti/',
        ann_file='kitti_infos_val.pkl',
        modality=dict(use_lidar=True, use_camera=True),
        data_prefix=dict(
            pts='training/velodyne_reduced', img='training/image_2'),
        pipeline=[
            dict(
                type='LoadPointsFromFile',
                coord_type='LIDAR',
                load_dim=4,
                use_dim=4,
                backend_args=None),
            dict(type='LoadImageFromFile', backend_args=None),
            dict(
                type='MultiScaleFlipAug3D',
                img_scale=(1280, 384),
                pts_scale_ratio=1,
                flip=False,
                transforms=[
                    dict(type='Resize', scale=0, keep_ratio=True),
                    dict(
                        type='GlobalRotScaleTrans',
                        rot_range=[0, 0],
                        scale_ratio_range=[1.0, 1.0],
                        translation_std=[0, 0, 0]),
                    dict(type='RandomFlip3D'),
                    dict(
                        type='PointsRangeFilter',
                        point_cloud_range=[0, -40, -3, 70.4, 40, 1])
                ]),
            dict(type='Pack3DDetInputs', keys=['points', 'img'])
        ],
        metainfo=dict(classes=['Pedestrian', 'Cyclist', 'Car']),
        test_mode=True,
        box_type_3d='LiDAR',
        backend_args=None))
val_evaluator = dict(
    type='KittiMetric', ann_file='data/kitti/kitti_infos_val.pkl')
test_evaluator = dict(
    type='KittiMetric', ann_file='data/kitti/kitti_infos_val.pkl')
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='Det3DLocalVisualizer',
    vis_backends=[dict(type='LocalVisBackend')],
    name='visualizer')
launcher = 'none'
work_dir = './work_dirs/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class'

/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/dense_heads/anchor3d_head.py:92: UserWarning: dir_offset and dir_limit_offset will be depressed and be incorporated into box coder in the future
  warnings.warn(
05/19 17:01:55 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
05/19 17:01:55 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) IterTimerHook                      
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_val_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_val_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_val_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) Det3DVisualizationHook             
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train:
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_test_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_test_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_test_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) Det3DVisualizationHook             
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
05/19 17:01:57 - mmengine - INFO - ------------------------------
05/19 17:01:57 - mmengine - INFO - The length of the dataset: 3712
05/19 17:01:57 - mmengine - INFO - The number of instances per category in the dataset:
+------------+--------+
| category   | number |
+------------+--------+
| Pedestrian | 2207   |
| Cyclist    | 734    |
| Car        | 14357  |
+------------+--------+
05/19 17:01:58 - mmengine - INFO - ------------------------------
05/19 17:01:58 - mmengine - INFO - The length of the dataset: 3769
05/19 17:01:58 - mmengine - INFO - The number of instances per category in the dataset:
+------------+--------+
| category   | number |
+------------+--------+
| Pedestrian | 2280   |
| Cyclist    | 893    |
| Car        | 14385  |
+------------+--------+
/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/evaluation/functional/kitti_utils/eval.py:10: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def get_thresholds(scores: np.ndarray, num_gt, num_sample_pts=41):
Loads checkpoint by http backend from path: https://download.openmmlab.com/mmdetection3d/pretrain_models/mvx_faster_rcnn_detectron2-caffe_20e_coco-pretrain_gt-sample_kitti-3-class_moderate-79.3_20200207-a4a6a3c7.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: img_rpn_head.rpn_conv.weight, img_rpn_head.rpn_conv.bias, img_rpn_head.rpn_cls.weight, img_rpn_head.rpn_cls.bias, img_rpn_head.rpn_reg.weight, img_rpn_head.rpn_reg.bias, img_bbox_head.fc_cls.weight, img_bbox_head.fc_cls.bias, img_bbox_head.fc_reg.weight, img_bbox_head.fc_reg.bias, img_bbox_head.shared_fcs.0.weight, img_bbox_head.shared_fcs.0.bias, img_bbox_head.shared_fcs.1.weight, img_bbox_head.shared_fcs.1.bias

missing keys in source state_dict: pts_voxel_encoder.vfe_layers.0.0.weight, pts_voxel_encoder.vfe_layers.0.1.weight, pts_voxel_encoder.vfe_layers.0.1.bias, pts_voxel_encoder.vfe_layers.0.1.running_mean, pts_voxel_encoder.vfe_layers.0.1.running_var, pts_voxel_encoder.vfe_layers.1.0.weight, pts_voxel_encoder.vfe_layers.1.1.weight, pts_voxel_encoder.vfe_layers.1.1.bias, pts_voxel_encoder.vfe_layers.1.1.running_mean, pts_voxel_encoder.vfe_layers.1.1.running_var, pts_voxel_encoder.fusion_layer.lateral_convs.0.conv.weight, pts_voxel_encoder.fusion_layer.lateral_convs.0.conv.bias, pts_voxel_encoder.fusion_layer.lateral_convs.1.conv.weight, pts_voxel_encoder.fusion_layer.lateral_convs.1.conv.bias, pts_voxel_encoder.fusion_layer.lateral_convs.2.conv.weight, pts_voxel_encoder.fusion_layer.lateral_convs.2.conv.bias, pts_voxel_encoder.fusion_layer.lateral_convs.3.conv.weight, pts_voxel_encoder.fusion_layer.lateral_convs.3.conv.bias, pts_voxel_encoder.fusion_layer.lateral_convs.4.conv.weight, pts_voxel_encoder.fusion_layer.lateral_convs.4.conv.bias, pts_voxel_encoder.fusion_layer.img_transform.0.weight, pts_voxel_encoder.fusion_layer.img_transform.0.bias, pts_voxel_encoder.fusion_layer.img_transform.1.weight, pts_voxel_encoder.fusion_layer.img_transform.1.bias, pts_voxel_encoder.fusion_layer.img_transform.1.running_mean, pts_voxel_encoder.fusion_layer.img_transform.1.running_var, pts_voxel_encoder.fusion_layer.pts_transform.0.weight, pts_voxel_encoder.fusion_layer.pts_transform.0.bias, pts_voxel_encoder.fusion_layer.pts_transform.1.weight, pts_voxel_encoder.fusion_layer.pts_transform.1.bias, pts_voxel_encoder.fusion_layer.pts_transform.1.running_mean, pts_voxel_encoder.fusion_layer.pts_transform.1.running_var, pts_middle_encoder.conv_input.0.weight, pts_middle_encoder.conv_input.1.weight, pts_middle_encoder.conv_input.1.bias, pts_middle_encoder.conv_input.1.running_mean, pts_middle_encoder.conv_input.1.running_var, pts_middle_encoder.encoder_layers.encoder_layer1.0.0.weight, pts_middle_encoder.encoder_layers.encoder_layer1.0.1.weight, pts_middle_encoder.encoder_layers.encoder_layer1.0.1.bias, pts_middle_encoder.encoder_layers.encoder_layer1.0.1.running_mean, pts_middle_encoder.encoder_layers.encoder_layer1.0.1.running_var, pts_middle_encoder.encoder_layers.encoder_layer2.0.0.weight, pts_middle_encoder.encoder_layers.encoder_layer2.0.1.weight, pts_middle_encoder.encoder_layers.encoder_layer2.0.1.bias, pts_middle_encoder.encoder_layers.encoder_layer2.0.1.running_mean, pts_middle_encoder.encoder_layers.encoder_layer2.0.1.running_var, pts_middle_encoder.encoder_layers.encoder_layer2.1.0.weight, pts_middle_encoder.encoder_layers.encoder_layer2.1.1.weight, pts_middle_encoder.encoder_layers.encoder_layer2.1.1.bias, pts_middle_encoder.encoder_layers.encoder_layer2.1.1.running_mean, pts_middle_encoder.encoder_layers.encoder_layer2.1.1.running_var, pts_middle_encoder.encoder_layers.encoder_layer2.2.0.weight, pts_middle_encoder.encoder_layers.encoder_layer2.2.1.weight, pts_middle_encoder.encoder_layers.encoder_layer2.2.1.bias, pts_middle_encoder.encoder_layers.encoder_layer2.2.1.running_mean, pts_middle_encoder.encoder_layers.encoder_layer2.2.1.running_var, pts_middle_encoder.encoder_layers.encoder_layer3.0.0.weight, pts_middle_encoder.encoder_layers.encoder_layer3.0.1.weight, pts_middle_encoder.encoder_layers.encoder_layer3.0.1.bias, pts_middle_encoder.encoder_layers.encoder_layer3.0.1.running_mean, pts_middle_encoder.encoder_layers.encoder_layer3.0.1.running_var, 
pts_middle_encoder.encoder_layers.encoder_layer3.1.0.weight, pts_middle_encoder.encoder_layers.encoder_layer3.1.1.weight, pts_middle_encoder.encoder_layers.encoder_layer3.1.1.bias, pts_middle_encoder.encoder_layers.encoder_layer3.1.1.running_mean, pts_middle_encoder.encoder_layers.encoder_layer3.1.1.running_var, pts_middle_encoder.encoder_layers.encoder_layer3.2.0.weight, pts_middle_encoder.encoder_layers.encoder_layer3.2.1.weight, pts_middle_encoder.encoder_layers.encoder_layer3.2.1.bias, pts_middle_encoder.encoder_layers.encoder_layer3.2.1.running_mean, pts_middle_encoder.encoder_layers.encoder_layer3.2.1.running_var, pts_middle_encoder.encoder_layers.encoder_layer4.0.0.weight, pts_middle_encoder.encoder_layers.encoder_layer4.0.1.weight, pts_middle_encoder.encoder_layers.encoder_layer4.0.1.bias, pts_middle_encoder.encoder_layers.encoder_layer4.0.1.running_mean, pts_middle_encoder.encoder_layers.encoder_layer4.0.1.running_var, pts_middle_encoder.encoder_layers.encoder_layer4.1.0.weight, pts_middle_encoder.encoder_layers.encoder_layer4.1.1.weight, pts_middle_encoder.encoder_layers.encoder_layer4.1.1.bias, pts_middle_encoder.encoder_layers.encoder_layer4.1.1.running_mean, pts_middle_encoder.encoder_layers.encoder_layer4.1.1.running_var, pts_middle_encoder.encoder_layers.encoder_layer4.2.0.weight, pts_middle_encoder.encoder_layers.encoder_layer4.2.1.weight, pts_middle_encoder.encoder_layers.encoder_layer4.2.1.bias, pts_middle_encoder.encoder_layers.encoder_layer4.2.1.running_mean, pts_middle_encoder.encoder_layers.encoder_layer4.2.1.running_var, pts_middle_encoder.conv_out.0.weight, pts_middle_encoder.conv_out.1.weight, pts_middle_encoder.conv_out.1.bias, pts_middle_encoder.conv_out.1.running_mean, pts_middle_encoder.conv_out.1.running_var, pts_backbone.blocks.0.0.weight, pts_backbone.blocks.0.1.weight, pts_backbone.blocks.0.1.bias, pts_backbone.blocks.0.1.running_mean, pts_backbone.blocks.0.1.running_var, pts_backbone.blocks.0.3.weight, pts_backbone.blocks.0.4.weight, pts_backbone.blocks.0.4.bias, pts_backbone.blocks.0.4.running_mean, pts_backbone.blocks.0.4.running_var, pts_backbone.blocks.0.6.weight, pts_backbone.blocks.0.7.weight, pts_backbone.blocks.0.7.bias, pts_backbone.blocks.0.7.running_mean, pts_backbone.blocks.0.7.running_var, pts_backbone.blocks.0.9.weight, pts_backbone.blocks.0.10.weight, pts_backbone.blocks.0.10.bias, pts_backbone.blocks.0.10.running_mean, pts_backbone.blocks.0.10.running_var, pts_backbone.blocks.0.12.weight, pts_backbone.blocks.0.13.weight, pts_backbone.blocks.0.13.bias, pts_backbone.blocks.0.13.running_mean, pts_backbone.blocks.0.13.running_var, pts_backbone.blocks.0.15.weight, pts_backbone.blocks.0.16.weight, pts_backbone.blocks.0.16.bias, pts_backbone.blocks.0.16.running_mean, pts_backbone.blocks.0.16.running_var, pts_backbone.blocks.1.0.weight, pts_backbone.blocks.1.1.weight, pts_backbone.blocks.1.1.bias, pts_backbone.blocks.1.1.running_mean, pts_backbone.blocks.1.1.running_var, pts_backbone.blocks.1.3.weight, pts_backbone.blocks.1.4.weight, pts_backbone.blocks.1.4.bias, pts_backbone.blocks.1.4.running_mean, pts_backbone.blocks.1.4.running_var, pts_backbone.blocks.1.6.weight, pts_backbone.blocks.1.7.weight, pts_backbone.blocks.1.7.bias, pts_backbone.blocks.1.7.running_mean, pts_backbone.blocks.1.7.running_var, pts_backbone.blocks.1.9.weight, pts_backbone.blocks.1.10.weight, pts_backbone.blocks.1.10.bias, pts_backbone.blocks.1.10.running_mean, pts_backbone.blocks.1.10.running_var, pts_backbone.blocks.1.12.weight, pts_backbone.blocks.1.13.weight, 
pts_backbone.blocks.1.13.bias, pts_backbone.blocks.1.13.running_mean, pts_backbone.blocks.1.13.running_var, pts_backbone.blocks.1.15.weight, pts_backbone.blocks.1.16.weight, pts_backbone.blocks.1.16.bias, pts_backbone.blocks.1.16.running_mean, pts_backbone.blocks.1.16.running_var, pts_neck.deblocks.0.0.weight, pts_neck.deblocks.0.1.weight, pts_neck.deblocks.0.1.bias, pts_neck.deblocks.0.1.running_mean, pts_neck.deblocks.0.1.running_var, pts_neck.deblocks.1.0.weight, pts_neck.deblocks.1.1.weight, pts_neck.deblocks.1.1.bias, pts_neck.deblocks.1.1.running_mean, pts_neck.deblocks.1.1.running_var, pts_bbox_head.conv_cls.weight, pts_bbox_head.conv_cls.bias, pts_bbox_head.conv_reg.weight, pts_bbox_head.conv_reg.bias, pts_bbox_head.conv_dir_cls.weight, pts_bbox_head.conv_dir_cls.bias

05/19 17:01:59 - mmengine - INFO - Load checkpoint from https://download.openmmlab.com/mmdetection3d/pretrain_models/mvx_faster_rcnn_detectron2-caffe_20e_coco-pretrain_gt-sample_kitti-3-class_moderate-79.3_20200207-a4a6a3c7.pth
05/19 17:01:59 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
05/19 17:01:59 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
05/19 17:01:59 - mmengine - INFO - Checkpoints will be saved to /home/zty/Project/DeepLearning/mmdetection3d/work_dirs/mvxnet_fpn_dv_second_secfpn_8xb2-80e_kitti-3d-3class.
/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/layers/fusion_layers/coord_transform.py:40: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(img_meta['pcd_rotation'], dtype=dtype, device=device)
Traceback (most recent call last):
  File "tools/train.py", line 135, in <module>
    main()
  File "tools/train.py", line 131, in main
    runner.train()
  File "/home/zty/Dataset/conda_envs/mmdet3d_cuda11.1_cudnn8.0.5/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1721, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/zty/Dataset/conda_envs/mmdet3d_cuda11.1_cudnn8.0.5/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "/home/zty/Dataset/conda_envs/mmdet3d_cuda11.1_cudnn8.0.5/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
    self.run_iter(idx, data_batch)
  File "/home/zty/Dataset/conda_envs/mmdet3d_cuda11.1_cudnn8.0.5/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
    outputs = self.runner.model.train_step(
  File "/home/zty/Dataset/conda_envs/mmdet3d_cuda11.1_cudnn8.0.5/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
    losses = self._run_forward(data, mode='loss')  # type: ignore
  File "/home/zty/Dataset/conda_envs/mmdet3d_cuda11.1_cudnn8.0.5/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 340, in _run_forward
    results = self(**data, mode=mode)
  File "/home/zty/Dataset/conda_envs/mmdet3d_cuda11.1_cudnn8.0.5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/detectors/base.py", line 76, in forward
    return self.loss(inputs, data_samples, **kwargs)
  File "/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 270, in loss
    img_feats, pts_feats = self.extract_feat(batch_inputs_dict,
  File "/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 242, in extract_feat
    pts_feats = self.extract_pts_feat(
  File "/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/detectors/mvx_faster_rcnn.py", line 48, in extract_pts_feat
    voxel_features, feature_coors = self.pts_voxel_encoder(
  File "/home/zty/Dataset/conda_envs/mmdet3d_cuda11.1_cudnn8.0.5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/voxel_encoders/voxel_encoder.py", line 274, in forward
    point_feats = self.fusion_layer(img_feats, points, point_feats,
  File "/home/zty/Dataset/conda_envs/mmdet3d_cuda11.1_cudnn8.0.5/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/layers/fusion_layers/point_fusion.py", line 245, in forward
    img_pts = self.obtain_mlvl_feats(img_feats, pts, img_metas)
  File "/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/layers/fusion_layers/point_fusion.py", line 285, in obtain_mlvl_feats
    self.sample_single(img_ins[level][i:i + 1], pts[i][:, :3],
  File "/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/layers/fusion_layers/point_fusion.py", line 314, in sample_single
    img_pts = point_sample(
  File "/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/layers/fusion_layers/point_fusion.py", line 63, in point_sample
    points = apply_3d_transformation(
  File "/home/zty/Project/DeepLearning/mmdetection3d/mmdet3d/models/layers/fusion_layers/coord_transform.py", line 76, in apply_3d_transformation
    rotate_func = partial(pcd.rotate, rotation=pcd_rotate_mat.inverse())
RuntimeError: cusolver error: 7, when calling `cusolverDnCreate(handle)`

I cloned the code from the main branch and installed it following the installation guide. The main library versions are listed below.

mmcv=2.0.0
mmdet=3.0.0
mmdet3d=1.1.0
mmengine=0.7.3
pytorch=1.8.0

I have tested matrix inversion on the GPU in a Python shell with the following commands and it works fine. (Note that x.cuda() is not in-place, so the tensor has to be reassigned; otherwise inverse() runs on the CPU and never touches cuSOLVER.)

python
import torch
x = torch.rand(3, 3).cuda()
x.inverse()

I don't know why I get the error above. Hoping for your help!

zhangtingyu11 commented 1 year ago

When I change the offending line in coord_transform.py (see the traceback above) to the following, it works fine.

rotate_func = partial(pcd.rotate, rotation=(pcd_rotate_mat.cpu().inverse()).cuda())
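
For reference, the same idea can be wrapped in a small helper (a sketch only; safe_inverse is a name I made up, not something in mmdet3d):

import torch

def safe_inverse(mat: torch.Tensor) -> torch.Tensor:
    # Invert on the CPU and move the result back to the original
    # device, so the lazy cuSOLVER handle creation on a nearly-full
    # GPU is avoided entirely. For a 3x3 rotation matrix the CPU
    # round trip is negligible.
    return mat.cpu().inverse().to(mat.device)

# in apply_3d_transformation this would read, for example:
# rotate_func = partial(pcd.rotate, rotation=safe_inverse(pcd_rotate_mat))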

When I run the training process, there is about 4 GB of memory left on the RTX 4090. I found a similar issue in the pytorch3d issue tracker, where they say it is caused by running out of memory, but can inverting a 3*3 matrix really occupy 4 GB of memory?
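
For what it's worth, cusolver error 7 corresponds to CUSOLVER_STATUS_INTERNAL_ERROR in the cuSOLVER headers, which in practice usually means the library's own device allocations failed; those go through cudaMalloc directly and bypass PyTorch's caching allocator, so torch.cuda.memory_allocated() will not show them. A rough way to see how much free memory the first GPU inverse() call actually consumes is to query nvidia-smi around it (a sketch, assuming nvidia-smi is on PATH and the GPU is otherwise idle):

import subprocess
import torch

def free_mib() -> int:
    # Ask the driver directly: cuSOLVER allocates with cudaMalloc,
    # outside PyTorch's caching allocator, so its memory use is
    # invisible to torch.cuda.memory_allocated().
    out = subprocess.check_output([
        'nvidia-smi', '--query-gpu=memory.free',
        '--format=csv,noheader,nounits'])
    return int(out.split()[0])

x = torch.rand(3, 3, device='cuda')
before = free_mib()
x.inverse()  # the first GPU inverse() call creates the handle lazily
torch.cuda.synchronize()
print(f'free before: {before} MiB, after: {free_mib()} MiB')

So the 3*3 inverse itself is tiny; what seems to matter is how much free memory is left outside PyTorch's cache at the moment the handle is first created.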