Is there any way to build mmdetection to be compatible to both 1080Ti and 2080Ti in one machine?

Bai-YunHan commented 4 years ago

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help. I’ve read the issue pytorch/pytorch#31285, I don’t think GTX1080Ti is too ‘old’. Describe the bug I have 2GPU 1.RTX2080Ti; 2.GTX1080Ti. Training with 2080Ti works fine, however training with 1080Ti gives me RuntimeError: CUDA error: no kernel image is available for execution on the device

Reproduction

What command or script did you run?

CUDA_VISIBLE_DEVICES=1 python tools/train.py configs/cityscapes/mask_rcnn_r50_fpn_1x_cityscapes.py

Did you make any modifications on the code or config? Did you understand what you have modified? No.
What dataset did you use? cityscapes

Environment

Please run python mmdet/utils/collect_env.py to collect necessary environment infomation and paste it here. sys.platform: linux Python: 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0] CUDA available: True CUDA_HOME: /usr/local/cuda NVCC: Cuda compilation tools, release 10.2, V10.2.89 GPU 0: GeForce RTX 2080 Ti GPU 1: GeForce GTX 1080 Ti GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 PyTorch: 1.5.0 PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+82fd1c8 OpenCV: 4.2.0 MMCV: 0.6.2 MMDetection: 2.2.0+741b638 MMDetection Compiler: GCC 7.5 MMDetection CUDA Compiler: 10.2

You may add addition that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source] I tried both conda and source, all gives me same problem.
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.) Not likely.

Error traceback If applicable, paste the error trackback here.

2020-07-03 22:44:46,184 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.7 (default, May  7 2020, 21:25:33) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GPU 0: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.2.0
MMCV: 0.6.2
MMDetection: 2.2.0+741b638
MMDetection Compiler: GCC 7.5
MMDetection CUDA Compiler: 10.2
------------------------------------------------------------

2020-07-03 22:44:46,185 - mmdet - INFO - Distributed training: False
2020-07-03 22:44:47,065 - mmdet - INFO - Config:
model = dict(
    type='MaskRCNN',
    pretrained=None,
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', out_size=7, sample_num=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=8,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', loss_weight=1.0, beta=1.0)),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', out_size=14, sample_num=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=8,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))))
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            match_low_quality=True,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            match_low_quality=True,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        mask_size=28,
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05,
        nms=dict(type='nms', iou_thr=0.5),
        max_per_img=100,
        mask_thr_binary=0.5))
dataset_type = 'CityscapesDataset'
data_root = 'data/cityscapes/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='Resize', img_scale=[(2048, 800), (2048, 1024)], keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 1024),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=8,
        dataset=dict(
            type='CityscapesDataset',
            ann_file=
            'data/cityscapes/annotations/instancesonly_filtered_gtFine_train.json',
            img_prefix='data/cityscapes/leftImg8bit/train/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
                dict(
                    type='Resize',
                    img_scale=[(2048, 800), (2048, 1024)],
                    keep_ratio=True),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(
                    type='Collect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
            ])),
    val=dict(
        type='CityscapesDataset',
        ann_file=
        'data/cityscapes/annotations/instancesonly_filtered_gtFine_val.json',
        img_prefix='data/cityscapes/leftImg8bit/val/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 1024),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CityscapesDataset',
        ann_file=
        'data/cityscapes/annotations/instancesonly_filtered_gtFine_test.json',
        img_prefix='data/cityscapes/leftImg8bit/test/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 1024),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(metric=['bbox', 'segm'])
checkpoint_config = dict(interval=1)
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth'
resume_from = None
workflow = [('train', 1)]
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[7])
total_epochs = 8
work_dir = './work_dirs/mask_rcnn_r50_fpn_1x_cityscapes'
gpu_ids = range(0, 1)

loading annotations into memory...
Done (t=0.32s)
creating index...
index created!
loading annotations into memory...
Done (t=0.16s)
creating index...
index created!
2020-07-03 22:44:50,864 - mmdet - INFO - load checkpoint from https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth
2020-07-03 22:44:51,012 - mmdet - WARNING - The model and loaded state dict do not match exactly

size mismatch for roi_head.bbox_head.fc_cls.weight: copying a param with shape torch.Size([81, 1024]) from checkpoint, the shape in current model is torch.Size([9, 1024]).
size mismatch for roi_head.bbox_head.fc_cls.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([9]).
size mismatch for roi_head.bbox_head.fc_reg.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([32, 1024]).
size mismatch for roi_head.bbox_head.fc_reg.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for roi_head.mask_head.conv_logits.weight: copying a param with shape torch.Size([80, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([8, 256, 1, 1]).
size mismatch for roi_head.mask_head.conv_logits.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([8]).
2020-07-03 22:44:51,015 - mmdet - INFO - Start running, host: byh@smart, work_dir: /home/byh/workspace/mmdetection/work_dirs/mask_rcnn_r50_fpn_1x_cityscapes
2020-07-03 22:44:51,015 - mmdet - INFO - workflow: [('train', 1)], max: 8 epochs
Traceback (most recent call last):
  File "tools/train.py", line 153, in <module>
    main()
  File "tools/train.py", line 149, in main
    meta=meta)
  File "/home/byh/workspace/mmdetection/mmdet/apis/train.py", line 128, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/byh/miniconda3/envs/env_1/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/byh/miniconda3/envs/env_1/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 32, in train
    **kwargs)
  File "/home/byh/miniconda3/envs/env_1/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 31, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/byh/workspace/mmdetection/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/home/byh/miniconda3/envs/env_1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/byh/workspace/mmdetection/mmdet/core/fp16/decorators.py", line 51, in new_func
    return old_func(*args, **kwargs)
  File "/home/byh/workspace/mmdetection/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/byh/workspace/mmdetection/mmdet/models/detectors/two_stage.py", line 157, in forward_train
    proposal_cfg=proposal_cfg)
  File "/home/byh/workspace/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 58, in forward_train
    proposal_list = self.get_bboxes(*outs, img_metas, cfg=proposal_cfg)
  File "/home/byh/workspace/mmdetection/mmdet/core/fp16/decorators.py", line 131, in new_func
    return old_func(*args, **kwargs)
  File "/home/byh/workspace/mmdetection/mmdet/models/dense_heads/anchor_head.py", line 570, in get_bboxes
    scale_factor, cfg, rescale)
  File "/home/byh/workspace/mmdetection/mmdet/models/dense_heads/rpn_head.py", line 167, in _get_bboxes_single
    dets, keep = batched_nms(proposals, scores, ids, nms_cfg)
  File "/home/byh/workspace/mmdetection/mmdet/ops/nms/nms_wrapper.py", line 154, in batched_nms
    torch.cat([bboxes_for_nms, scores[:, None]], -1), **nms_cfg_)
  File "/home/byh/workspace/mmdetection/mmdet/ops/nms/nms_wrapper.py", line 53, in nms
    inds = nms_ext.nms(dets_th, iou_thr)
RuntimeError: CUDA error: no kernel image is available for execution on the device

Bug fix If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

Bai-YunHan commented 4 years ago

One more thing i’d like to add, The 1080Ti works fine on my other deep learning job.

Thanks in advance, looking forward to your reply.

Bai-YunHan commented 4 years ago

FYI, I tried to train on a GTX Titan X(pascal) this morning, gave me the same error: RuntimeError: CUDA error: no kernel image is available for execution on the device

Bai-YunHan commented 4 years ago

Updates...

If I re-build MMDetection, and only allow it to see 1080Ti, i.e. CUDA_VISIBLE_DEVICES=1 pip install -v -e . Then training with 1080Ti works, but training with 2080Ti fail with "RuntimeError: CUDA error: no kernel image is available for execution on the device”

If I re-build MMDetection, and only allow it to see 2080Ti, i.e. CUDA_VISIBLE_DEVICES=0 pip install -v -e . Then training with 2080Ti works, but training with 1080Ti fail with "RuntimeError: CUDA error: no kernel image is available for execution on the device”

Is there anyway to build mmdetection to be compatible to all existing GPUs?

rumusan commented 4 years ago

same problem with P40 and V100

hellock commented 4 years ago

One recommended solution is to build mmdetection on the specific GPU. Or you can specify the environment variable TORCH_CUDA_ARCH_LIST to include both your GPU arches.

rumusan commented 4 years ago

Solved with TORCH_CUDA_ARCH_LIST, many thanks.

Bai-YunHan commented 4 years ago

You suggested method works ! "specify the environment variable TORCH_CUDA_ARCH_LIST to include both your GPU arches.”

Thanks !

hezhu1996 commented 4 years ago

@Bai-YunHan Hello, i have the same problem. for setting the environment variable TORCH_CUDA_ARCH_LIST, would you tell me what's the exact command should run when build mmdet? Thanks

Bai-YunHan commented 4 years ago

@TWDH yes, TORCH_CUDA_ARCH_LIST="6.1;7.5" python -m pip install -v -e .

open-mmlab / mmdetection

Is there any way to build mmdetection to be compatible to both 1080Ti and 2080Ti in one machine? #3195