open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

CUDA error with several attention heads #1330

Closed sainivedh19pt closed 2 years ago

sainivedh19pt commented 2 years ago

Checklist

  1. I have searched related issues but cannot get the expected help. (#270, #42)
  2. The bug has not been fixed in the latest version. (mmseg - 0.21.1)

Describe the bug

RuntimeError: CUDA error: an illegal memory access was encountered

Reproduction

  1. What command or script did you run?

    python tools/train.py [config_path]
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
  3. What dataset did you use?

Environment

  1. Please run `python mmseg/utils/collect_env.py` to collect necessary environment information and paste it here.
    
    'tail' is not recognized as an internal or external command,
    operable program or batch file.
    'gcc' is not recognized as an internal or external command,
    operable program or batch file.
    sys.platform: win32
    Python: 3.7.11 (default, Jul 27 2021, 09:42:29) [MSC v.1916 64 bit (AMD64)]
    CUDA available: True
    GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
    CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
    NVCC: Not Available
    GCC: n/a
    PyTorch: 1.10.2
    PyTorch compiling details: PyTorch built with:
    - C++ Version: 199711
    - MSVC 192829337
    - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
    - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
    - OpenMP 2019
    - LAPACK is enabled (usually provided by MKL)
    - CPU capability usage: AVX512
    - CUDA Runtime 11.3
    - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
    - CuDNN 8.2
    - Magma 2.5.4
    - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=C:/cb/pytorch_1000000000000/work/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/cb/pytorch_1000000000000/work/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON,

    TorchVision: 0.11.3
    OpenCV: 4.5.5
    MMCV: 1.4.4
    MMCV Compiler: MSVC 193030709
    MMCV CUDA Compiler: 11.6
    MMSegmentation: 0.21.1+b163101

2. You may add additional information that may be helpful for locating the problem, such as
    - How you installed PyTorch [e.g., pip, conda, source]: `pip install torch torchvision`
    - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)

**Error traceback**

If applicable, paste the error traceback here.

```none
File "C:\Users\Sai_Nivedh\Projects\mmsegmentation\mmseg\apis\train.py", line 174, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "C:\Users\Sai_Nivedh\Projects\mmsegmentation\mmseg\models\segmentors\base.py", line 139, in train_step
    loss, log_vars = self._parse_losses(losses)
  File "C:\Users\Sai_Nivedh\Projects\mmsegmentation\mmseg\models\segmentors\base.py", line 208, in _parse_losses
    log_vars[loss_name] = loss_value.item()
RuntimeError: CUDA error: an illegal memory access was encountered
```

**Whole Config**

```python
# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='DNLHead',
        in_channels=2048,
        in_index=3,
        channels=512,
        dropout_ratio=0.1,
        reduction=2,
        use_scale=True,
        mode='embedded_gaussian',
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))

dataset_type = 'CustomDataset'
data_root = r'datasets\custom_cityscapes'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        reduce_zero_label=True,
        img_dir='images',
        ann_dir='labels',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        reduce_zero_label=True,
        img_dir='images',
        ann_dir='labels',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        reduce_zero_label=True,
        img_dir='images',
        ann_dir='labels',
        pipeline=test_pipeline))

# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
# learning policy
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
# runtime settings
runner = dict(type='IterBasedRunner', max_iters=40000)
checkpoint_config = dict(by_epoch=False, interval=4000)
evaluation = dict(interval=400, metric='mIoU', pre_eval=True)

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = r'checkpoints\dnl_r50-d8_512x1024_40k_cityscapes_20200904_233629-53d4ea93.pth'
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
```

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

MengzhangLI commented 2 years ago

I have two questions or comments.

(1) Could you try to train the model that triggers this bug on one of our provided datasets, such as Cityscapes or ADE20K, to test whether CUDA out of memory is still encountered? (A minimal config sketch for this is included after this comment.)

(2) Before your issue, I had run into similar problems because I missed 'RandomCrop' and 'Pad', as described here:

https://github.com/open-mmlab/mmsegmentation/pull/955#issuecomment-1005385112

I hope my experience helps you locate the problem.

Best,
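
As a concrete way to try point (1), here is a minimal config sketch. The base config name is inferred from the checkpoint filename in `load_from` above, and `sanity_check_dnl_cityscapes.py` is only a placeholder; adjust both to whatever your mmseg checkout actually contains.

```python
# sanity_check_dnl_cityscapes.py -- hypothetical file saved at the repo root.
# Inherit the stock DNL-on-Cityscapes config unchanged; if this trains cleanly,
# the illegal memory access is more likely tied to the custom dataset/labels
# than to the DNL attention head itself.
_base_ = ['configs/dnlnet/dnl_r50-d8_512x1024_40k_cityscapes.py']
```

Train it with `python tools/train.py sanity_check_dnl_cityscapes.py` after preparing Cityscapes as described in the dataset docs.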

sainivedh19pt commented 2 years ago

Hi @MengzhangLI ,

Thanks for the response

The error I faced is not CUDA out of memory. I'm posting an extended stack trace below for better insight:

File "C:\Users\Sai_Nivedh\Projects\mmsegmentation\mmseg\apis\train.py", line 174, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\runner\iter_based_runner.py", line 134, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\runner\iter_based_runner.py", line 61, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\data_parallel.py", line 74, in train_step
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\data_parallel.py", line 53, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 51, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim) if inputs else []
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 44, in scatter
    return scatter_map(inputs)
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 29, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 34, in scatter_map
    out = list(map(type(obj), zip(*map(scatter_map, obj.items()))))
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 29, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 27, in scatter_map
    return Scatter.forward(target_gpus, obj.data)
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\_functions.py", line 71, in forward
    outputs = scatter(input, target_gpus, streams)
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\_functions.py", line 15, in scatter
    [streams[i // chunk_size]]) for i in range(len(input))
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\_functions.py", line 15, in <listcomp>
    [streams[i // chunk_size]]) for i in range(len(input))
  File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\_functions.py", line 24, in scatter
    output = output.cuda(devices[0], non_blocking=True)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
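
For reference, a minimal sketch of acting on that last hint from Python on Windows; `[config_path]` is the same placeholder as in the reproduction command above:

```python
# Relaunch training with CUDA_LAUNCH_BLOCKING=1 so kernels run synchronously and the
# traceback points at the op that actually faults, instead of surfacing later in
# .item() or scatter. "[config_path]" is a placeholder, as in the reproduction command.
import os
import subprocess
import sys

env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")  # set before the child process initializes CUDA
subprocess.run([sys.executable, "tools/train.py", "[config_path]"], check=True, env=env)
```

Equivalently, run `set CUDA_LAUNCH_BLOCKING=1` in the Windows shell before `python tools/train.py [config_path]`.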
MengzhangLI commented 2 years ago

This is usually caused by a wrong `num_classes` in your config: it should be n = number of foreground classes + background (background is usually label 0). For example, if you have only one kind of foreground, it should be `num_classes=2`.
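
For anyone hitting the same error, a minimal sketch (the annotation directory and `.png` mask extension are assumptions based on the config in this issue) to check that annotation values are consistent with `num_classes` when `reduce_zero_label=True`:

```python
# With reduce_zero_label=True, mmseg maps raw label 0 to 255 (ignore) and shifts the
# remaining labels down by one, so every shifted value must satisfy 0 <= v < num_classes.
# ann_dir and the .png extension are assumptions based on the config above.
import numpy as np
from pathlib import Path
from PIL import Image

ann_dir = Path(r"datasets\custom_cityscapes\labels")
num_classes = 19  # value used by the decode/auxiliary heads in the config

raw_values = set()
for mask_path in sorted(ann_dir.glob("*.png"))[:50]:  # sample a handful of masks
    raw_values |= set(np.unique(np.array(Image.open(mask_path))).tolist())

shifted = {v - 1 for v in raw_values if v not in (0, 255)}  # raw 0 and 255 end up ignored
print("raw label values:", sorted(raw_values))
print("out-of-range after shift:", sorted(v for v in shifted if v >= num_classes))
```

Any value reported as out of range will index past the classifier output during the loss computation and can trigger exactly this kind of illegal memory access on CUDA.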