This is a CUDA OOM error. You can try a smaller input size or set batch_size=1. If that works, you can then try fp16 to train your model.
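For reference, in an MMDetection 2.x config these suggestions would typically look something like the sketch below; the _base_ path is only a placeholder for your own config, and the exact values are examples, not a verified fix:

# Minimal sketch of the memory-saving suggestions above.
_base_ = './my_retinanet_bifpn.py'   # placeholder for the user's own config file

# Batch size of 1 per GPU (the "bs=1" suggestion).
data = dict(samples_per_gpu=1)

# Mixed-precision training via mmcv's Fp16OptimizerHook.
fp16 = dict(loss_scale=512.)

# A smaller input size would be set through img_scale in the Resize step of
# train_pipeline, e.g. img_scale=(320, 320).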
I've tried resizing to (320, 320) and batch_size=1 but now it gives me this error.
Traceback (most recent call last):
File "D:\Leon\SpikeDetection\Wheat\tools\train.py", line 237, in <module>
main()
File "D:\Leon\SpikeDetection\Wheat\tools\train.py", line 226, in main
train_detector(
File "C:\Users\leonh\anaconda3\lib\site-packages\mmdet\apis\train.py", line 244, in train_detector
runner.run(data_loaders, cfg.workflow)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 29, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "C:\Users\leonh\anaconda3\lib\site-packages\mmcv\parallel\data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "C:\Users\leonh\anaconda3\lib\site-packages\mmdet\models\detectors\base.py", line 248, in train_step
losses = self(**data)
File "C:\Users\leonh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmcv\runner\fp16_utils.py", line 110, in new_func
return old_func(*args, **kwargs)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmdet\models\detectors\base.py", line 172, in forward
return self.forward_train(img, img_metas, **kwargs)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmdet\models\detectors\single_stage.py", line 82, in forward_train
x = self.extract_feat(img)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmdet\models\detectors\single_stage.py", line 45, in extract_feat
x = self.neck(x)
File "C:\Users\leonh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmcv\runner\fp16_utils.py", line 110, in new_func
return old_func(*args, **kwargs)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmdet\models\necks\bifpn.py", line 383, in forward
feats = stack_bifpn(feats)
File "C:\Users\leonh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmdet\models\necks\bifpn.py", line 294, in forward
feats.append(new_op_node(input_node))
File "C:\Users\leonh\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\leonh\anaconda3\lib\site-packages\mmdet\models\necks\bifpn.py", line 156, in forward
x += w[i] * inputs[i]
RuntimeError: The size of tensor a (30) must match the size of tensor b (32) at non-singleton dimension 2
The tensor sizes did not match in BiFPN; you need to check your code.
I'm trying to figure it out but can't seem to get my head around it. When I pass an input of size (640, 640) or (320, 320) to the BiFPN module on its own, I get results, but when I use it through the config file as above, I get a CUDA OOM for input size (640, 640) and a tensor size mismatch for (320, 320). I understand that the CUDA OOM might be caused by the ResamplingConv class, but all it does is rescale the last input layer with MaxPool2d, which shouldn't be that memory-intensive; I could be wrong, though. As for input size (320, 320), I don't understand why there would be a size mismatch at all. I've tried to work it out for hours, but I haven't had any luck.
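To make the mismatch concrete, here is a minimal, self-contained sketch (the shapes are illustrative, not taken from the actual config) that reproduces the failure at the fusion line x += w[i] * inputs[i] and shows one defensive way the inputs could be aligned before the weighted sum; this is only an illustration, not the fix used by mmdet's BiFPN:

import torch
import torch.nn.functional as F

# Two BiFPN inputs whose spatial sizes disagree, as in the traceback
# (30 vs. 32 along dimension 2). Shapes are illustrative only.
a = torch.randn(1, 64, 30, 30)
b = torch.randn(1, 64, 32, 32)
w = torch.softmax(torch.randn(2), dim=0)

# This line reproduces the RuntimeError from bifpn.py:
#   x = w[0] * a + w[1] * b   # -> sizes 30 and 32 cannot broadcast

# One option is to resize every input to a common size before fusing.
target = b.shape[-2:]
inputs = [F.interpolate(t, size=target, mode='nearest') if t.shape[-2:] != target else t
          for t in (a, b)]
x = sum(wi * t for wi, t in zip(w, inputs))
print(x.shape)  # torch.Size([1, 64, 32, 32])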
Checklist
Describe the bug
I am trying to train a RetinaNet model with a BiFPN neck. I've created the necessary config files, but the code stops in the training phase with the error shown in the traceback.
Reproduction
Environment
I'm also adding the code of the BiFPN neck that I tried to use:
Error traceback
Bug fix
I think the reason might be the torch._C._nn.upsample_nearest2d function, but I don't know for sure. Any help to solve this would be much appreciated.
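If upsampling with a fixed scale factor really is involved, one plausible way to end up with exactly 30 vs. 32 is sketched below. The MaxPool2d parameters (kernel_size=3, stride=2, no padding) are an assumption about what ResamplingConv might be doing, not taken from the actual code:

import torch
import torch.nn.functional as F

# Hypothetical scenario: a 32x32 pyramid level is downsampled by a MaxPool2d
# whose output size is floored, then upsampled again with scale_factor=2.
lateral = torch.randn(1, 64, 32, 32)

# floor((32 - 3) / 2) + 1 = 15, so the pooled map is 15x15.
top = F.max_pool2d(lateral, kernel_size=3, stride=2)

# Nearest upsampling by a fixed factor of 2 gives 30x30, not 32x32.
up = F.interpolate(top, scale_factor=2, mode='nearest')
print(lateral.shape[-1], up.shape[-1])  # 32 30 -- the sizes from the error

# Upsampling to an explicit size instead of a scale factor restores alignment.
up_fixed = F.interpolate(top, size=lateral.shape[-2:], mode='nearest')
print(up_fixed.shape[-1])  # 32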