open-mmlab / mmrotate

OpenMMLab Rotated Object Detection Toolbox and Benchmark
https://mmrotate.readthedocs.io/en/latest/
Apache License 2.0

Cuda out of memory but the free cuda memory is bigger than pytorch reserved #777

Open shmilyzxw opened 1 year ago

shmilyzxw commented 1 year ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmrotate

Environment

sys.platform: win32
Python: 3.8.3 (default, Jul 2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
NVCC: Cuda compilation tools, release 11.1, V11.1.74
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.35.32215 for x64
GCC: n/a
PyTorch: 1.8.0+cu111
PyTorch compiling details: PyTorch built with:

TorchVision: 0.9.0+cu111
OpenCV: 4.7.0
MMCV: 1.7.1
MMCV Compiler: MSVC 192829924
MMCV CUDA Compiler: 11.1
MMRotate: 0.3.4+7755aa5

Reproduces the problem - code sample

_base_ = [
    '../_base_/datasets/dotav1.py', '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]

angle_version = 'le90'
model = dict(
    type='RoITransformer',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RotatedRPNHead',
        in_channels=256,
        feat_channels=256,
        version=angle_version,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    roi_head=dict(
        type='RoITransRoIHead',
        version=angle_version,
        num_stages=2,
        stage_loss_weights=[1, 1],
        bbox_roi_extractor=[
            dict(
                type='SingleRoIExtractor',
                roi_layer=dict(
                    type='RoIAlign', output_size=7, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            dict(
                type='RotatedSingleRoIExtractor',
                roi_layer=dict(
                    type='RoIAlignRotated',
                    out_size=7,
                    sample_num=2,
                    clockwise=True),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
        ],
        bbox_head=[
            dict(
                type='RotatedShared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=15,
                bbox_coder=dict(
                    type='DeltaXYWHAHBBoxCoder',
                    angle_range=angle_version,
                    norm_factor=2,
                    edge_swap=True,
                    target_means=[0., 0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2, 1]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
            dict(
                type='RotatedShared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=15,
                bbox_coder=dict(
                    type='DeltaXYWHAOBBoxCoder',
                    angle_range=angle_version,
                    norm_factor=None,
                    edge_swap=True,
                    proj_xy=True,
                    target_means=[0., 0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1, 0.5]),
                reg_class_agnostic=False,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
        ]),
# model training and testing settings

train_cfg=dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            match_low_quality=True,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_pre=2000,
        max_per_img=2000,
        nms=dict(type='nms', iou_threshold=0.7),
        min_bbox_size=0),
    rcnn=[
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1,
                iou_calculator=dict(type='BboxOverlaps2D')),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1,
                iou_calculator=dict(type='RBboxOverlaps2D')),
            sampler=dict(
                type='RRandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)
    ]),
test_cfg=dict(
    rpn=dict(
        nms_pre=2000,
        max_per_img=2000,
        nms=dict(type='nms', iou_threshold=0.7),
        min_bbox_size=0),
    rcnn=dict(
        nms_pre=2000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type=angle_version, iou_thr=0.1),
        max_per_img=2000)))

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version=angle_version),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
data = dict(
    train=dict(pipeline=train_pipeline, version=angle_version),
    val=dict(version=angle_version),
    test=dict(version=angle_version))
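The RPN assigner in this config is MMDetection's MaxIoUAssigner, which computes the IoU between every ground-truth box and every anchor on the GPU. In MMDetection 2.x this assigner accepts a `gpu_assign_thr` argument that moves the assignment to the CPU when an image has more ground-truth boxes than the threshold. The fragment below is a sketch of that option, not a confirmed fix for this issue: it has not been tested on this setup, and the threshold of 100 is an illustrative value only.

```python
# Sketch of the RPN assigner from the config above with one extra key.
# Assumption: the installed mmdet MaxIoUAssigner supports `gpu_assign_thr`
# (it is a constructor argument in MMDetection 2.x).
# The value 100 is illustrative, not tuned.
rpn_assigner = dict(
    type='MaxIoUAssigner',
    pos_iou_thr=0.7,
    neg_iou_thr=0.3,
    min_pos_iou=0.3,
    match_low_quality=True,
    ignore_iof_thr=-1,
    gpu_assign_thr=100)  # images with more than 100 GT boxes are assigned on the CPU
```

In the full config this dict would replace `train_cfg.rpn.assigner`. CPU assignment is slower, so the option trades training speed for GPU memory on crowded images.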

Reproduces the problem - command or script

python tools/train.py configs/roi_trans/roi_trans_r50_fpn_1x_dota_le90.py

Reproduces the problem - error message

Traceback (most recent call last):
  File "tools/train.py", line 192, in <module>
    main()
  File "tools/train.py", line 181, in main
    train_detector(
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmrotate\apis\train.py", line 141, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 53, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 31, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmcv\parallel\data_parallel.py", line 77, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmdet\models\detectors\base.py", line 248, in train_step
    losses = self(**data)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmcv\runner\fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmdet\models\detectors\base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmrotate\models\detectors\two_stage.py", line 135, in forward_train
    rpn_losses, proposal_list = self.rpn_head.forward_train(
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmdet\models\dense_heads\base_dense_head.py", line 335, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmcv\runner\fp16_utils.py", line 208, in new_func
    return old_func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmrotate\models\dense_heads\rotated_rpn_head.py", line 337, in loss
    cls_reg_targets = self.get_targets(
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmrotate\models\dense_heads\rotated_rpn_head.py", line 218, in get_targets
    results = multi_apply(
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmdet\core\utils\misc.py", line 30, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmrotate\models\dense_heads\rotated_rpn_head.py", line 100, in _get_targets_single
    assign_result = self.assigner.assign(
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmdet\core\bbox\assigners\max_iou_assigner.py", line 111, in assign
    overlaps = self.iou_calculator(gt_bboxes, bboxes)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmdet\core\bbox\iou_calculators\iou2d_calculator.py", line 65, in __call__
    return bbox_overlaps(bboxes1, bboxes2, mode, is_aligned)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmdet\core\bbox\iou_calculators\iou2d_calculator.py", line 237, in bbox_overlaps
    wh = fp16_clamp(rb - lt, min=0)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mmdet\core\bbox\iou_calculators\iou2d_calculator.py", line 19, in fp16_clamp
    return x.clamp(min, max)
RuntimeError: CUDA out of memory. Tried to allocate 1.11 GiB (GPU 0; 24.00 GiB total capacity; 5.16 GiB already allocated; 16.26 GiB free; 5.21 GiB reserved in total by PyTorch)

Additional information

When I resize the images to 600×600, the error no longer occurs. But I want to train the model at 1024×1024 resolution.
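A rough way to see why resolution matters here: the failing line computes `rb - lt` between every ground-truth box and every anchor, so each intermediate tensor has shape (num_gt, num_anchors, 2). The sketch below estimates its size under stated assumptions: float32 values, three anchors per location on strides 4 to 64 as in the config, and a feature map of ceil(size / stride) cells per side (exact for 1024×1024, approximate otherwise). The helper names are hypothetical and not part of MMRotate.

```python
import math

GIB = 1024 ** 3


def num_anchors(height, width, strides=(4, 8, 16, 32, 64), anchors_per_location=3):
    """Approximate anchor count for one padded image.

    Assumes each FPN level has ceil(size / stride) cells per side.
    """
    return sum(
        math.ceil(height / s) * math.ceil(width / s) * anchors_per_location
        for s in strides
    )


def pairwise_tensor_bytes(num_gt, anchors, bytes_per_value=4):
    """Size of one (num_gt, anchors, 2) float32 tensor such as `rb - lt`."""
    return num_gt * anchors * 2 * bytes_per_value


anchors_1024 = num_anchors(1024, 1024)
anchors_608 = num_anchors(608, 608)  # 600x600 padded to a multiple of 32

# Ground-truth count at which a single intermediate tensor reaches the
# 1.11 GiB quoted in the error message (an inference, not a measured value).
gt_for_reported_alloc = 1.11 * GIB / pairwise_tensor_bytes(1, anchors_1024)

print(anchors_1024, anchors_608, round(gt_for_reported_alloc))
```

Under these assumptions a 1024×1024 input yields 261,888 anchors, so one such tensor reaches 1.11 GiB at roughly 570 ground-truth boxes in a single image, while a 608×608 input has about 35% as many anchors. The estimate only sizes the allocation; it does not explain why the allocation failed with 16.26 GiB reported free.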

zytx121 commented 1 year ago

Hi @shmilyzxw, is anyone else using this GPU? What is your batch size?

BTW, maintenance of the old version has stopped. You are welcome to try the new version: https://github.com/open-mmlab/mmrotate/tree/1.x

shmilyzxw commented 1 year ago

I'm the only one using this GPU. My settings are 'samples_per_gpu=1' and 'workers_per_gpu=1'. Is the batch size set by changing these two parameters? I will try again with the new version of the code, thanks!
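For reference, in MMRotate 0.x / MMDetection 2.x configs only `samples_per_gpu` sets the batch size; `workers_per_gpu` controls the number of data-loading worker processes. A minimal sketch, where `effective_batch_size` is a hypothetical helper written for illustration and not part of MMRotate:

```python
# Sketch of the dataloader keys in an MMRotate 0.x / MMDetection 2.x config.
data = dict(
    samples_per_gpu=1,  # images per GPU per iteration, i.e. the per-GPU batch size
    workers_per_gpu=1,  # DataLoader worker processes per GPU; does not change the batch size
)


def effective_batch_size(data_cfg, num_gpus=1):
    """Total images per optimizer step (hypothetical helper for illustration)."""
    return data_cfg['samples_per_gpu'] * num_gpus


print(effective_batch_size(data))  # 1
```

With `samples_per_gpu=1` on a single GPU the batch size is already at its minimum, so lowering it further is not an option for reducing memory use.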

19990101lrk commented 1 year ago

How do I migrate from version 0.x to version 1.x?

shmilyzxw commented 1 year ago

Just get the version 1.x code with git and install it by following the README.md.

ououch123 commented 5 months ago

@shmilyzxw, have you solved your problem? I have the same problem.