open-mmlab / mmyolo

OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, RTMDet-Rotated, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.
https://mmyolo.readthedocs.io/zh_CN/dev/
GNU General Public License v3.0
2.93k stars 532 forks

RAM and GPU memory keep growing during training #884

Open nobody-cheng opened 1 year ago

nobody-cheng commented 1 year ago

Prerequisite

🐞 Describe the bug

During training, both RAM and GPU memory usage keep growing until the run fails with an out-of-memory error.
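To confirm that the growth is steady rather than a one-off jump during warm-up, one option is to log host memory every few iterations and estimate the trend. The following is a minimal sketch using only the Python standard library. `peak_rss_mib` and `growth_per_step` are hypothetical helpers, not part of mmyolo, and the `resource` module is Unix-only. GPU memory would have to be sampled separately, for example with PyTorch's `torch.cuda.memory_allocated()`.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def growth_per_step(samples):
    """Least-squares slope of the samples against their index (units per step)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


# Stand-in for a training loop: references are kept, so memory cannot be freed.
retained = []
samples = []
for step in range(5):
    retained.append(b"x" * (8 * 1024 * 1024))  # hold on to 8 MiB per step
    samples.append(peak_rss_mib())

print("samples (MiB):", [round(s, 1) for s in samples])
print("growth per step (MiB):", round(growth_per_step(samples), 2))
```

A slope that stays clearly positive over many logging intervals points to a leak; a slope that flattens out after the first epoch points to caches filling up.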

Environment

System environment:
    sys.platform: linux
    Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 1556182265
    GPU 0,1,2,3: NVIDIA GeForce RTX 4090
    CUDA_HOME: /usr/local/cuda-12.1
    NVCC: Cuda compilation tools, release 12.1, V12.1.105
    GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
    PyTorch: 1.13.1+cu116
    PyTorch compiling details: PyTorch built with:

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 1556182265
    Distributed launcher: none
    Distributed training: False
    GPU number: 1

_base_ = ['../_base_/default_runtime.py', '../_base_/det_p5_tta.py']
data_root = './data/CrowdHuman2coco/'
dataset_type = 'YOLOv5CocoDataset'

class_name = ('head', 'person', )
num_classes = len(class_name)
metainfo = dict(classes=class_name, palette=[(220, 20, 60), (220, 100, 128)])

img_scale = (640, 640)
deepen_factor = 0.33
widen_factor = 0.5

num_last_epochs = 5
img_scales = [(640, 640), (320, 320)]
max_epochs = 80
save_epoch_intervals = 5
train_batch_size_per_gpu = 12
train_num_workers = 2
val_batch_size_per_gpu = 1
val_num_workers = 2

load_from = 'https://download.openmmlab.com/mmyolo/v0/ppyoloe/ppyoloe_pretrain/ppyoloe_plus_s_obj365_pretrained-bcfe8478.pth'

persistent_workers = True
base_lr = 0.001

strides = [8, 16, 32]

model = dict(
    type='YOLODetector',
    data_preprocessor=dict(
    # use this to support multi_scale training

    type='PPYOLOEDetDataPreprocessor',
    pad_size_divisor=32,
    batch_augments=[
        dict(
            type='PPYOLOEBatchRandomResize',
            random_size_range=(320, 800),
            interval=1,
            size_divisor=32,
            random_interp=True,
            keep_ratio=False)
    ],
    mean=[0., 0., 0.],
    std=[255., 255., 255.],
    bgr_to_rgb=True),
backbone=dict(
    type='PPYOLOECSPResNet',
    deepen_factor=deepen_factor,
    widen_factor=widen_factor,
    block_cfg=dict(
        type='PPYOLOEBasicBlock', shortcut=True, use_alpha=True),
    norm_cfg=dict(type='BN', momentum=0.1, eps=1e-5),
    act_cfg=dict(type='SiLU', inplace=True),
    attention_cfg=dict(
        type='EffectiveSELayer', act_cfg=dict(type='HSigmoid')),
    use_large_stem=True),
neck=dict(
    type='PPYOLOECSPPAFPN',
    in_channels=[256, 512, 1024],
    out_channels=[192, 384, 768],
    deepen_factor=deepen_factor,
    widen_factor=widen_factor,
    num_csplayer=1,
    num_blocks_per_layer=3,
    block_cfg=dict(
        type='PPYOLOEBasicBlock', shortcut=False, use_alpha=False),
    norm_cfg=dict(type='BN', momentum=0.1, eps=1e-5),
    act_cfg=dict(type='SiLU', inplace=True),
    drop_block_cfg=None,
    use_spp=True),
bbox_head=dict(
    type='PPYOLOEHead',
    head_module=dict(
        type='PPYOLOEHeadModule',
        num_classes=num_classes,
        in_channels=[192, 384, 768],
        widen_factor=widen_factor,
        featmap_strides=strides,
        reg_max=16,
        norm_cfg=dict(type='BN', momentum=0.1, eps=1e-5),
        act_cfg=dict(type='SiLU', inplace=True),
        num_base_priors=1),
    prior_generator=dict(
        type='mmdet.MlvlPointGenerator', offset=0.5, strides=strides),
    bbox_coder=dict(type='DistancePointBBoxCoder'),
    loss_cls=dict(
        type='mmdet.VarifocalLoss',
        use_sigmoid=True,
        alpha=0.75,
        gamma=2.0,
        iou_weighted=True,
        reduction='sum',
        loss_weight=1.0),
    loss_bbox=dict(
        type='IoULoss',
        iou_mode='giou',
        bbox_format='xyxy',
        reduction='mean',
        loss_weight=2.5,
        return_iou=False),
    loss_dfl=dict(
        type='mmdet.DistributionFocalLoss',
        reduction='mean',
        loss_weight=0.5 / 4)),
train_cfg=dict(
    initial_epoch=30,
    initial_assigner=dict(
        type='BatchATSSAssigner',
        num_classes=num_classes,
        topk=9,
        iou_calculator=dict(type='mmdet.BboxOverlaps2D')),
    assigner=dict(
        type='BatchTaskAlignedAssigner',
        num_classes=num_classes,
        topk=13,
        alpha=1,
        beta=6,
        eps=1e-9)),
test_cfg=dict(
    multi_label=True,
    nms_pre=1000,
    score_thr=0.01,
    nms=dict(type='nms', iou_threshold=0.7),
    max_per_img=300))

train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=_base_.backend_args),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PPYOLOERandomDistort'),
    dict(type='mmdet.Expand', mean=(103.53, 116.28, 123.675)),
    dict(type='PPYOLOERandomCrop'),
    dict(type='mmdet.RandomFlip', prob=0.5),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
                   'flip_direction'))
]

train_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    persistent_workers=persistent_workers,
    pin_memory=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    collate_fn=dict(type='yolov5_collate', use_ms_training=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/train.json',
        data_prefix=dict(img='train/'),
        filter_cfg=dict(filter_empty_gt=True, min_size=0),
        pipeline=train_pipeline))

test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=_base_.backend_args),
    dict(
        type='mmdet.FixShapeResize',
        width=img_scale[0],
        height=img_scale[1],
        keep_ratio=False,
        interpolation='bicubic'),
    dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]

val_dataloader = dict(
    batch_size=val_batch_size_per_gpu,
    num_workers=val_num_workers,
    persistent_workers=persistent_workers,
    pin_memory=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        metainfo=metainfo,
        test_mode=True,
        data_prefix=dict(img='val/'),
        filter_cfg=dict(filter_empty_gt=True, min_size=0),
        ann_file='annotations/val.json',
        pipeline=test_pipeline))

test_dataloader = val_dataloader

param_scheduler = None
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='SGD',
        lr=base_lr,
        momentum=0.9,
        weight_decay=5e-4,
        nesterov=False),
    paramwise_cfg=dict(norm_decay_mult=0.))

default_hooks = dict(
    param_scheduler=dict(
        type='PPYOLOEParamSchedulerHook',
        warmup_min_iter=1000,
        start_factor=0.,
        warmup_epochs=5,
        min_lr_ratio=0.0,
        total_epochs=int(max_epochs * 1.2)),
    checkpoint=dict(
        type='CheckpointHook',
        interval=save_epoch_intervals,
        save_best='auto',
        max_keep_ckpts=3))

custom_hooks = [
    dict(
        type='EMAHook',
        ema_type='ExpMomentumEMA',
        momentum=0.0002,
        update_buffers=True,
        strict_load=False,
        priority=49)
]

val_evaluator = dict(
    type='mmdet.CocoMetric',
    proposal_nums=(100, 1, 10),
    ann_file=data_root + 'annotations/val.json',
    metric='bbox')
test_evaluator = val_evaluator

train_cfg = dict(
    type='EpochBasedTrainLoop',
    max_epochs=max_epochs,
    val_interval=save_epoch_intervals)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
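One property of the config above is worth keeping in mind when reading memory numbers: `PPYOLOEBatchRandomResize` with `interval=1` can change the input resolution on every iteration, so per-iteration memory is not expected to be constant. The sketch below only does the arithmetic on `random_size_range=(320, 800)` and `size_divisor=32`. It assumes the sampled side length is always a multiple of `size_divisor`, which is an assumption about the augmentation and is not taken from its source code. `candidate_sizes` is a hypothetical helper. Whether multi-scale training contributes to the growth reported here is not established in this thread.

```python
def candidate_sizes(size_range, divisor):
    """Multiples of `divisor` inside the inclusive range (assumed behaviour)."""
    low, high = size_range
    first = -(-low // divisor) * divisor  # round `low` up to a multiple of divisor
    return list(range(first, high + 1, divisor))


# Values copied from the posted config.
sizes = candidate_sizes((320, 800), 32)
batch_size = 12  # train_batch_size_per_gpu

pixels_small = batch_size * sizes[0] ** 2
pixels_large = batch_size * sizes[-1] ** 2

print("number of distinct sizes:", len(sizes))
print("smallest / largest side:", sizes[0], sizes[-1])
print("pixel ratio largest/smallest batch:", pixels_large / pixels_small)
```

Under this assumption a memory peak that keeps rising early in training may partly reflect larger sizes being sampled for the first time, whereas a real leak would keep rising after every size has been seen. Fixing the input size for a test run is one way to separate the two cases.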

wangg12 commented 1 year ago

Similar problem here. Have you resolved it? @nobody-cheng

LakeOcean commented 11 months ago

I'm running into the same problem.

Learnerner commented 11 months ago

I've hit the same problem too. It happens with both mmdetection and mmyolo.

HRliao1109 commented 9 months ago

I have a similar problem when I use mmcv and mmdetection in another project. It might be caused by mmdetection.

Baboom-l commented 7 months ago

Has anyone solved this?

moke-harry commented 7 months ago

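Same problem here, but in my case the GPU is not being used at all and RAM usage keeps growing. I compared the model definition against the v5 one and it doesn't seem to have any problem, so I suspect some other module is responsible.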