Tianlock closed this issue 4 years ago.
The memory reported in the log is different from the memory usage shown by nvidia-smi.
Hi @Tianlock, could you provide more details about your training? For example, do you use slurm_train.sh or dist_train.sh? Which config are you using? We do not have a 2080Ti at the moment, but we will try the same config on a 1080Ti to see whether we hit the same problem.
@ZwwWayne Hello, I use dist_train.sh with the faster_rcnn_ohem_r50_fpn_1x.py config.
I also found another interesting thing: the memory in the log for Faster R-CNN R-50 is 10098 MB, and the memory in the log for Faster R-CNN R-101 is 10056 MB. The memory cost of R-50 is the same as that of R-101.
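A quick way to compare runs like these is to pull the `memory` field out of each text log. The sketch below assumes every training log line carries a `memory: <MB>` entry, as in the values quoted above; the sample lines are hypothetical, not taken from the actual logs. As far as we know, mmcv's TextLoggerHook fills this field from `torch.cuda.max_memory_allocated()`, which leaves out the CUDA context and PyTorch's cached blocks, so it is expected to read lower than nvidia-smi.

```python
import re

# Assumption: each log line contains a field like "memory: 10098" (MB).
MEMORY_RE = re.compile(r"\bmemory:\s*(\d+)")


def peak_memory_mb(log_lines):
    """Return the largest 'memory' value found in log_lines, or None if absent."""
    values = []
    for line in log_lines:
        match = MEMORY_RE.search(line)
        if match:
            values.append(int(match.group(1)))
    return max(values) if values else None


# Hypothetical log lines, for illustration only.
r50_log = [
    "Epoch [1][50/1000]\tlr: 0.00797, time: 0.512, memory: 10050, loss: 1.21",
    "Epoch [1][100/1000]\tlr: 0.01000, time: 0.498, memory: 10098, loss: 1.03",
]
r101_log = [
    "Epoch [1][50/1000]\tlr: 0.00797, time: 0.601, memory: 10056, loss: 1.25",
]

if __name__ == "__main__":
    print("r50 peak:", peak_memory_mb(r50_log))
    print("r101 peak:", peak_memory_mb(r101_log))
```

Near-identical peaks for R-50 and R-101 would suggest that the peak is driven by something other than the backbone.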
OK, can you also try faster_rcnn_r50_fpn_1x.py to see whether its memory usage is normal? On a 1080Ti, faster_rcnn_r50_fpn_1x.py can run with a batch size of 4, so if you can do that too, we may be able to locate the bug, which would then most likely be in OHEM.
@ZwwWayne I have tried faster_rcnn_r50_fpn_1x.py with 2 images per GPU: memory usage is about 4-5 GB on 7 GPUs and 11 GB on the remaining one. I tried several times. With 2 images per GPU, about half of the runs hit OOM; with 4 images per GPU, every run hits OOM. The OOM always occurs on a single GPU.
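To spot this kind of per-GPU imbalance at a glance, one can query nvidia-smi in CSV form (`nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`) and flag outliers. A minimal sketch; the sample output is hypothetical and only shaped like the report above (one GPU near 11 GB, the others around 4-5 GB), and the 1.5x-median threshold is an arbitrary choice:

```python
from statistics import median

# Hypothetical output of:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
# Columns: GPU index, used MiB, total MiB.
SAMPLE = """\
0, 11000, 11019
1, 4500, 11019
2, 4600, 11019
3, 4700, 11019
4, 4800, 11019
5, 4900, 11019
6, 5000, 11019
7, 5000, 11019
"""


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        # int() tolerates the spaces nvidia-smi puts after each comma.
        index, used, total = (int(field) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def find_outliers(rows, ratio=1.5):
    """Return indices of GPUs using more than `ratio` times the median usage."""
    typical = median(used for _, used, _ in rows)
    return [index for index, used, _ in rows if used > ratio * typical]


if __name__ == "__main__":
    rows = parse_gpu_memory(SAMPLE)
    for index, used, total in rows:
        print(f"GPU {index}: {used}/{total} MiB")
    print("outliers:", find_outliers(rows))
```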
@ZwwWayne Sometimes I cannot even train faster_rcnn_r50_fpn_1x with 2 images per GPU at an image scale of 800:1000. It is very strange that it runs out of memory.
@ZwwWayne I have also tried moving the GTs to the CPU, so too many GTs cannot be the reason.
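Whether the GT count can matter is easy to estimate: RPN target assignment compares every anchor with every GT box, so the IoU matrix grows linearly with both. A back-of-the-envelope sketch, assuming the RPN settings from the WIDERFace config posted later in this thread (strides 4 to 64, one scale times three ratios = 3 anchors per location), float32 IoUs, and the largest padded input of 1344x800. It counts only the final matrix; `bbox_overlaps` also builds intermediate tensors of similar size, so the real peak is likely a few times higher, and anchors outside the image (`allowed_border=0`) are not subtracted:

```python
import math


def num_anchors(img_h, img_w, strides=(4, 8, 16, 32, 64), anchors_per_loc=3):
    """Count anchors over all FPN levels, assuming ceil(size / stride) feature maps."""
    return sum(
        math.ceil(img_h / s) * math.ceil(img_w / s) * anchors_per_loc
        for s in strides)


def iou_matrix_gib(num_rows, num_gts, bytes_per_elem=4):
    """Size in GiB of a single (num_rows x num_gts) float32 IoU matrix."""
    return num_rows * num_gts * bytes_per_elem / 2**30


if __name__ == "__main__":
    # A 1333x800 resized image padded to a multiple of 32 gives 1344x800.
    anchors = num_anchors(800, 1344)
    print("anchors per image:", anchors)  # 268569
    for gts in (50, 500, 1000):
        print(gts, "GTs ->", round(iou_matrix_gib(anchors, gts), 2), "GiB")
```

Under these assumptions, a single image with 1000 GT boxes already needs about 1 GiB for the final IoU matrix alone, which is why crowded images can dominate the peak on one GPU.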
OK, thanks for your report. I am trying to reproduce this on my machine.
Hi @Tianlock, which pre-trained model are you using? Is it from torchvision or from mmcv? Some pre-trained checkpoints are saved as GPU tensors, which take extra GPU memory when they are loaded.
@ZwwWayne I use the pre-trained model from torchvision. Could that be the reason for the OOM?
You could check that.
Hello @ZwwWayne, I have tried the pre-trained model from open-mmlab, and the OOM error still occurs. The error is: RuntimeError: CUDA out of memory. Tried to allocate 8.27 GiB (GPU 0; 10.76 GiB total capacity; 1.61 GiB already allocated; 8.27 GiB free; 1.63 GiB reserved in total by PyTorch)
I use the faster_rcnn_r101_fpn_1x.py config with an image scale of 800:1000 and 2 images per GPU. I cannot even train the model for a single iteration.
Hi @Tianlock, could you provide a config that reproduces the OOM?
I ran into the same OOM problem on my four 1080Ti GPUs. Here is my config, with only 2 images per GPU:
```python
# fp16 settings
fp16 = dict(loss_scale=512.)
norm_cfg = dict(type='SyncBN', requires_grad=True)
# model settings
model = dict(
    type='FasterRCNN',
    # pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=-1,
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        norm_cfg=norm_cfg,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_fcs=2,
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        num_classes=2,
        target_means=[0., 0., 0., 0.],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        reg_class_agnostic=False,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)))
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.05)
)
# dataset settings
dataset_type = 'WIDERFaceDataset'
data_root = 'data/WIDERFace/'
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=0,
    train=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type=dataset_type,
            ann_file=data_root + 'train.txt',
            img_prefix=data_root + 'WIDER_train/',
            # min_size=17,
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'val.txt',
        img_prefix=data_root + 'WIDER_val/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'val.txt',
        img_prefix=data_root + 'WIDER_val/',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=2)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/wider_face_faster_rcnn_r50_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]
```
Thanks!
@ZwwWayne @hellock I'm sure it's a bug, because the OOM still occurs even with imgs_per_gpu = 1.
@yhcao6 Please check whether dist_train.sh works as expected.
Sorry, I can't reproduce your error.
I tried running faster_rcnn_r50 with the command
./tools/dist_train.sh configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py 2
and here is a screenshot of nvidia-smi:
[Screenshot: nvidia-smi output]
Hi @YAOYI626, I can't run your config with the latest mmdet. Could you provide a config that is consistent with the latest branch? I notice that you are using fp16 training. I just tried faster_rcnn_r50_fp16, and its memory usage is normal. Here is the screenshot: [Screenshot] Could you also try it to see whether it causes OOM?
@yhcao6 Thanks for the reply!
I have to apologize: I just found that the problem comes from the WIDERFace dataset. If the minimum bbox size is not limited, a batch can contain too many small bboxes, which causes OOM when computing IoU. It works well now, and I no longer see the memory imbalance.
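The fix described here corresponds to the commented-out `# min_size=17` line in the train dataset of the config above. As far as we know, mmdet's XMLDataset (which WIDERFaceDataset builds on) moves boxes whose width or height is below `min_size` into the ignore list instead of the regular GT list. The sketch below mimics that filtering in plain Python; `split_by_min_size` is a hypothetical helper, not an mmdet function, and the boxes are made up:

```python
def split_by_min_size(bboxes, min_size):
    """Split (x1, y1, x2, y2) boxes into (kept, ignored) lists.

    A box is ignored when its width or height is below min_size, so fewer
    boxes end up in the regular GT list that is matched against anchors.
    """
    kept, ignored = [], []
    for x1, y1, x2, y2 in bboxes:
        width, height = x2 - x1, y2 - y1
        if width < min_size or height < min_size:
            ignored.append((x1, y1, x2, y2))
        else:
            kept.append((x1, y1, x2, y2))
    return kept, ignored


if __name__ == "__main__":
    # Hypothetical face boxes in pixels.
    boxes = [(0, 0, 5, 5), (10, 10, 40, 50), (100, 100, 110, 130)]
    kept, ignored = split_by_min_size(boxes, min_size=17)
    print("kept:", kept)        # [(10, 10, 40, 50)]
    print("ignored:", ignored)  # [(0, 0, 5, 5), (100, 100, 110, 130)]
```

Moving tiny boxes to the ignore list, rather than deleting them, keeps the annotation intact while shrinking the GT count that drives the IoU matrix size.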
Hello, I ran into a problem when training Faster R-CNN. Training setup: eight 2080Ti GPUs, 2 images per GPU, and an image scale of 600:800. I found that one GPU may use 11 GB of memory while the others use 5 GB, so I think the GPUs are not fully used. But if I increase the number of images per GPU to 4 or increase the image scale, an OOM error occurs on that one GPU, even though the other 7 GPUs still have enough memory left. Could you help me with this? Thanks.