open-mmlab / mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark
https://mmpretrain.readthedocs.io/en/latest/
Apache License 2.0

[Bug] CPU not fully used; slow data loading (high data_time) #1927

Open YAwei666 opened 3 months ago

YAwei666 commented 3 months ago

Branch

main branch (mmpretrain version)

Describe the bug

python tools/train.py configs/resnet/resnet50_8xb32_in1k_2.py

# configs/resnet/resnet50_8xb32_in1k_2.py
_base_ = [
    '../_base_/models/resnet50.py',
    '../_base_/datasets/imagenet_bs32.py',
    '../_base_/schedules/imagenet_bs256_coslr.py',
    '../_base_/default_runtime.py'
]
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth',
            prefix='backbone',
        )),
    head=dict(num_classes=5),
)

# >>>>>>>>>>>>>>> Override the data settings here >>>>>>>>>>>>>>>>>>>
data_root = '/mnt/data//dataset'
train_dataloader = dict(
    batch_size=192,
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='meta/train.txt',  # We assume the sub-folder format, so the annotation file should be left empty
        data_prefix='',
    ))
val_dataloader = dict(
    batch_size=192,
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='meta/test.txt',  # We assume the sub-folder format, so the annotation file should be left empty
        data_prefix='',
    ))
test_dataloader = val_dataloader

optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))

# learning policy
param_scheduler = dict(
    type='MultiStepLR', by_epoch=True, milestones=[15], gamma=0.1)

# train, val, test setting
train_cfg = dict(by_epoch=True, max_epochs=30, val_interval=1)
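The config above only overrides part of `train_dataloader`. A minimal sketch, assuming mmengine's usual recursive dict-into-dict merge for `_base_` inheritance (simplified here; `merge_into` is a hypothetical helper, not the mmengine API), shows that `num_workers=12` and the pipeline from the base dataset file survive while `batch_size` becomes 192:

```python
import copy


def merge_into(child: dict, base: dict) -> dict:
    """Return a copy of `base` with `child` merged in recursively.

    Simplified illustration: nested dicts are merged key by key, any other
    child value replaces the base value. mmengine's real merge also handles
    `_delete_=True` and a few other special cases that are ignored here.
    """
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_into(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# train_pipeline / train_dataloader as in '../_base_/datasets/imagenet_bs32.py'
train_pipeline = [
    dict(type='LoadImageFromFile', imdecode_backend='pillow'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]
base_train_dataloader = dict(
    batch_size=128,
    num_workers=12,
    dataset=dict(type='ImageNet', data_root='data/imagenet',
                 pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),
)

# train_dataloader override from resnet50_8xb32_in1k_2.py
child_train_dataloader = dict(
    batch_size=192,
    dataset=dict(type='CustomDataset', data_root='/mnt/data//dataset',
                 ann_file='meta/train.txt', data_prefix=''),
)

final = merge_into(child_train_dataloader, base_train_dataloader)
print(final['batch_size'], final['num_workers'], final['dataset']['type'])
# -> 192 12 CustomDataset
```

So each of the 12 workers assembles whole batches of 192 images with the inherited pipeline (including `imdecode_backend='pillow'`). To inspect the real merged config rather than this approximation, loading the file with `mmengine.config.Config.fromfile('configs/resnet/resnet50_8xb32_in1k_2.py')` and printing `cfg.train_dataloader` should work.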

# '../_base_/models/resnet50.py'

# model settings
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='ResNeSt',
        depth=50,
        num_stages=4,
        out_indices=(3, ),
        style='pytorch'),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=1000,
        in_channels=2048,
        loss=dict(
            type='LabelSmoothLoss',
            label_smooth_val=0.1,
            num_classes=1000,
            reduction='mean',
            loss_weight=1.0),
        topk=(1, 5),
        cal_acc=False),
    train_cfg=dict(augments=dict(type='Mixup', alpha=0.2)),
)

# '../_base_/datasets/imagenet_bs32.py'

# dataset settings
dataset_type = 'ImageNet'
data_preprocessor = dict(
    num_classes=1000,
    # RGB format normalization parameters
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    # convert image from BGR to RGB
    to_rgb=True,
)

train_pipeline = [
    dict(type='LoadImageFromFile', imdecode_backend='pillow'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

test_pipeline = [
    dict(type='LoadImageFromFile', imdecode_backend='pillow'),
    dict(type='ResizeEdge', scale=256, edge='short'),
    dict(type='CenterCrop', crop_size=224),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    batch_size=128,
    num_workers=12,
    dataset=dict(
        type=dataset_type,
        data_root='data/imagenet',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),
)

val_dataloader = dict(
    batch_size=128,
    num_workers=12,
    dataset=dict(
        type=dataset_type,
        data_root='data/imagenet',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)
val_evaluator = dict(type='Accuracy', topk=(1))

# If you want standard test, please manually configure the test dataset
test_dataloader = val_dataloader
test_evaluator = val_evaluator

# '../_base_/schedules/imagenet_bs256_coslr.py'

# optimizer
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.8, momentum=0.9, weight_decay=5e-5))

# learning policy
param_scheduler = [
    dict(type='LinearLR', start_factor=0.1, by_epoch=True, begin=0, end=5),
    dict(type='CosineAnnealingLR', T_max=95, by_epoch=True, begin=5, end=100)
]

# train, val, test setting
train_cfg = dict(by_epoch=True, max_epochs=100, val_interval=1)
val_cfg = dict()
test_cfg = dict()

# NOTE: `auto_scale_lr` is for automatically scaling LR,
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=1024)
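The `auto_scale_lr` note refers to linear LR scaling. A minimal worked example, assuming the rule is `lr * (num_gpus * batch_size_per_gpu) / base_batch_size` and a single GPU (only `GPU 0` appears in the environment): with `lr=0.01`, `batch_size=192` and `base_batch_size=1024`, the scaled LR would be 0.001875. The training log reports `lr: 1.0000e-02`, which is consistent with scaling not being enabled; as far as I know it is only switched on through the `--auto-scale-lr` option of `tools/train.py`. `scaled_lr` is a hypothetical helper.

```python
def scaled_lr(base_lr: float, batch_size_per_gpu: int, num_gpus: int,
              base_batch_size: int) -> float:
    """Linear scaling rule: multiply the LR by actual_batch / base_batch."""
    actual_batch_size = batch_size_per_gpu * num_gpus
    return base_lr * actual_batch_size / base_batch_size


# lr=0.01 from the user config, batch_size=192, one RTX 3090,
# auto_scale_lr.base_batch_size=1024 from the base schedule file.
print(scaled_lr(0.01, 192, 1, 1024))  # approximately 0.001875
```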

lr: 1.0000e-02 eta: 17:43:57 time: 3.5791 data_time: 3.3401 memory: 4676 loss: 0.3175

[Screenshot from 2024-08-07 23-18-56]
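The log line already quantifies the problem. A small worked calculation from it, assuming `time` is the total seconds per iteration, `data_time` the part spent waiting for the dataloader, and `time - data_time` a rough stand-in for GPU compute: about 93% of each iteration is spent waiting for data, overall throughput is about 54 images/s, and to hide loading behind compute the 12 workers together would need roughly 800 images/s, about 67 images/s each. `field` and `loader_stats` are hypothetical helpers.

```python
import re

# Log line copied from the issue (one training iteration, batch_size=192).
LOG_LINE = ("lr: 1.0000e-02 eta: 17:43:57 time: 3.5791 "
            "data_time: 3.3401 memory: 4676 loss: 0.3175")


def field(line: str, key: str) -> float:
    """Return the number logged after `key: ` (word boundary, so 'time'
    does not match inside 'data_time')."""
    match = re.search(rf"\b{key}: ([0-9.]+)", line)
    if match is None:
        raise KeyError(key)
    return float(match.group(1))


def loader_stats(line: str, batch_size: int, num_workers: int) -> dict:
    iter_time = field(line, "time")        # seconds per iteration, total
    data_time = field(line, "data_time")   # seconds spent waiting for data
    compute_time = iter_time - data_time   # rough forward/backward/step time
    return {
        "data_share": data_time / iter_time,
        "images_per_s": batch_size / iter_time,
        # Loader rate at which data loading would fully overlap compute.
        "needed_images_per_s": batch_size / compute_time,
        "needed_per_worker": batch_size / compute_time / num_workers,
    }


stats = loader_stats(LOG_LINE, batch_size=192, num_workers=12)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```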

Environment

{'sys.platform': 'linux', 'Python': '3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]', 'CUDA available': True, 'MUSA available': False, 'numpy_random_seed': 2147483648, 'GPU 0': 'NVIDIA GeForce RTX 3090', 'CUDA_HOME': ':/usr/local/cuda', 'GCC': 'gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609', 'PyTorch': '1.10.1', 'TorchVision': '0.11.2', 'OpenCV': '4.10.0', 'MMEngine': '0.10.4', 'MMCV': '2.2.0', 'MMPreTrain': '1.2.0+'}

Other information

No response

YAwei666 commented 3 months ago

It seems only 2 CPU cores are working. What's going on?

liuwake commented 2 months ago

This is strange. Based on your information, you have successfully started 12 workers (num_workers=12), yet only two CPU threads are busy. Is virtualization enabled on your server? That could limit you to only two CPU threads.
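To test the hypothesis that the process can only use two CPU threads, a sketch like the following reports the logical CPU count, the size of the process's CPU affinity mask (restricted by e.g. `taskset` or Docker's `--cpuset-cpus`), and any cgroup CPU quota (set by e.g. Docker's `--cpus`). The cgroup paths are the standard Linux locations (v2: `/sys/fs/cgroup/cpu.max`; v1: `/sys/fs/cgroup/cpu/cpu.cfs_quota_us` and `cpu.cfs_period_us`) and may differ on a given system; `usable_cpus` is a hypothetical helper.

```python
import os
from pathlib import Path


def _read_cgroup_quota():
    """Return the cgroup CPU quota in CPUs, or None if unlimited/unknown."""
    try:
        v2 = Path("/sys/fs/cgroup/cpu.max")  # cgroup v2: "<quota|max> <period>"
        if v2.exists():
            quota, period = v2.read_text().split()[:2]
            return None if quota == "max" else int(quota) / int(period)
        v1 = Path("/sys/fs/cgroup/cpu")  # cgroup v1: quota of -1 = unlimited
        quota_file = v1 / "cpu.cfs_quota_us"
        period_file = v1 / "cpu.cfs_period_us"
        if quota_file.exists() and period_file.exists():
            quota = int(quota_file.read_text())
            period = int(period_file.read_text())
            return None if quota < 0 else quota / period
    except (OSError, ValueError, ZeroDivisionError):
        pass
    return None


def usable_cpus() -> dict:
    """Report how many CPUs this process can actually run on (Linux sketch)."""
    info = {"cpu_count": os.cpu_count()}  # all logical CPUs, ignores limits
    if hasattr(os, "sched_getaffinity"):
        # CPUs this process is allowed to be scheduled on.
        info["affinity"] = len(os.sched_getaffinity(0))
    info["cgroup_quota_cpus"] = _read_cgroup_quota()
    return info


print(usable_cpus())
```

If `affinity` or `cgroup_quota_cpus` comes out as 2 while `cpu_count` is larger, the 12 dataloader workers would be sharing two CPUs, which would match the symptoms. Note that `os.cpu_count()` reports all logical CPUs of the machine even inside a container.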