Closed mrFocusXin closed 1 year ago
Which GPU are you using to run MAE? And did you modify your configs? You could post your config here.
My config is as follows:
_base_ = [
    '../_base_/models/mae_vit-base-p16.py',
    '../_base_/schedules/adamw_coslr-200e_in1k.py',
    '../_base_/default_runtime.py',
]
#
dataset_type = 'mmcls.ImageNet'
data_root = '/home/wangxin/mmselfsup_1.x/data/imagenet/'
file_client_args = dict(backend='disk')

train_pipeline = [
    dict(type='LoadImageFromFile', file_client_args=file_client_args),
    dict(
        type='RandomResizedCrop',
        size=16,  # 224,
        scale=(0.2, 1.0),
        backend='pillow',
        interpolation='bicubic'),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackSelfSupInputs', meta_keys=['img_path'])
]

train_dataloader = dict(
    batch_size=16,  # 128,
    num_workers=4,  # 8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='meta/train.txt',
        data_prefix=dict(img_path='train/'),
        pipeline=train_pipeline))
#
train_dataloader = dict(batch_size=16, num_workers=4)
optimizer = dict(
    type='AdamW', lr=1.5e-4 * 4096 / 256, betas=(0.9, 0.95), weight_decay=0.05)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=optimizer,
    paramwise_cfg=dict(
        custom_keys={
            'ln': dict(decay_mult=0.0),
            'bias': dict(decay_mult=0.0),
            'pos_embed': dict(decay_mult=0.),
            'mask_token': dict(decay_mult=0.),
            'cls_token': dict(decay_mult=0.)
        }))

param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1e-4,
        by_epoch=True,
        begin=0,
        end=40,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=360,
        by_epoch=True,
        begin=40,
        end=400,
        convert_to_iter_based=True)
]

train_cfg = dict(max_epochs=1)
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=100),
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))

randomness = dict(seed=0, diff_rank_seed=True)
resume = True
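A side note on the learning rate in the config above, offered as a rough sketch: the factor `4096 / 256` looks like the linear scaling rule (a base rate multiplied by the total batch size over 256). That reading is an assumption, not something confirmed in this thread. If it holds, the rate is sized for a total batch of 4096, while the dataloader here uses `batch_size=16`. The helper name `scaled_lr` below is hypothetical and only illustrates the arithmetic.

```python
def scaled_lr(base_lr, total_batch_size, reference_batch=256):
    """Linear scaling rule: lr grows in proportion to the total batch size.

    Assumption: total_batch_size = per-GPU batch size * number of GPUs.
    """
    return base_lr * total_batch_size / reference_batch


# Value written in the config: 1.5e-4 * 4096 / 256
config_lr = scaled_lr(1.5e-4, 4096)

# Value the same rule would give for one GPU with batch_size=16
single_gpu_lr = scaled_lr(1.5e-4, 16 * 1)

print(f"lr in config:          {config_lr:.6g}")
print(f"lr for batch size 16:  {single_gpu_lr:.6g}")
```

Whether to rescale is a judgment call; for a one-epoch smoke test it may not matter much.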
I just noticed from the log output that CUDA was unavailable, so I switched to torch==1.10.0+cu111, but now I get the following error:
/home/wangxin/anaconda3/envs/mmselfsup/lib/python3.8/site-packages/mmcv/cnn/bricks/transformer.py:33: UserWarning: Fail to import MultiScaleDeformableAttention from mmcv.ops.multi_scale_deform_attn, You should install mmcv rather than mmcv-lite if you need this module.
  warnings.warn('Fail to import MultiScaleDeformableAttention from '
02/23 08:36:03 - mmengine - INFO -
System environment:
sys.platform: linux
Python: 3.8.15 (default, Nov 11 2022, 14:08:18) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 301832789
GPU 0,1,2,3: Tesla V100-SXM2-32GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.64
GCC: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.0+cu111
OpenCV: 4.7.0
MMEngine: 0.5.0
02/23 08:36:05 - mmengine - INFO - Config:
model = dict(
    type='SimCLR',
    data_preprocessor=dict(
        mean=(123.675, 116.28, 103.53),
        std=(58.395, 57.12, 57.375),
        bgr_to_rgb=True),
    backbone=dict(
        type='ResNet',
        depth=50,
        in_channels=3,
        out_indices=[4],
        norm_cfg=dict(type='SyncBN'),
        zero_init_residual=True),
    neck=dict(
        type='NonLinearNeck',
        in_channels=2048,
        hid_channels=2048,
        out_channels=128,
        num_layers=2,
        with_avg_pool=True),
    head=dict(
        type='ContrastiveHead',
        loss=dict(type='mmcls.CrossEntropyLoss'),
        temperature=0.1))
optimizer = dict(type='LARS', lr=0.3, weight_decay=1e-06, momentum=0.9)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='LARS', lr=0.3, weight_decay=1e-06, momentum=0.9),
    paramwise_cfg=dict(
        custom_keys=dict({
            'bn': dict(decay_mult=0, lars_exclude=True),
            'bias': dict(decay_mult=0, lars_exclude=True),
            'downsample.1': dict(decay_mult=0, lars_exclude=True)
        })))
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=0.0001,
        by_epoch=True,
        begin=0,
        end=10,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR', T_max=190, by_epoch=True, begin=10, end=200)
]
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=200)
default_scope = 'mmselfsup'
default_hooks = dict(
    runtime_info=dict(type='RuntimeInfoHook'),
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3),
    sampler_seed=dict(type='DistSamplerSeedHook'))
env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'))
log_processor = dict(
    window_size=10,
    custom_cfg=[dict(data_src='', method='mean', window_size='global')])
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='SelfSupVisualizer',
    vis_backends=[dict(type='LocalVisBackend')],
    name='visualizer')
log_level = 'INFO'
load_from = None
resume = False
dataset_type = 'mmcls.ImageNet'
data_root = '/home/wangxin/mmselfsup_1.x/data/imagenet/'
file_client_args = dict(backend='disk')
view_pipeline = [
    dict(type='RandomResizedCrop', size=224, backend='pillow'),
    dict(type='RandomFlip', prob=0.5),
    dict(
        type='RandomApply',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.8,
                contrast=0.8,
                saturation=0.8,
                hue=0.2)
        ],
        prob=0.8),
    dict(
        type='RandomGrayscale',
        prob=0.2,
        keep_channels=True,
        channel_weights=(0.114, 0.587, 0.2989)),
    dict(type='RandomGaussianBlur', sigma_min=0.1, sigma_max=2.0, prob=0.5)
]
train_pipeline = [
    dict(type='LoadImageFromFile', file_client_args=dict(backend='disk')),
    dict(
        type='MultiView',
        num_views=2,
        transforms=[[{
            'type': 'RandomResizedCrop',
            'size': 224,
            'backend': 'pillow'
        }, {
            'type': 'RandomFlip',
            'prob': 0.5
        }, {
            'type': 'RandomApply',
            'transforms': [{
                'type': 'ColorJitter',
                'brightness': 0.8,
                'contrast': 0.8,
                'saturation': 0.8,
                'hue': 0.2
            }],
            'prob': 0.8
        }, {
            'type': 'RandomGrayscale',
            'prob': 0.2,
            'keep_channels': True,
            'channel_weights': (0.114, 0.587, 0.2989)
        }, {
            'type': 'RandomGaussianBlur',
            'sigma_min': 0.1,
            'sigma_max': 2.0,
            'prob': 0.5
        }]]),
    dict(type='PackSelfSupInputs', meta_keys=['img_path'])
]
train_dataloader = dict(
    batch_size=32,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        type='mmcls.ImageNet',
        data_root='/home/wangxin/mmselfsup_1.x/data/imagenet/',
        ann_file='meta/train.txt',
        data_prefix=dict(img_path='train/'),
        pipeline=[
            dict(
                type='LoadImageFromFile',
                file_client_args=dict(backend='disk')),
            dict(
                type='MultiView',
                num_views=2,
                transforms=[[{
                    'type': 'RandomResizedCrop',
                    'size': 224,
                    'backend': 'pillow'
                }, {
                    'type': 'RandomFlip',
                    'prob': 0.5
                }, {
                    'type': 'RandomApply',
                    'transforms': [{
                        'type': 'ColorJitter',
                        'brightness': 0.8,
                        'contrast': 0.8,
                        'saturation': 0.8,
                        'hue': 0.2
                    }],
                    'prob': 0.8
                }, {
                    'type': 'RandomGrayscale',
                    'prob': 0.2,
                    'keep_channels': True,
                    'channel_weights': (0.114, 0.587, 0.2989)
                }, {
                    'type': 'RandomGaussianBlur',
                    'sigma_min': 0.1,
                    'sigma_max': 2.0,
                    'prob': 0.5
                }]]),
            dict(type='PackSelfSupInputs', meta_keys=['img_path'])
        ]))
launcher = 'none'
work_dir = './work_dirs/selfsup/simclr_resnet50_8xb32-coslr-200e_in1k_mini'
02/23 08:36:05 - mmengine - WARNING - The "visualizer" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
02/23 08:36:05 - mmengine - WARNING - The "vis_backend" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
02/23 08:36:07 - mmengine - WARNING - The "model" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
02/23 08:36:08 - mmengine - WARNING - The "model" registry in mmcls did not set import location. Fallback to call mmcls.utils.register_all_modules instead.
02/23 08:36:14 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
Traceback (most recent call last):
File "tools/train.py", line 99, in
I didn't get this error before I changed the PyTorch version; instead, the program was killed for lack of memory. Was that because CUDA was unavailable at the time, so the computation ran on the CPU and exhausted memory? And is the current error caused by a mismatch between my PyTorch and CUDA versions?
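For anyone checking the same thing, here is a minimal sketch that reports which CUDA version the installed PyTorch build targets and whether it can see a GPU. The function name `torch_cuda_report` is made up for this example; the attributes it reads (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`) are standard PyTorch. Note that the `+cu111` wheel in the log above targets CUDA 11.1 while NVCC reports 11.7; a difference like that is usually harmless for running PyTorch itself, but it can matter for packages with compiled CUDA ops.

```python
def torch_cuda_report():
    """Collect basic facts about the PyTorch install and its CUDA support."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None}
    return {
        "torch": torch.__version__,
        # CUDA version the wheel was built for; None on CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        # False means training would fall back to the CPU.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


for key, value in torch_cuda_report().items():
    print(f"{key}: {value}")
```

If `cuda_available` is False, training runs on the CPU, which would be consistent with the slow, memory-hungry behaviour described above, though this thread does not confirm that as the cause.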
CUDA is available now. Does anyone know what the problem is? Could a faulty installation of my mmcv package be causing this error?
I have fixed it! It really was an mmcv version issue!
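The thread does not say which mmcv version resolved the problem, so the following is only a rough way to inspect what is installed. It uses the standard-library `importlib.metadata` to look up the distributions `mmcv`, `mmcv-lite` and `mmcv-full`, then tries to import `mmcv.ops`, the compiled part that the warning above says is missing from `mmcv-lite`. The helper names `installed_versions` and `ops_importable` are hypothetical.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def ops_importable():
    """Return True if the compiled mmcv ops can be imported."""
    try:
        import mmcv.ops  # noqa: F401
        return True
    except Exception:
        # Covers a missing package as well as a build that does not
        # match the installed PyTorch/CUDA combination.
        return False


print(installed_versions(["mmcv", "mmcv-lite", "mmcv-full"]))
print("mmcv.ops importable:", ops_importable())
```

If only `mmcv-lite` shows a version, or `mmcv.ops` fails to import, reinstalling an mmcv build that matches the installed PyTorch and CUDA versions is a reasonable first thing to try.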
As the title says, ImageNet-1k is too big for my machine to handle, but I still want to run this program. What should I do? I reduced the batch size, but the process still gets killed because of excessive memory usage. Does anyone have any ideas or experience? Your reply would be very helpful to me!
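One possible approach, sketched under assumptions: train on a small subset by writing a reduced annotation file and pointing `ann_file` at it. The code assumes each line of `meta/train.txt` has the form `relative/path.JPEG label`; the function name `subsample_annotations` and the output name `meta/train_mini.txt` are invented for this example. A smaller annotation file shortens each epoch, but it does not by itself reduce per-batch memory.

```python
import random
from collections import defaultdict


def subsample_annotations(lines, num_classes, images_per_class, seed=0):
    """Keep `images_per_class` images from `num_classes` randomly chosen labels.

    Assumes each line looks like "<relative path> <integer label>".
    Original labels are kept unchanged.
    """
    by_label = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path, label = line.rsplit(maxsplit=1)
        by_label[int(label)].append(path)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    labels = sorted(by_label)
    kept = sorted(rng.sample(labels, min(num_classes, len(labels))))

    subset = []
    for label in kept:
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        for path in sorted(paths[:images_per_class]):
            subset.append(f"{path} {label}")
    return subset


# Tiny synthetic demo: 5 classes with 10 images each.
demo = [f"n{c:02d}/img_{i}.JPEG {c}" for c in range(5) for i in range(10)]
mini = subsample_annotations(demo, num_classes=2, images_per_class=3)
print(len(mini), "lines kept")
```

To use it on real data, read the lines of `meta/train.txt`, write the result to a new file such as `meta/train_mini.txt`, and set `ann_file` to that file in the dataset config.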