Open Jia-Baos opened 1 year ago
It seems your config is from MMSelfSup 0.x, but the log shows that you use MMEngine to start your training job. Try pulling and checking out the MMSelfSup 1.x branch, and use the new MAE config.
Hmm, that's a great idea. Using the MMSelfSup 1.x branch to train MAE on my own dataset (see https://mmselfsup.readthedocs.io/zh_CN/dev-1.x/user_guides/4_pretrain_custom_dataset.html) worked. Now I want to train MAE on CIFAR10. How should I change the dataset part of the config? I can't find an example in https://github.com/open-mmlab/mmselfsup/tree/1.x/configs/selfsup/_base_/datasets, which only covers ImageNet.
You can refer to this doc. Before you pre-train your model on CIFAR10, you should reorganize the CIFAR10 folder into the ImageNet1K style and also create an ImageNet1K-style annotation file. After that, only a few changes to the config are needed: replace data_root and ann_file with your own settings.
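The change described above can be sketched as a config fragment. This is a minimal sketch, assuming the MMSelfSup 1.x layout in which the dataset lives inside train_dataloader and is read through the mmcls.ImageNet dataset type. The paths data/cifar10/, meta/train.txt and train/ are placeholders, and the pipeline is modelled on the ImageNet MAE base config rather than tuned for 32x32 CIFAR images.

```python
# Sketch of the dataset section of an MMSelfSup 1.x MAE config.
# All paths below are placeholders (assumptions); adjust them to your layout.
dataset_type = 'mmcls.ImageNet'  # reads an ImageNet1K-style annotation file
data_root = 'data/cifar10/'      # assumed root of the converted CIFAR10 copy

# Pipeline modelled on the ImageNet MAE base config; the crop size is not
# tuned for CIFAR-sized images.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='RandomResizedCrop',
        size=224,
        scale=(0.2, 1.0),
        backend='pillow',
        interpolation='bicubic'),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackSelfSupInputs', meta_keys=['img_path']),
]

# In the 1.x layout the dataset is nested inside the dataloader config.
train_dataloader = dict(
    batch_size=128,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,              # replace with your own root
        ann_file='meta/train.txt',        # replace with your own annotation file
        data_prefix=dict(img_path='train/'),
        pipeline=train_pipeline))
```

Only data_root, ann_file and data_prefix should need to change for a dataset that already follows the ImageNet1K layout.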
Thanks~
Hi, have you solved the problem? I also want to use CIFAR in MMSelfSup. Can you provide your CIFAR directory structure? Thanks!
Hmm, I didn't use CIFAR10; I just organized my own dataset into the ImageNet1K style and created an ImageNet1K-style annotation file. If you want to use CIFAR, a good approach is to convert it to the ImageNet1K style or the mmcls.CustomDataset style (see https://mmselfsup.readthedocs.io/zh_CN/dev-1.x/user_guides/4_pretrain_custom_dataset.html).
@Jia-Baos Thank you for your reply! What is the specific format of the ImageNet1K style? I found the ImageNet1K dataset on Google, but it has no meta folder, and I only found this picture in the MMSelfSup docs, which does not show the exact structure. For example, what should be under the meta folder? I have prepared my own dataset, but I don't know exactly what ImageNet1K style means. Would you mind giving me an example or pointing me to documentation? Thank you very much. Your reply is very helpful to me!
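I'm very pleased that I could help. You can refer to the file 北京超算30区使用MMClassification训练花卉图片分类模型.pdf ("Training a flower image classification model with MMClassification on Beijing Supercomputing Zone 30") in https://github.com/Jia-Baos/OpenMM.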
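For readers without access to that PDF, here is a rough sketch of how an ImageNet1K-style annotation file could be generated. It assumes images are already sorted into one folder per class under data_root/train/, and that each annotation line has the form "class_folder/image_name label_index", which is my understanding of the format MMClassification expects. The function names and the meta/train.txt location are illustrative choices, not part of any library.

```python
from pathlib import Path

# Image extensions to include (an assumption; extend as needed).
IMG_EXTS = {'.png', '.jpg', '.jpeg', '.bmp'}


def build_annotation(data_root, split='train'):
    """Collect 'class_dir/file label' lines for data_root/<split>/<class>/<image>."""
    split_dir = Path(data_root) / split
    # Class indices follow the alphabetical order of the class folder names.
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    lines = []
    for label, cls in enumerate(classes):
        for img in sorted((split_dir / cls).iterdir()):
            if img.suffix.lower() in IMG_EXTS:
                lines.append(f'{cls}/{img.name} {label}')
    return classes, lines


def write_annotation(data_root, split='train'):
    """Write the collected lines to data_root/meta/<split>.txt and return its path."""
    _, lines = build_annotation(data_root, split)
    meta_dir = Path(data_root) / 'meta'
    meta_dir.mkdir(parents=True, exist_ok=True)
    ann_path = meta_dir / f'{split}.txt'
    ann_path.write_text('\n'.join(lines) + '\n')
    return ann_path
```

Calling write_annotation('data/cifar10') on such a tree would produce data/cifar10/meta/train.txt, which can then be referenced through ann_file in the config.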
Thank you very much for your reply. It works!!
Hello, I cannot find that PDF file. I also ran into an issue while training a model on my custom dataset.
I got this error:
KeyError: 'SelfSupVisualizer is not in the visualizer registry. Please check whether the value of SelfSupVisualizer is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'
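As a general illustration of this kind of message: a registry lookup fails when the class named in type was never registered in the running process, for example because the module defining it was not imported. The toy registry below is a hypothetical stand-in written for this explanation only; it is not MMEngine's implementation and does not diagnose this particular environment.

```python
class Registry:
    """Toy name-to-class registry (illustration only, not MMEngine's code)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Used as a decorator: store the class under its own name.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        cfg = dict(cfg)  # copy so the caller's config is not modified
        type_name = cfg.pop('type')
        if type_name not in self._modules:
            raise KeyError(f'{type_name} is not in the {self.name} registry')
        return self._modules[type_name](**cfg)


VISUALIZERS = Registry('visualizer')
cfg = dict(type='SelfSupVisualizer', name='visualizer')

# Building before the class is registered fails with a registry KeyError.
try:
    VISUALIZERS.build(cfg)
    first_error = None
except KeyError as err:
    first_error = err.args[0]


# Once the defining code has run (i.e. the module was imported), it works.
@VISUALIZERS.register_module
class SelfSupVisualizer:
    def __init__(self, name):
        self.name = name


vis = VISUALIZERS.build(cfg)
```

In a real setup the registration happens when the package that defines the class is imported, so a missing or mismatched installation would be a plausible first thing to check.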
This is the config file:
>>>>>>>>>>>>>>>>>>>>> Start of Changed >>>>>>>>>>>>>>>>>>>>>>>>>
_base_ = [
    '../_base_/models/mae_vit-base-p16.py',
    '../_base_/datasets/cifar.py',
]

# dataset settings
data_root = 'data/cifar'
file_client_args = dict(backend='disk')
data_source = 'CIFAR10'
dataset_type = 'SingleViewDataset'
img_norm_cfg = dict(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.201])
train_pipeline = [
    dict(type='RandomCrop', size=32, padding=4),
    dict(type='RandomHorizontalFlip'),
]
test_pipeline = []

# prefetch
prefetch = False
if not prefetch:
    train_pipeline.extend(
        [dict(type='ToTensor'),
         dict(type='Normalize', **img_norm_cfg)])
    test_pipeline.extend(
        [dict(type='ToTensor'),
         dict(type='Normalize', **img_norm_cfg)])

# dataset summary
data = dict(
    samples_per_gpu=128,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_source=dict(
            type=data_source,
            data_prefix='data/cifar',
        ),
        pipeline=train_pipeline,
        prefetch=prefetch),
    val=dict(
        type=dataset_type,
        data_source=dict(
            type=data_source,
            data_prefix='data/cifar',
        ),
        pipeline=test_pipeline,
        prefetch=prefetch),
    test=dict(
        type=dataset_type,
        data_source=dict(
            type=data_source,
            data_prefix='data/cifar',
        ),
        pipeline=test_pipeline,
        prefetch=prefetch))
evaluation = dict(interval=10, topk=(1, 5))

# dataset 8 x 512
train_dataloader = dict(batch_size=128, num_workers=8)
<<<<<<<<<<<<<<<<<<<<<< End of Changed <<<<<<<<<<<<<<<<<<<<<<<<<<<
# optimizer wrapper
optimizer = dict(
    type='AdamW', lr=1.5e-4 * 4096 / 256, betas=(0.9, 0.95), weight_decay=0.05)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=optimizer,
    paramwise_cfg=dict(
        custom_keys={
            'ln': dict(decay_mult=0.0),
            'bias': dict(decay_mult=0.0),
            'pos_embed': dict(decay_mult=0.),
            'mask_token': dict(decay_mult=0.),
            'cls_token': dict(decay_mult=0.)
        }))

# learning rate scheduler
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1e-4,
        by_epoch=True,
        begin=0,
        end=40,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=360,
        by_epoch=True,
        begin=40,
        end=400,
        convert_to_iter_based=True)
]
# runtime settings
# pre-train for 400 epochs
train_cfg = dict(max_epochs=3)
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=100),
    # only keeps the latest 3 checkpoints
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))

# randomness
randomness = dict(seed=0, diff_rank_seed=True)
resume = True
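A side note on the numbers in this config, as a hedged sketch rather than a recommendation. The expression lr=1.5e-4 * 4096 / 256 reads like a linear scaling rule for a total batch of 4096 (the "8 x 512" comment), while the run uses batch_size=128 on what the log reports as one GPU. The schedule also warms up over 40 epochs, but max_epochs is 3. The helper names below are hypothetical, and the schedule is evaluated per epoch for simplicity, whereas the config converts it to per-iteration steps.

```python
import math


def scaled_lr(base_lr, total_batch_size, reference_batch=256):
    """Linear scaling rule: lr grows in proportion to the total batch size."""
    return base_lr * total_batch_size / reference_batch


def lr_at_epoch(epoch, peak_lr, warmup_epochs=40, start_factor=1e-4,
                t_max=360, eta_min=0.0):
    """Epoch-level approximation of linear warmup followed by cosine annealing."""
    if epoch < warmup_epochs:
        # Linear warmup from start_factor * peak_lr up to peak_lr.
        factor = start_factor + (1.0 - start_factor) * epoch / warmup_epochs
        return peak_lr * factor
    # Cosine annealing from peak_lr down to eta_min over t_max epochs.
    t = min(epoch - warmup_epochs, t_max)
    return eta_min + (peak_lr - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2


config_lr = scaled_lr(1.5e-4, 4096)      # the value written in the config
single_gpu_lr = scaled_lr(1.5e-4, 128)   # same rule for one GPU with batch 128
lr_after_3_epochs = lr_at_epoch(3, config_lr)

print(f'config lr:         {config_lr:.6g}')
print(f'lr for batch 128:  {single_gpu_lr:.6g}')
print(f'lr after 3 epochs: {lr_after_3_epochs:.6g}')
```

Under these assumptions a 3-epoch run never leaves the warmup phase, so the learning rate stays far below its configured peak; whether to rescale the peak for a smaller batch is a separate judgment call.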
This is the log:
During handling of the above exception, another exception occurred:
2023/02/20 01:19:00 - mmengine - INFO -
System environment:
sys.platform: linux
Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 0
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /data/apps/cuda/11.1
NVCC: Cuda compilation tools, release 11.1, V11.1.74
GCC: gcc (GCC) 7.3.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.0+cu111
OpenCV: 4.7.0
MMEngine: 0.5.0
Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 0
diff_rank_seed: True
Distributed launcher: none
Distributed training: False
GPU number: 1
2023/02/20 01:19:00 - mmengine - INFO - Config:
model = dict(
    type='MAE',
    data_preprocessor=dict(
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True),
    backbone=dict(type='MAEViT', arch='b', patch_size=16, mask_ratio=0.75),
    neck=dict(
        type='MAEPretrainDecoder',
        patch_size=16,
        in_chans=3,
        embed_dim=768,
        decoder_embed_dim=512,
        decoder_depth=8,
        decoder_num_heads=16,
        mlp_ratio=4.0),
    head=dict(
        type='MAEPretrainHead',
        norm_pix=True,
        patch_size=16,
        loss=dict(type='MAEReconstructionLoss')),
    init_cfg=[
        dict(type='Xavier', distribution='uniform', layer='Linear'),
        dict(type='Constant', layer='LayerNorm', val=1.0, bias=0.0)
    ])
optimizer = dict(type='AdamW', lr=0.0024, betas=(0.9, 0.95), weight_decay=0.05)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='AdamW', lr=0.0024, betas=(0.9, 0.95), weight_decay=0.05),
    paramwise_cfg=dict(
        custom_keys=dict(
            ln=dict(decay_mult=0.0),
            bias=dict(decay_mult=0.0),
            pos_embed=dict(decay_mult=0.0),
            mask_token=dict(decay_mult=0.0),
            cls_token=dict(decay_mult=0.0))))
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=0.0001,
        by_epoch=True,
        begin=0,
        end=40,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=360,
        by_epoch=True,
        begin=40,
        end=400,
        convert_to_iter_based=True)
]
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=3)
default_scope = 'mmselfsup'
default_hooks = dict(
    runtime_info=dict(type='RuntimeInfoHook'),
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=100),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3),
    sampler_seed=dict(type='DistSamplerSeedHook'))
env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'))
log_processor = dict(
    window_size=10,
    custom_cfg=[dict(data_src='', method='mean', window_size='global')])
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='SelfSupVisualizer',
    vis_backends=[dict(type='LocalVisBackend')],
    name='visualizer')
log_level = 'INFO'
load_from = None
resume = True
data_source = 'CIFAR10'
dataset_type = 'SingleViewDataset'
img_norm_cfg = dict(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.201])
train_pipeline = [
    dict(type='RandomCrop', size=32, padding=4),
    dict(type='RandomHorizontalFlip'),
    dict(type='ToTensor'),
    dict(
        type='Normalize',
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.201])
]
test_pipeline = [
    dict(type='ToTensor'),
    dict(
        type='Normalize',
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.201])
]
prefetch = False
data = dict(
    samples_per_gpu=128,
    workers_per_gpu=2,
    train=dict(
        type='SingleViewDataset',
        data_source=dict(type='CIFAR10', data_prefix='data/cifar'),
        pipeline=[
            dict(type='RandomCrop', size=32, padding=4),
            dict(type='RandomHorizontalFlip'),
            dict(type='ToTensor'),
            dict(
                type='Normalize',
                mean=[0.4914, 0.4822, 0.4465],
                std=[0.2023, 0.1994, 0.201])
        ],
        prefetch=False),
    val=dict(
        type='SingleViewDataset',
        data_source=dict(type='CIFAR10', data_prefix='data/cifar'),
        pipeline=[
            dict(type='ToTensor'),
            dict(
                type='Normalize',
                mean=[0.4914, 0.4822, 0.4465],
                std=[0.2023, 0.1994, 0.201])
        ],
        prefetch=False),
    test=dict(
        type='SingleViewDataset',
        data_source=dict(type='CIFAR10', data_prefix='data/cifar'),
        pipeline=[
            dict(type='ToTensor'),
            dict(
                type='Normalize',
                mean=[0.4914, 0.4822, 0.4465],
                std=[0.2023, 0.1994, 0.201])
        ],
        prefetch=False))
train_dataloader = dict(batch_size=128, num_workers=8)
randomness = dict(seed=0, diff_rank_seed=True)
launcher = 'none'
work_dir = 'work'
2023/02/20 01:19:00 - mmengine - WARNING - The "visualizer" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
2023/02/20 01:19:00 - mmengine - WARNING - The "vis_backend" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
2023/02/20 01:19:01 - mmengine - WARNING - The "model" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
2023/02/20 01:19:06 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
2023/02/20 01:19:06 - mmengine - WARNING - The "hook" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
2023/02/20 01:19:06 - mmengine - INFO - Hooks will be executed in the following order:
before_run: (VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
before_train: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(VERY_LOW ) CheckpointHook
before_train_epoch: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook
before_train_iter: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
after_train_iter: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
after_train_epoch: (NORMAL ) IterTimerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
before_val_epoch: (NORMAL ) IterTimerHook
before_val_iter: (NORMAL ) IterTimerHook
after_val_iter: (NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
after_val_epoch: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
before_test_epoch: (NORMAL ) IterTimerHook
before_test_iter: (NORMAL ) IterTimerHook
after_test_iter: (NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
after_test_epoch: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
after_run: (BELOW_NORMAL) LoggerHook
2023/02/20 01:19:07 - mmengine - WARNING - The "loop" registry in mmselfsup did not set import location. Fallback to call mmselfsup.utils.register_all_modules instead.
Traceback (most recent call last):
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 43, in __init__
    super().__init__(runner, dataloader)
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/base_loop.py", line 26, in __init__
    self.dataloader = runner.build_dataloader(
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1331, in build_dataloader
    dataset_cfg = dataloader_cfg.pop('dataset')
KeyError: 'dataset'

Traceback (most recent call last):
  File "tools/train.py", line 99, in <module>
    main()
  File "tools/train.py", line 95, in main
    runner.train()
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1656, in train
    self._train_loop = self.build_train_loop(
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1448, in build_train_loop
    loop = LOOPS.build(
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/registry.py", line 521, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/HOME/scz3182/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 135, in build_from_cfg
    raise type(e)(
KeyError: "class EpochBasedTrainLoop in mmengine/runner/loops.py: 'dataset'"
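Reading the log against the config: the first traceback stops at dataloader_cfg.pop('dataset'), and the logged config shows train_dataloader = dict(batch_size=128, num_workers=8) with no dataset entry, while the dataset itself is described in a separate data = dict(samples_per_gpu=...) block. That is consistent with the earlier remark in this thread that the config follows the 0.x layout. The check below is a small sketch of how such a mismatch could be spotted before launching training; check_train_dataloader is a hypothetical helper, not part of MMEngine or MMSelfSup.

```python
def check_train_dataloader(cfg):
    """Return a list of likely problems in a config given as a plain dict."""
    problems = []
    loader = cfg.get('train_dataloader')
    if loader is None:
        problems.append('train_dataloader is missing')
    elif 'dataset' not in loader:
        # This is the key the traceback fails on: dataloader_cfg.pop('dataset').
        problems.append("train_dataloader has no 'dataset' key")
    if 'samples_per_gpu' in cfg.get('data', {}):
        problems.append(
            "top-level 'data' dict with samples_per_gpu looks like a 0.x-style config")
    return problems


# The relevant parts of the logged config, reduced to plain dicts.
logged_cfg = dict(
    data=dict(samples_per_gpu=128, workers_per_gpu=2),
    train_dataloader=dict(batch_size=128, num_workers=8),
)

problems = check_train_dataloader(logged_cfg)
for problem in problems:
    print(problem)
```

Moving the dataset definition into train_dataloader['dataset'], in the 1.x style, would be the direction suggested by the first reply in this thread.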