[Bug] Error when training model - TypeError: BaseDataset.__init__() got an unexpected keyword argument 'split'

TNodeCode commented 10 months ago

Branch

main branch (mmpretrain version)

Describe the bug

I have tried to train a model on a custom dataset using the mmpretrain library.

First I cloned the repository, then I created a dataset folder with the following structure:

data -- custom_dataset --- train --- test --- val

Next I followed the documentation (https://mmpretrain.readthedocs.io/en/latest/user_guides/train.html) on how to train a classification model on a custom dataset.

I created a new configuration file:

configs/mobilenet_v2/mobilenet-v2_finetune.py

_base_ = [
    '../_base_/models/mobilenet_v2_1x.py',
    '../_base_/datasets/imagenet_bs32_pil_resize.py',
    '../_base_/schedules/imagenet_bs256_epochstep.py',
    '../_base_/default_runtime.py'
]

# model settings
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth',
            prefix='backbone',
        )),
    head=dict(num_classes=10),
)

# data settings
data_root = 'data/custom_dataset'
train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='train',
    ))
val_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='val',
    ))
test_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='test',
    ))

# schedule settings
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))
param_scheduler = dict(
    type='MultiStepLR', by_epoch=True, milestones=[15], gamma=0.1)

I then tried to train the model on my custom dataset with the command python ./tools/train.py ./configs/mobilenet_v2/mobilenet-v2_finetune.py

Then I get the following error:

C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\__init__.py:107: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ..\c10\cuda\CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
12/30 17:42:16 - mmengine - INFO -
------------------------------------------------------------
System environment:
    sys.platform: win32
    Python: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
    CUDA available: False
    numpy_random_seed: 1691281147
    MSVC: Microsoft (R) C/C++-Optimierungscompiler Version 19.26.28806 für x64
    GCC: n/a
    PyTorch: 2.0.1+cu117
    PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 193431937
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj /FS -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=OFF, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,

    TorchVision: 0.15.2+cu117
    OpenCV: 4.7.0
    MMEngine: 0.10.2

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 1691281147
    deterministic: False
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

12/30 17:42:16 - mmengine - INFO - Config:
auto_scale_lr = dict(base_batch_size=256)
data_preprocessor = dict(
    mean=[
        123.675,
        116.28,
        103.53,
    ],
    num_classes=1000,
    std=[
        58.395,
        57.12,
        57.375,
    ],
    to_rgb=True)
data_root = 'data/custom_dataset'
dataset_type = 'ImageNet'
default_hooks = dict(
    checkpoint=dict(interval=1, type='CheckpointHook'),
    logger=dict(interval=100, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(enable=False, type='VisualizationHook'))
default_scope = 'mmpretrain'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
launcher = 'none'
load_from = None
log_level = 'INFO'
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            checkpoint=
            'https://download.openmmlab.com/mmclassification/v0/mobilenet_v2/mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth',
            prefix='backbone',
            type='Pretrained'),
        type='MobileNetV2',
        widen_factor=1.0),
    head=dict(
        in_channels=1280,
        loss=dict(loss_weight=1.0, type='CrossEntropyLoss'),
        num_classes=10,
        topk=(
            1,
            5,
        ),
        type='LinearClsHead'),
    neck=dict(type='GlobalAveragePooling'),
    type='ImageClassifier')
optim_wrapper = dict(
    optimizer=dict(lr=0.01, momentum=0.9, type='SGD', weight_decay=0.0001))
param_scheduler = dict(
    by_epoch=True,
    gamma=0.1,
    milestones=[
        15,
    ],
    step_size=1,
    type='MultiStepLR')
randomness = dict(deterministic=False, seed=None)
resume = False
test_cfg = dict()
test_dataloader = dict(
    batch_size=32,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='test',
        data_root='data/custom_dataset',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', edge='short', scale=256, type='ResizeEdge'),
            dict(crop_size=224, type='CenterCrop'),
            dict(type='PackInputs'),
        ],
        split='val',
        type='CustomDataset'),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
    topk=(
        1,
        5,
    ), type='Accuracy')
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(backend='pillow', edge='short', scale=256, type='ResizeEdge'),
    dict(crop_size=224, type='CenterCrop'),
    dict(type='PackInputs'),
]
train_cfg = dict(by_epoch=True, max_epochs=300, val_interval=1)
train_dataloader = dict(
    batch_size=32,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='train',
        data_root='data/custom_dataset',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', scale=224, type='RandomResizedCrop'),
            dict(direction='horizontal', prob=0.5, type='RandomFlip'),
            dict(type='PackInputs'),
        ],
        split='train',
        type='CustomDataset'),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(backend='pillow', scale=224, type='RandomResizedCrop'),
    dict(direction='horizontal', prob=0.5, type='RandomFlip'),
    dict(type='PackInputs'),
]
val_cfg = dict()
val_dataloader = dict(
    batch_size=32,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='val',
        data_root='data/custom_dataset',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', edge='short', scale=256, type='ResizeEdge'),
            dict(crop_size=224, type='CenterCrop'),
            dict(type='PackInputs'),
        ],
        split='val',
        type='CustomDataset'),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
    topk=(
        1,
        5,
    ), type='Accuracy')
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    type='UniversalVisualizer', vis_backends=[
        dict(type='LocalVisBackend'),
    ])
work_dir = './work_dirs\\mobilenet-v2_finetune'

12/30 17:42:21 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
12/30 17:42:21 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
 --------------------
before_train:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(NORMAL      ) DistSamplerSeedHook
 --------------------
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
 --------------------
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
after_train_epoch:
(NORMAL      ) IterTimerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_val:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
before_val_epoch:
(NORMAL      ) IterTimerHook
 --------------------
before_val_iter:
(NORMAL      ) IterTimerHook
 --------------------
after_val_iter:
(NORMAL      ) IterTimerHook
(NORMAL      ) VisualizationHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
after_val:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
after_train:
(VERY_HIGH   ) RuntimeInfoHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_test:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
before_test_epoch:
(NORMAL      ) IterTimerHook
 --------------------
before_test_iter:
(NORMAL      ) IterTimerHook
 --------------------
after_test_iter:
(NORMAL      ) IterTimerHook
(NORMAL      ) VisualizationHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_test:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
after_run:
(BELOW_NORMAL) LoggerHook
 --------------------
Traceback (most recent call last):
  File "C:\Users\tilof\PycharmProjects\DeepLearningProjects\OpenMMLab\mmpretrain\tools\train.py", line 162, in <module>
    main()
  File "C:\Users\tilof\PycharmProjects\DeepLearningProjects\OpenMMLab\mmpretrain\tools\train.py", line 158, in main
    runner.train()
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\runner\runner.py", line 1728, in train
    self._train_loop = self.build_train_loop(
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\runner\runner.py", line 1527, in build_train_loop
    loop = EpochBasedTrainLoop(
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\runner\loops.py", line 44, in __init__
    super().__init__(runner, dataloader)
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\runner\base_loop.py", line 26, in __init__
    self.dataloader = runner.build_dataloader(
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\runner\runner.py", line 1370, in build_dataloader
    dataset = DATASETS.build(dataset_cfg)
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\registry\registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\registry\build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "C:\Users\tilof\AppData\Local\Programs\Python\Python310\lib\site-packages\mmpretrain\datasets\custom.py", line 207, in __init__
    super().__init__(
TypeError: BaseDataset.__init__() got an unexpected keyword argument 'split'

Environment

{'sys.platform': 'win32', 'Python': '3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 ' '64 bit (AMD64)]', 'CUDA available': False, 'numpy_random_seed': 2147483648, 'MSVC': 'Microsoft (R) C/C++-Optimierungscompiler Version 19.26.28806 für x64', 'GCC': 'n/a', 'PyTorch': '2.0.1+cu117', 'TorchVision': '0.15.2+cu117', 'OpenCV': '4.7.0', 'MMEngine': '0.10.2', 'MMCV': '2.1.0', 'MMPreTrain': '1.1.1+e95d9ac'}

Other information

No response

leon-costa commented 10 months ago

I had the same problem following the guide How to Pretrain with Custom Dataset.

The problem is that the dataset you are overriding has a split argument (_base_/datasets/imagenet_bs32_pil_resize.py#L32) which doesn't work with the CustomDataset.

The solution I found was to copy all the arguments and add an extra _delete_=True (doc). Something like this (to repeat for other datasets):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224, backend='pillow'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='train',
        pipeline=train_pipeline,
        _delete_=True,
    ))

Huy-Thai commented 10 months ago

Hi, @leon-costa,

I'm trying but not working, Is there any way to fix the above problem?

smoothumut commented 9 months ago

Hi everyone, any update? I am also having exact same problem with CustomDataset

smoothumut commented 9 months ago

I have made it worked.

@leon-costa 's solution and the link he gave https://mmpretrain.readthedocs.io/en/latest/user_guides/config.html#ignore-some-fields-in-the-base-configs helped me better understand the problem.

In my case I have removed the

'../base/datasets/imagenet_bs32_pil_resize.py', from my config's base,

then applied required dict settings (of course without split) for dataset into my config. Then it worked. thanks all for guiding

gjustin40 commented 9 months ago

@TNodeCode Just remove split args of each dataloader config.

train_dataloader = dict(
    batch_size=32,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='train',
        data_root='data/custom_dataset',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', scale=224, type='RandomResizedCrop'),
            dict(direction='horizontal', prob=0.5, type='RandomFlip'),
            dict(type='PackInputs'),
        ],
        split='train', <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<  remove (same as val_dataloader)
        type='CustomDataset'),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))

split option is only used with datasets that have implemented the split feature, so if the split feature has not been specifically configured when using a custom dataset, it can be removed.

A prominent dataset that utilizes this feature is ImageNet.

OpenByteDev commented 3 months ago

An alternative solution is to subclass CustomDataset and just throw away the split arg:

from mmpretrain.registry import DATASETS
from mmpretrain.datasets.custom import CustomDataset

@DATASETS.register_module()
class CustomDataset2(CustomDataset):
    def __init__(self, split=None, **kwargs):
        super(CustomDataset2, self).__init__(**kwargs)

open-mmlab / mmpretrain