TNodeCode opened this issue 10 months ago.
I had the same problem when following the guide "How to Pretrain with Custom Dataset".
The problem is that the dataset config you are inheriting from has a split argument (_base_/datasets/imagenet_bs32_pil_resize.py#L32), which CustomDataset does not accept.
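The solution I found was to copy all the dataset arguments into my own config and add an extra _delete_=True (doc), which tells the config system to ignore the fields inherited from the base config for that dict. Something like this (repeat it for the other dataloaders):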
data_root = 'data/custom_dataset'  # path to your dataset

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224, backend='pillow'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',  # We assume you are using the sub-folder format without ann_file
        data_prefix='train',
        pipeline=train_pipeline,
        _delete_=True,  # discard the inherited dataset fields, including split
    ))
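To see why _delete_=True matters, here is a simplified model of the merge in plain Python. It is a hypothetical sketch, not MMEngine's actual implementation: nested dicts from the child config are merged key by key into the base, unless the child dict sets _delete_=True, in which case the base dict at that level is replaced outright. The abridged base dict below is an assumption standing in for the ImageNet base config.

```python
from copy import deepcopy


def merge_cfg(base: dict, override: dict) -> dict:
    """Hypothetical, simplified model of merging a child config into its base.

    Nested dicts are merged key by key; a child dict containing
    `_delete_=True` replaces the base dict at that level instead.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so popping `_delete_` leaves the input intact
            replace = value.pop('_delete_', False)
            if replace or not isinstance(merged.get(key), dict):
                merged[key] = merge_cfg({}, value)
            else:
                merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Roughly what the ImageNet base config contributes (abridged, assumed).
base = {'train_dataloader': {
    'batch_size': 32,
    'dataset': {'type': 'ImageNet', 'data_root': 'data/imagenet', 'split': 'train'},
}}

# Child override without _delete_: `split` leaks through into CustomDataset.
child = {'train_dataloader': {'dataset': {
    'type': 'CustomDataset', 'data_root': 'data/custom_dataset', 'data_prefix': 'train',
}}}

# Same override with _delete_=True: the inherited dataset fields are dropped.
child_del = {'train_dataloader': {'dataset': {
    'type': 'CustomDataset', 'data_root': 'data/custom_dataset', 'data_prefix': 'train',
    '_delete_': True,
}}}

print(merge_cfg(base, child)['train_dataloader']['dataset'])
# {'type': 'CustomDataset', 'data_root': 'data/custom_dataset', 'split': 'train', 'data_prefix': 'train'}
print(merge_cfg(base, child_del)['train_dataloader'])
# {'batch_size': 32, 'dataset': {'type': 'CustomDataset', 'data_root': 'data/custom_dataset', 'data_prefix': 'train'}}
```

Note that _delete_ only acts at the level where it appears: batch_size, which sits one level up, is still inherited.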
Hi @leon-costa,
I tried this, but it is still not working. Is there any other way to fix the problem?
Hi everyone, any update? I am also having the exact same problem with CustomDataset.
I have made it work.
@leon-costa's solution and the link he gave (https://mmpretrain.readthedocs.io/en/latest/user_guides/config.html#ignore-some-fields-in-the-base-configs) helped me understand the problem better.
In my case, I removed '../_base_/datasets/imagenet_bs32_pil_resize.py' from the _base_ list of my config and then added the required dataset settings (of course without split) directly to my config. Then it worked. Thanks, all, for the guidance.
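For anyone taking the same route, here is a minimal sketch of dataset settings that could replace the removed base file. It mirrors the structure of the ImageNet base config but contains no split keys anywhere. The data_root path, the train/val sub-folder names, and the value of num_classes are assumptions to adapt to your data; the model head's num_classes needs to match as well.

```python
# Sketch of dataset settings for your own config after removing
# '../_base_/datasets/imagenet_bs32_pil_resize.py' from _base_.
# Assumed layout: data/custom_dataset/{train,val}/<class_name>/<image files>
data_root = 'data/custom_dataset'  # assumed path
dataset_type = 'CustomDataset'

data_preprocessor = dict(
    num_classes=10,  # hypothetical value: set to your number of classes
    # ImageNet mean/std in RGB order, as commonly used with pretrained weights
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,
)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224, backend='pillow'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='ResizeEdge', scale=256, edge='short', backend='pillow'),
    dict(type='CenterCrop', crop_size=224),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    batch_size=32,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='',  # sub-folder format, no annotation file
        data_prefix='train',
        pipeline=train_pipeline),  # no split key
    sampler=dict(type='DefaultSampler', shuffle=True),
)

val_dataloader = dict(
    batch_size=32,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='',
        data_prefix='val',
        pipeline=test_pipeline),  # no split key
    sampler=dict(type='DefaultSampler', shuffle=False),
)

# top-1 only, so the metric also works with fewer than 5 classes
val_evaluator = dict(type='Accuracy', topk=(1, ))

test_dataloader = val_dataloader
test_evaluator = val_evaluator
```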
@TNodeCode Just remove the split argument from the dataset of each dataloader config.
train_dataloader = dict(
    batch_size=32,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='train',
        data_root='data/custom_dataset',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', scale=224, type='RandomResizedCrop'),
            dict(direction='horizontal', prob=0.5, type='RandomFlip'),
            dict(type='PackInputs'),
        ],
        split='train',  # <-- remove this line (same for val_dataloader)
        type='CustomDataset'),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
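If you would rather patch a configuration programmatically (for example in your own launcher script) than edit every dataloader by hand, a small helper along these lines could do it. strip_split is hypothetical and is shown here on plain nested dicts; applying it to a config loaded with MMEngine, which behaves like a nested dict, is an untested assumption.

```python
def strip_split(cfg: dict) -> list:
    """Hypothetical helper: delete `split` from each dataloader's dataset.

    Returns the names of the dataloaders that were modified.
    """
    changed = []
    for name in ('train_dataloader', 'val_dataloader', 'test_dataloader'):
        loader = cfg.get(name)
        if not isinstance(loader, dict):
            continue  # dataloader not defined in this config
        dataset = loader.get('dataset')
        if isinstance(dataset, dict) and 'split' in dataset:
            del dataset['split']
            changed.append(name)
    return changed


cfg = {
    'train_dataloader': {'batch_size': 32, 'dataset': {
        'type': 'CustomDataset', 'data_prefix': 'train', 'split': 'train'}},
    'val_dataloader': {'batch_size': 32, 'dataset': {
        'type': 'CustomDataset', 'data_prefix': 'val', 'split': 'val'}},
}
print(strip_split(cfg))  # ['train_dataloader', 'val_dataloader']
```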
The split option is only used by datasets that implement the split feature. If you are using a custom dataset that has not been specifically set up for it, you can simply remove the option. A prominent dataset that uses this feature is ImageNet.
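The failure mode can be reproduced without mmpretrain. In the sketch below, both classes are hypothetical stand-ins: one plays the role of a dataset whose constructor knows nothing about split, the other a split-aware dataset that consumes the argument itself instead of forwarding it. Passing the same config dict to both shows why an inherited split key only breaks the former.

```python
class BaseDatasetSketch:
    """Hypothetical stand-in for a dataset whose __init__ has no `split`."""

    def __init__(self, data_root='', data_prefix='', ann_file='', pipeline=()):
        self.data_root = data_root
        self.data_prefix = data_prefix
        self.ann_file = ann_file
        self.pipeline = list(pipeline)


class SplitAwareDatasetSketch(BaseDatasetSketch):
    """Hypothetical stand-in for a dataset that implements `split`."""

    def __init__(self, data_root='', split='', **kwargs):
        self.split = split  # consumed here, never forwarded to the base class
        super().__init__(data_root=data_root, **kwargs)


def try_build(cls, cfg):
    """Instantiate cls from a config dict; return the error text on failure."""
    try:
        cls(**cfg)
        return None
    except TypeError as err:
        return str(err)


cfg = dict(data_root='data/custom_dataset', data_prefix='train', split='train')
print(try_build(SplitAwareDatasetSketch, cfg))  # None
print(try_build(BaseDatasetSketch, cfg))  # TypeError text mentioning 'split'
```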
An alternative solution is to subclass CustomDataset and just throw away the split arg:
from mmpretrain.registry import DATASETS
from mmpretrain.datasets.custom import CustomDataset


@DATASETS.register_module()
class CustomDataset2(CustomDataset):
    """CustomDataset that accepts and discards the split argument."""

    def __init__(self, split=None, **kwargs):
        # split is inherited from the ImageNet base config; drop it here
        super().__init__(**kwargs)
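The subclass only takes effect once its module has been imported, so that the register_module decorator runs. One way to arrange that, assuming the class above is saved in a hypothetical module my_datasets.py whose directory is on PYTHONPATH, is MMEngine's custom_imports config field, combined with pointing the dataset type at the new class. This is a config fragment; which other keys are still inherited from the base config is worth double-checking in your setup.

```python
# Fragment for your config, e.g. configs/mobilenet_v2/mobilenet-v2_finetune.py.
# Assumption: CustomDataset2 lives in a hypothetical module `my_datasets.py`
# whose directory is on PYTHONPATH.
custom_imports = dict(imports=['my_datasets'], allow_failed_imports=False)

train_dataloader = dict(
    dataset=dict(
        type='CustomDataset2',  # registered subclass that discards split
        data_root='data/custom_dataset',
        data_prefix='train',
        # split and pipeline are still inherited from the ImageNet base config;
        # split is now accepted and ignored by CustomDataset2.
    ))
# Repeat the same override for val_dataloader and test_dataloader.
```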
Branch
main branch (mmpretrain version)
Describe the bug
I tried to train a model on a custom dataset using the mmpretrain library.
First I cloned the repository, then I created a dataset folder with the following structure:
Next I followed the documentation (https://mmpretrain.readthedocs.io/en/latest/user_guides/train.html) on how to train a classification model on a custom dataset.
I created a new configuration file:
configs/mobilenet_v2/mobilenet-v2_finetune.py
I then tried to train the model on my custom dataset with the command
python ./tools/train.py ./configs/mobilenet_v2/mobilenet-v2_finetune.py
Then I got the following error:
Environment
{'sys.platform': 'win32', 'Python': '3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 ' '64 bit (AMD64)]', 'CUDA available': False, 'numpy_random_seed': 2147483648, 'MSVC': 'Microsoft (R) C/C++-Optimierungscompiler Version 19.26.28806 für x64', 'GCC': 'n/a', 'PyTorch': '2.0.1+cu117', 'TorchVision': '0.15.2+cu117', 'OpenCV': '4.7.0', 'MMEngine': '0.10.2', 'MMCV': '2.1.0', 'MMPreTrain': '1.1.1+e95d9ac'}
Other information
No response