open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

RecursionError arose while training the custom dataset #1648

Open · Priyadrasta-2111CS10 opened this issue 2 years ago

Priyadrasta-2111CS10 commented 2 years ago

I encountered the error RecursionError: maximum recursion depth exceeded while calling a Python object while training on my custom dataset. I tried setting num_workers to 0, but the issue was not resolved. Please provide a fix. Thanks in advance.

[screenshot: traceback ending in the RecursionError]
MeowZheng commented 2 years ago

I think there might be some problems with the dataset code. Could you provide more details about the dataset implementation?

Priyadrasta-2111CS10 commented 2 years ago

I created a new file my_dataset.py in ./mmseg/datasets:

[screenshot: contents of my_dataset.py]
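The screenshot above is not reproduced, so here is a minimal sketch of what ./mmseg/datasets/my_dataset.py typically contains in mmseg 0.x. The class name MyDataset matches the config below, but the CLASSES, PALETTE, and file suffixes are assumptions, not the actual posted code.

# ./mmseg/datasets/my_dataset.py -- a sketch; CLASSES, PALETTE, and the
# suffixes below are assumptions (the original screenshot is not reproduced).
from .builder import DATASETS
from .custom import CustomDataset


@DATASETS.register_module()
class MyDataset(CustomDataset):
    """Custom dataset with 3 classes, matching num_classes=3 in the config."""

    CLASSES = ('background', 'class_1', 'class_2')   # assumed class names
    PALETTE = [[0, 0, 0], [128, 0, 0], [0, 128, 0]]  # assumed palette

    def __init__(self, **kwargs):
        super(MyDataset, self).__init__(
            img_suffix='.png',       # assumed image suffix
            seg_map_suffix='.png',   # assumed annotation suffix
            **kwargs)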

After creating ./mmseg/datasets/my_dataset.py, I added it to ./mmseg/datasets/__init__.py:

[screenshot: contents of __init__.py]
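This screenshot is also not reproduced; the change to ./mmseg/datasets/__init__.py typically looks like the sketch below, with only the added lines shown and the existing entries elided.

# ./mmseg/datasets/__init__.py -- relevant additions only (a sketch); the
# file's existing imports and __all__ entries are elided.
from .my_dataset import MyDataset

__all__ = [
    # ... existing dataset names ...
    'MyDataset',
]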

The following is the config file for the custom dataset.

# Your dataset type defined in ./mmseg/datasets/__init__.py
dataset_type = 'MyDataset'
# Correct path of your dataset
data_root = 'data/my_dataset'

img_norm_cfg = dict(  # This img_norm_cfg is widely used because these are the mean and std of the ImageNet-1K pretrained model
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

crop_size = (512, 512)  # Crop size of the image during training
dist_params = dict(backend='nccl') 
log_level = 'INFO' 
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
total_iters = 80000
checkpoint_config = dict(by_epoch=False, interval=8000)
evaluation = dict(interval=8000, metric='mIoU')

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        # dict(type='TensorboardLoggerHook')
    ])
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    backbone=dict(
        type='VIT_MLA',
        model_name='vit_large_patch16_384',
        img_size=512,
        patch_size=16,
        in_chans=3,
        embed_dim=1024,
        depth=24,                                                
        num_heads=16,
        num_classes=3,
        drop_rate=0.1,
        norm_cfg=norm_cfg,
        pos_embed_interp=True,
        align_corners=False,
        mla_channels=256,
        mla_index=(5, 11, 17, 23)
    ),
    decode_head=dict(
        type='VIT_MLAHead',
        in_channels=1024,
        channels=512,
        img_size=512,
        mla_channels=256,
        mlahead_channels=128,
        num_classes=3,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=[
        dict(
            type='VIT_MLA_AUXIHead',
            in_channels=256,
            channels=512,
            in_index=0,
            img_size=512,
            num_classes=3,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='VIT_MLA_AUXIHead',
            in_channels=256,
            channels=512,
            in_index=1,
            img_size=512,
            num_classes=3,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='VIT_MLA_AUXIHead',
            in_channels=256,
            channels=512,
            in_index=2,
            img_size=512,
            num_classes=3,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='VIT_MLA_AUXIHead',
            in_channels=256,
            channels=512,
            in_index=3,
            img_size=512,
            num_classes=3,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    ])

# model training and testing settings
train_cfg = dict()
test_cfg = dict(mode='whole')
train_pipeline = []  # no data-loading or augmentation transforms defined here
test_pipeline = []
data = dict(
    samples_per_gpu=4, # Batch size of a single GPU
    workers_per_gpu=0, # Worker to pre-fetch data for each single GPU
    train=dict( # Train dataset config
        type=dataset_type, # Type of dataset, refer to mmseg/datasets/ for details.
        data_root=data_root, # The root of dataset.
        img_dir='img_dir/train', # The image directory of dataset.
        ann_dir='ann_dir/train',  # The annotation directory of dataset.
        pipeline=train_pipeline), # Data processing pipeline; the train_pipeline defined above is passed in here.
    val=dict( # Validation dataset config.
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=test_pipeline), # The test_pipeline defined above is passed in here.
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=test_pipeline))
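Since the question above is about the dataset implementation, one quick way to check that the registration itself works is a snippet like the one below, run from the repository root; DATASETS here is the registry that mmseg.datasets exposes.

# Sanity check that MyDataset is importable and registered (a sketch).
from mmseg.datasets import DATASETS

assert 'MyDataset' in DATASETS.module_dict, 'MyDataset is not registered'
print(DATASETS.get('MyDataset'))

If this prints the class without error, the dataset is registered correctly and the problem lies elsewhere.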