open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

[Bug] FCENet model gets stuck after the first iteration during training #2043

Open tmargaryan-aligntech opened 2 months ago

tmargaryan-aligntech commented 2 months ago

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmocr

Environment

System environment:
    sys.platform: win32
    Python: 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
    CUDA available: True
    MUSA available: False
    numpy_random_seed: 0
    GPU 0: Tesla V100-SXM2-16GB
    CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2
    NVCC: Cuda compilation tools, release 11.2, V11.2.152
    MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27051 for x64
    GCC: n/a
    PyTorch: 1.11.0+cu113
    PyTorch compiling details: PyTorch built with:

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 0
    Distributed launcher: none
    Distributed training: False
    GPU number: 1

Reproduces the problem - code sample

Here is the full training log, including my config:

2024/05/02 19:45:57 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: win32
    Python: 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
    CUDA available: True
    MUSA available: False
    numpy_random_seed: 0
    GPU 0: Tesla V100-SXM2-16GB
    CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2
    NVCC: Cuda compilation tools, release 11.2, V11.2.152
    MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27051 for x64
    GCC: n/a
    PyTorch: 1.11.0+cu113
    PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192829337
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/builder/windows/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.12.0+cu113
    OpenCV: 4.8.1
    MMEngine: 0.10.3

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 0
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

2024/05/02 19:45:57 - mmengine - INFO - Config:
auto_scale_lr = dict(base_batch_size=16)
data_root = None
default_hooks = dict(
    checkpoint=dict(
        interval=5,
        rule='greater',
        save_best='icdar/hmean',
        type='CheckpointHook'),
    logger=dict(interval=5, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    sync_buffer=dict(type='SyncBuffersHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(
        draw_gt=False,
        draw_pred=False,
        enable=False,
        interval=1,
        show=False,
        type='VisualizationHook'))
default_scope = 'mmocr'
det_test = dict(
    ann_file='test.json',
    data_prefix=dict(img_path='test_imgs/'),
    data_root=None,
    pipeline=None,
    test_mode=True,
    type='OCRDataset')
det_train = dict(
    ann_file='train.json',
    data_prefix=dict(img_path='train_imgs/'),
    data_root=None,
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    pipeline=None,
    type='OCRDataset')
det_val = dict(
    ann_file='train.json',
    data_prefix=dict(img_path='train_imgs/'),
    data_root=None,
    pipeline=None,
    test_mode=True,
    type='OCRDataset')
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
find_unused_parameters = True
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=True, type='LogProcessor', window_size=10)
model = dict(
    backbone=dict(
        depth=50,
        frozen_stages=-1,
        init_cfg=dict(checkpoint='torchvision://resnet50', type='Pretrained'),
        norm_cfg=dict(requires_grad=True, type='BN'),
        norm_eval=False,
        num_stages=4,
        out_indices=(
            1,
            2,
            3,
        ),
        style='pytorch',
        type='mmdet.ResNet'),
    data_preprocessor=dict(
        bgr_to_rgb=True,
        mean=[
            123.675,
            116.28,
            103.53,
        ],
        pad_size_divisor=32,
        std=[
            58.395,
            57.12,
            57.375,
        ],
        type='TextDetDataPreprocessor'),
    det_head=dict(
        fourier_degree=5,
        in_channels=256,
        module_loss=dict(num_sample=50, type='FCEModuleLoss'),
        postprocessor=dict(
            alpha=1.2,
            beta=1.0,
            num_reconstr_points=50,
            scales=(
                8,
                16,
                32,
            ),
            score_thr=0.3,
            text_repr_type='quad',
            type='FCEPostprocessor'),
        type='FCEHead'),
    neck=dict(
        act_cfg=None,
        add_extra_convs='on_output',
        in_channels=[
            512,
            1024,
            2048,
        ],
        num_outs=3,
        out_channels=256,
        relu_before_extra_convs=True,
        type='mmdet.FPN'),
    type='FCENet')
optim_wrapper = dict(
    optimizer=dict(lr=1e-05, momentum=0.9, type='SGD', weight_decay=0.0005),
    type='OptimWrapper')
param_scheduler = None
randomness = dict(seed=0)
resume = False
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        datasets=[
            dict(
                ann_file='test.json',
                data_prefix=dict(img_path='test_imgs/'),
                data_root='C:/Data/detection',
                pipeline=None,
                test_mode=True,
                type='OCRDataset'),
        ],
        pipeline=[
            dict(
                color_type='color_ignore_orientation',
                type='LoadImageFromFile'),
            dict(keep_ratio=True, scale=(
                2260,
                2260,
            ), type='Resize'),
            dict(
                type='LoadOCRAnnotations',
                with_bbox=True,
                with_label=True,
                with_polygon=True),
            dict(
                meta_keys=(
                    'img_path',
                    'ori_shape',
                    'img_shape',
                    'scale_factor',
                ),
                type='PackTextDetInputs'),
        ],
        type='ConcatDataset'),
    num_workers=1,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(type='HmeanIOUMetric')
test_list = [
    dict(
        ann_file='test.json',
        data_prefix=dict(img_path='test_imgs/'),
        data_root=None,
        pipeline=None,
        test_mode=True,
        type='OCRDataset'),
]
test_pipeline = [
    dict(color_type='color_ignore_orientation', type='LoadImageFromFile'),
    dict(keep_ratio=True, scale=(
        1280,
        960,
    ), type='Resize'),
    dict(
        type='LoadOCRAnnotations',
        with_bbox=True,
        with_label=True,
        with_polygon=True),
    dict(type='FixInvalidPolygon'),
    dict(
        meta_keys=(
            'img_path',
            'ori_shape',
            'img_shape',
            'scale_factor',
        ),
        type='PackTextDetInputs'),
]
train_cfg = dict(max_epochs=1500, type='EpochBasedTrainLoop', val_interval=1)
train_dataloader = dict(
    batch_size=10,
    dataset=dict(
        datasets=[
            dict(
                ann_file='train.json',
                data_prefix=dict(img_path='train_imgs/'),
                data_root='C:/Data/detection',
                pipeline=None,
                test_mode=False,
                type='OCRDataset'),
        ],
        pipeline=[
            dict(
                color_type='color_ignore_orientation',
                type='LoadImageFromFile'),
            dict(
                type='LoadOCRAnnotations',
                with_bbox=True,
                with_label=True,
                with_polygon=True),
            dict(
                keep_ratio=True,
                ratio_range=(
                    0.75,
                    2.5,
                ),
                scale=(
                    800,
                    800,
                ),
                type='RandomResize'),
            dict(
                crop_ratio=0.5,
                iter_num=1,
                min_area_ratio=0.2,
                type='TextDetRandomCropFlip'),
            dict(
                prob=0.8,
                transforms=[
                    dict(min_side_ratio=0.3, type='RandomCrop'),
                ],
                type='RandomApply'),
            dict(
                prob=0.5,
                transforms=[
                    dict(
                        max_angle=30,
                        pad_with_fixed_color=False,
                        type='RandomRotate',
                        use_canvas=True),
                ],
                type='RandomApply'),
            dict(
                prob=[
                    0.6,
                    0.4,
                ],
                transforms=[
                    [
                        dict(keep_ratio=True, scale=800, type='Resize'),
                        dict(target_scale=800, type='SourceImagePad'),
                    ],
                    dict(keep_ratio=False, scale=800, type='Resize'),
                ],
                type='RandomChoice'),
            dict(direction='horizontal', prob=0.5, type='RandomFlip'),
            dict(
                brightness=0.12549019607843137,
                contrast=0.5,
                op='ColorJitter',
                saturation=0.5,
                type='TorchVisionWrapper'),
            dict(
                meta_keys=(
                    'img_path',
                    'ori_shape',
                    'img_shape',
                    'scale_factor',
                ),
                type='PackTextDetInputs'),
        ],
        type='ConcatDataset'),
    num_workers=8,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_list = [
    dict(
        ann_file='train.json',
        data_prefix=dict(img_path='train_imgs/'),
        data_root=None,
        filter_cfg=dict(filter_empty_gt=True, min_size=32),
        pipeline=None,
        type='OCRDataset'),
]
train_pipeline = [
    dict(color_type='color_ignore_orientation', type='LoadImageFromFile'),
    dict(
        type='LoadOCRAnnotations',
        with_bbox=True,
        with_label=True,
        with_polygon=True),
    dict(type='FixInvalidPolygon'),
    dict(
        keep_ratio=True,
        ratio_range=(
            0.75,
            2.5,
        ),
        scale=(
            800,
            800,
        ),
        type='RandomResize'),
    dict(
        crop_ratio=0.5,
        iter_num=1,
        min_area_ratio=0.2,
        type='TextDetRandomCropFlip'),
    dict(
        prob=0.8,
        transforms=[
            dict(min_side_ratio=0.3, type='RandomCrop'),
        ],
        type='RandomApply'),
    dict(
        prob=0.5,
        transforms=[
            dict(
                max_angle=30,
                pad_with_fixed_color=False,
                type='RandomRotate',
                use_canvas=True),
        ],
        type='RandomApply'),
    dict(
        prob=[
            0.6,
            0.4,
        ],
        transforms=[
            [
                dict(keep_ratio=True, scale=800, type='Resize'),
                dict(target_scale=800, type='SourceImagePad'),
            ],
            dict(keep_ratio=False, scale=800, type='Resize'),
        ],
        type='RandomChoice'),
    dict(direction='horizontal', prob=0.5, type='RandomFlip'),
    dict(
        brightness=0.12549019607843137,
        contrast=0.5,
        op='ColorJitter',
        saturation=0.5,
        type='TorchVisionWrapper'),
    dict(
        meta_keys=(
            'img_path',
            'ori_shape',
            'img_shape',
            'scale_factor',
        ),
        type='PackTextDetInputs'),
]
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        datasets=[
            dict(
                ann_file='val.json',
                data_prefix=dict(img_path='val_imgs/'),
                data_root='C:/Data/detection',
                pipeline=None,
                test_mode=False,
                type='OCRDataset'),
        ],
        pipeline=[
            dict(
                color_type='color_ignore_orientation',
                type='LoadImageFromFile'),
            dict(keep_ratio=True, scale=(
                2260,
                2260,
            ), type='Resize'),
            dict(
                type='LoadOCRAnnotations',
                with_bbox=True,
                with_label=True,
                with_polygon=True),
            dict(
                meta_keys=(
                    'img_path',
                    'ori_shape',
                    'img_shape',
                    'scale_factor',
                ),
                type='PackTextDetInputs'),
        ],
        type='ConcatDataset'),
    num_workers=1,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(type='HmeanIOUMetric')
val_list = [
    dict(
        ann_file='train.json',
        data_prefix=dict(img_path='train_imgs/'),
        data_root=None,
        pipeline=None,
        test_mode=True,
        type='OCRDataset'),
]
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    name=
    'time.struct_time(tm_year=2024, tm_mon=5, tm_mday=2, tm_hour=19, tm_min=45, tm_sec=54, tm_wday=3, tm_yday=123, tm_isdst=0)',
    type='TextDetLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='TensorboardVisBackend'),
    ])
work_dir = 'work_dirs/fcenet_resnet50_fpn_1500e_totaltext/'

2024/05/02 19:46:05 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
2024/05/02 19:46:05 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) SyncBuffersHook                    
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_val_epoch:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) SyncBuffersHook                    
 -------------------- 
before_val_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_val_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) VisualizationHook                  
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_test_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_test_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_test_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) VisualizationHook                  
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
2024/05/02 19:46:08 - mmengine - INFO - load model from: torchvision://resnet50
2024/05/02 19:46:08 - mmengine - INFO - Loads checkpoint by torchvision backend from path: torchvision://resnet50
2024/05/02 19:46:08 - mmengine - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

Name of parameter - Initialization information

backbone.conv1.weight - torch.Size([64, 3, 7, 7]): 
PretrainedInit: load from torchvision://resnet50 

backbone.bn1.weight - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.bn1.bias - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.conv1.weight - torch.Size([64, 64, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.bn1.weight - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.bn1.bias - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.conv2.weight - torch.Size([64, 64, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.bn2.weight - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.bn2.bias - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.conv3.weight - torch.Size([256, 64, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.bn3.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.bn3.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.downsample.0.weight - torch.Size([256, 64, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.downsample.1.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.0.downsample.1.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.1.conv1.weight - torch.Size([64, 256, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.1.bn1.weight - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.1.bn1.bias - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.1.conv2.weight - torch.Size([64, 64, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.1.bn2.weight - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.1.bn2.bias - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.1.conv3.weight - torch.Size([256, 64, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.1.bn3.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.1.bn3.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.2.conv1.weight - torch.Size([64, 256, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.2.bn1.weight - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.2.bn1.bias - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.2.conv2.weight - torch.Size([64, 64, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.2.bn2.weight - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.2.bn2.bias - torch.Size([64]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.2.conv3.weight - torch.Size([256, 64, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.2.bn3.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer1.2.bn3.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.conv1.weight - torch.Size([128, 256, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.bn1.weight - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.bn1.bias - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.conv2.weight - torch.Size([128, 128, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.bn2.weight - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.bn2.bias - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.conv3.weight - torch.Size([512, 128, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.bn3.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.bn3.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.downsample.0.weight - torch.Size([512, 256, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.downsample.1.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.0.downsample.1.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.1.conv1.weight - torch.Size([128, 512, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.1.bn1.weight - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.1.bn1.bias - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.1.conv2.weight - torch.Size([128, 128, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.1.bn2.weight - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.1.bn2.bias - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.1.conv3.weight - torch.Size([512, 128, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.1.bn3.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.1.bn3.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.2.conv1.weight - torch.Size([128, 512, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.2.bn1.weight - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.2.bn1.bias - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.2.conv2.weight - torch.Size([128, 128, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.2.bn2.weight - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.2.bn2.bias - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.2.conv3.weight - torch.Size([512, 128, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.2.bn3.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.2.bn3.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.3.conv1.weight - torch.Size([128, 512, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.3.bn1.weight - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.3.bn1.bias - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.3.conv2.weight - torch.Size([128, 128, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.3.bn2.weight - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.3.bn2.bias - torch.Size([128]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.3.conv3.weight - torch.Size([512, 128, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.3.bn3.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer2.3.bn3.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.conv1.weight - torch.Size([256, 512, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.bn1.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.bn1.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.conv2.weight - torch.Size([256, 256, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.bn2.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.bn2.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.conv3.weight - torch.Size([1024, 256, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.bn3.weight - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.bn3.bias - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.downsample.0.weight - torch.Size([1024, 512, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.downsample.1.weight - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.0.downsample.1.bias - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.1.conv1.weight - torch.Size([256, 1024, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.1.bn1.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.1.bn1.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.1.conv2.weight - torch.Size([256, 256, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.1.bn2.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.1.bn2.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.1.conv3.weight - torch.Size([1024, 256, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.1.bn3.weight - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.1.bn3.bias - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.2.conv1.weight - torch.Size([256, 1024, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.2.bn1.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.2.bn1.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.2.conv2.weight - torch.Size([256, 256, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.2.bn2.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.2.bn2.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.2.conv3.weight - torch.Size([1024, 256, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.2.bn3.weight - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.2.bn3.bias - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.3.conv1.weight - torch.Size([256, 1024, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.3.bn1.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.3.bn1.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.3.conv2.weight - torch.Size([256, 256, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.3.bn2.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.3.bn2.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.3.conv3.weight - torch.Size([1024, 256, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.3.bn3.weight - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.3.bn3.bias - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.4.conv1.weight - torch.Size([256, 1024, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.4.bn1.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.4.bn1.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.4.conv2.weight - torch.Size([256, 256, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.4.bn2.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.4.bn2.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.4.conv3.weight - torch.Size([1024, 256, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.4.bn3.weight - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.4.bn3.bias - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.5.conv1.weight - torch.Size([256, 1024, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.5.bn1.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.5.bn1.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.5.conv2.weight - torch.Size([256, 256, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.5.bn2.weight - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.5.bn2.bias - torch.Size([256]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.5.conv3.weight - torch.Size([1024, 256, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.5.bn3.weight - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer3.5.bn3.bias - torch.Size([1024]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.conv1.weight - torch.Size([512, 1024, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.bn1.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.bn1.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.conv2.weight - torch.Size([512, 512, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.bn2.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.bn2.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.conv3.weight - torch.Size([2048, 512, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.bn3.weight - torch.Size([2048]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.bn3.bias - torch.Size([2048]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.downsample.0.weight - torch.Size([2048, 1024, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.downsample.1.weight - torch.Size([2048]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.0.downsample.1.bias - torch.Size([2048]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.1.conv1.weight - torch.Size([512, 2048, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.1.bn1.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.1.bn1.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.1.conv2.weight - torch.Size([512, 512, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.1.bn2.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.1.bn2.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.1.conv3.weight - torch.Size([2048, 512, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.1.bn3.weight - torch.Size([2048]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.1.bn3.bias - torch.Size([2048]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.2.conv1.weight - torch.Size([512, 2048, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.2.bn1.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.2.bn1.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.2.conv2.weight - torch.Size([512, 512, 3, 3]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.2.bn2.weight - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.2.bn2.bias - torch.Size([512]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.2.conv3.weight - torch.Size([2048, 512, 1, 1]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.2.bn3.weight - torch.Size([2048]): 
PretrainedInit: load from torchvision://resnet50 

backbone.layer4.2.bn3.bias - torch.Size([2048]): 
PretrainedInit: load from torchvision://resnet50 

neck.lateral_convs.0.conv.weight - torch.Size([256, 512, 1, 1]): 
XavierInit: gain=1, distribution=uniform, bias=0 

neck.lateral_convs.0.conv.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of FCENet  

neck.lateral_convs.1.conv.weight - torch.Size([256, 1024, 1, 1]): 
XavierInit: gain=1, distribution=uniform, bias=0 

neck.lateral_convs.1.conv.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of FCENet  

neck.lateral_convs.2.conv.weight - torch.Size([256, 2048, 1, 1]): 
XavierInit: gain=1, distribution=uniform, bias=0 

neck.lateral_convs.2.conv.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of FCENet  

neck.fpn_convs.0.conv.weight - torch.Size([256, 256, 3, 3]): 
XavierInit: gain=1, distribution=uniform, bias=0 

neck.fpn_convs.0.conv.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of FCENet  

neck.fpn_convs.1.conv.weight - torch.Size([256, 256, 3, 3]): 
XavierInit: gain=1, distribution=uniform, bias=0 

neck.fpn_convs.1.conv.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of FCENet  

neck.fpn_convs.2.conv.weight - torch.Size([256, 256, 3, 3]): 
XavierInit: gain=1, distribution=uniform, bias=0 

neck.fpn_convs.2.conv.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of FCENet  

det_head.out_conv_cls.weight - torch.Size([4, 256, 3, 3]): 
NormalInit: mean=0, std=0.01, bias=0 

det_head.out_conv_cls.bias - torch.Size([4]): 
NormalInit: mean=0, std=0.01, bias=0 

det_head.out_conv_reg.weight - torch.Size([22, 256, 3, 3]): 
NormalInit: mean=0, std=0.01, bias=0 

det_head.out_conv_reg.bias - torch.Size([22]): 
NormalInit: mean=0, std=0.01, bias=0 
2024/05/02 19:46:08 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
2024/05/02 19:46:08 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
2024/05/02 19:46:08 - mmengine - INFO - Checkpoints will be saved to D:\AlignProjects\almarkocr\research\mmocr\trainer_det\work_dirs\fcenet_resnet50_fpn_1500e_totaltext.
2024/05/02 19:46:38 - mmengine - INFO - Epoch(train)    [1][5/8]  lr: 1.0000e-05  eta: 19:44:55  time: 5.9271  data_time: 4.9296  memory: 11810  loss: 7.8055  loss_text: 2.1384  loss_center: 2.1940  loss_reg_x: 1.6825  loss_reg_y: 1.7907
2024/05/02 19:46:39 - mmengine - INFO - Exp name: fcenet_resnet50_fpn_1500e_totaltext_20240502_194554

Reproduces the problem - command or script

My images are 1024x1024, and I am confident there is no data issue. I have tried different batch sizes (1, 2, 4, and 8) and hit the same problem every time. DBNetPP trains fine on the same machine with the same data and reaches good accuracy.
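For reference, a minimal sketch of the equivalent programmatic launch via MMEngine; the config file path is an assumption inferred from the work_dir above, and the script is assumed to run from the repo root:

from mmengine.config import Config
from mmengine.runner import Runner

# Load the FCENet config; the path is an assumption based on the work_dir name.
cfg = Config.fromfile('configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_totaltext.py')
cfg.work_dir = 'work_dirs/fcenet_resnet50_fpn_1500e_totaltext/'

# Build the runner and start training; in this report the run hangs after
# the first logged training iteration.
runner = Runner.from_cfg(cfg)
runner.train()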

Reproduces the problem - error message

There is no error message; the process simply hangs.
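Since there is no traceback, one way to locate the block is to dump the stack of every thread periodically while the process appears stuck. A minimal sketch using the standard library's faulthandler, assuming it is added near the top of the training script before the runner starts:

import faulthandler

# Print the traceback of all threads to stderr every 600 seconds, repeating
# until cancelled, so the blocked call site shows up even without an error.
faulthandler.dump_traceback_later(600, repeat=True)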

EDIT: After 14 hours, here are additional logs. Training did eventually reach validation, but each validation iteration averages over 4,000 seconds (roughly 73 minutes per image):

2024/05/02 19:46:38 - mmengine - INFO - Epoch(train)    [1][5/8]  lr: 1.0000e-05  eta: 19:44:55  time: 5.9271  data_time: 4.9296  memory: 11810  loss: 7.8055  loss_text: 2.1384  loss_center: 2.1940  loss_reg_x: 1.6825  loss_reg_y: 1.7907
2024/05/02 19:46:39 - mmengine - INFO - Exp name: fcenet_resnet50_fpn_1500e_totaltext_20240502_194554
2024/05/03 01:51:10 - mmengine - INFO - Epoch(val)    [1][ 5/80]    eta: 3 days, 19:07:41  time: 4374.1597  data_time: 0.8993  memory: 11810  
2024/05/03 08:46:18 - mmengine - INFO - Epoch(val)    [1][10/80]    eta: 3 days, 18:57:28  time: 4677.8364  data_time: 0.4499  memory: 1394  

Additional information

No response

256-7421142 commented 3 weeks ago

Did you run into this problem when reproducing FCENet? https://github.com/open-mmlab/mmocr/issues/2044