open-mmlab / mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark
https://mmpretrain.readthedocs.io/en/latest/
Apache License 2.0

[Bug] When testing a mobilenet_v3-small model trained on a custom RGB-image dataset using mmpretrain, the output image channel order is BGR #1799

Open Free-Geter opened 10 months ago

Free-Geter commented 10 months ago

Branch

main branch (mmpretrain version)

Describe the bug

I built a custom classification dataset of RGB images and trained a mobilenet-v3-small image classification model with mmpretrain. Testing with the command below produced the correct category label for every image, but the output result images were in BGR channel order. When I then tested with images stored in BGR channel order, the output result images came out in RGB channel order. So I suspect there is a problem in the model's test code.

mim test mmpretrain Badminton-mobilenet-v3-small_bs32.py --checkpoint best_accuracy_top1_epoch_13.pth --out result.pkl

My model training and testing config is as follows:

bgr_mean = [
    103.53,
    116.28,
    123.675,
]
bgr_std = [
    57.375,
    57.12,
    58.395,
]
data_preprocessor = dict(
    mean=[
        123.675,
        116.28,
        103.53,
    ],
    num_classes=2,
    std=[
        58.395,
        57.12,
        57.375,
    ],
    to_rgb=True)
data_root = 'D:\\CodeSpace\\AI\\Badminton\\database\\Video_View_Classification_split'
dataset_type = 'ImageNet'
default_hooks = dict(
    checkpoint=dict(
        interval=1, max_keep_ckpts=5, save_best='auto', type='CheckpointHook'),
    logger=dict(interval=30, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(
        enable=True,
        interval=300,
        out_dir=None,
        show=True,
        type='VisualizationHook',
        wait_time=5.0))
default_scope = 'mmpretrain'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
launcher = 'none'
load_from = '.\\work_dirs\\Badminton-mobilenet-v3-small_bs32\\best_accuracy_top1_epoch_13.pth'
log_level = 'INFO'
model = dict(
    backbone=dict(
        arch='small',
        frozen_stages=2,
        init_cfg=dict(
            checkpoint=
            'https://download.openmmlab.com/mmclassification/v0/mobilenet_v3/mobilenet-v3-small_8xb128_in1k_20221114-bd1bfcde.pth',
            prefix='backbone',
            type='Pretrained'),
        type='MobileNetV3'),
    head=dict(
        act_cfg=dict(type='HSwish'),
        dropout_rate=0.2,
        in_channels=576,
        init_cfg=dict(
            bias=0.0, layer='Linear', mean=0.0, std=0.01, type='Normal'),
        loss=dict(loss_weight=1.0, type='CrossEntropyLoss'),
        mid_channels=[
            1024,
        ],
        num_classes=2,
        topk=(
            1,
            5,
        ),
        type='StackedLinearClsHead'),
    neck=dict(type='GlobalAveragePooling'),
    type='ImageClassifier')
optim_wrapper = dict(
    optimizer=dict(lr=0.01, momentum=0.9, type='SGD', weight_decay=0.0001))
param_scheduler = dict(
    by_epoch=True, gamma=0.1, milestones=[
        15,
    ], type='MultiStepLR')
randomness = dict(deterministic=False, seed=None)
resume = False
test_cfg = dict()
test_dataloader = dict(
    batch_size=128,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='test',
        data_root=
        'D:\\CodeSpace\\AI\\Badminton\\database\\Video_View_Classification_split',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', edge='short', scale=256, type='ResizeEdge'),
            dict(crop_size=224, type='CenterCrop'),
            dict(type='PackInputs'),
        ],
        split='val',
        type='CustomDataset',
        with_label=True),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(topk=1, type='Accuracy')
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(backend='pillow', edge='short', scale=256, type='ResizeEdge'),
    dict(crop_size=224, type='CenterCrop'),
    dict(type='PackInputs'),
]
train_cfg = dict(by_epoch=True, max_epochs=15, val_interval=1)
train_dataloader = dict(
    batch_size=128,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='train',
        data_root=
        'D:\\CodeSpace\\AI\\Badminton\\database\\Video_View_Classification_split',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', scale=224, type='RandomResizedCrop'),
            dict(direction='horizontal', prob=0.5, type='RandomFlip'),
            dict(
                hparams=dict(pad_val=[
                    104,
                    116,
                    124,
                ]),
                policies='imagenet',
                type='AutoAugment'),
            dict(
                erase_prob=0.2,
                fill_color=[
                    103.53,
                    116.28,
                    123.675,
                ],
                fill_std=[
                    57.375,
                    57.12,
                    58.395,
                ],
                max_area_ratio=0.3333333333333333,
                min_area_ratio=0.02,
                mode='rand',
                type='RandomErasing'),
            dict(type='PackInputs'),
        ],
        split='train',
        type='CustomDataset',
        with_label=True),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(backend='pillow', scale=224, type='RandomResizedCrop'),
    dict(direction='horizontal', prob=0.5, type='RandomFlip'),
    dict(
        hparams=dict(pad_val=[
            104,
            116,
            124,
        ]),
        policies='imagenet',
        type='AutoAugment'),
    dict(
        erase_prob=0.2,
        fill_color=[
            103.53,
            116.28,
            123.675,
        ],
        fill_std=[
            57.375,
            57.12,
            58.395,
        ],
        max_area_ratio=0.3333333333333333,
        min_area_ratio=0.02,
        mode='rand',
        type='RandomErasing'),
    dict(type='PackInputs'),
]
val_cfg = dict()
val_dataloader = dict(
    batch_size=128,
    collate_fn=dict(type='default_collate'),
    dataset=dict(
        ann_file='',
        data_prefix='val',
        data_root=
        'D:\\CodeSpace\\AI\\Badminton\\database\\Video_View_Classification_split',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(backend='pillow', edge='short', scale=256, type='ResizeEdge'),
            dict(crop_size=224, type='CenterCrop'),
            dict(type='PackInputs'),
        ],
        split='val',
        type='CustomDataset',
        with_label=True),
    num_workers=5,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(topk=1, type='Accuracy')
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    type='UniversalVisualizer', vis_backends=[
        dict(type='LocalVisBackend'),
    ])
work_dir = './work_dirs\\Badminton-mobilenet-v3-small_bs32'
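
For readers comparing `bgr_mean`/`bgr_std` with the `data_preprocessor` values above, here is a minimal NumPy sketch of what a channel flip followed by per-channel normalization looks like. It is not mmpretrain's code; the function name `preprocess` and the NCHW batch layout are assumptions made for illustration. It only shows why a preprocessor with `to_rgb=True` is paired with mean/std listed in RGB order while the loaded image itself stays in BGR order.

```python
import numpy as np

# Mean/std in RGB order, copied from data_preprocessor in the config above.
RGB_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
RGB_STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def preprocess(batch_bgr: np.ndarray, to_rgb: bool = True) -> np.ndarray:
    """Hypothetical helper: flip an NCHW batch from BGR to RGB, then normalize.

    The input array is not modified; a new float32 array is returned.
    """
    batch = batch_bgr.astype(np.float32)
    if to_rgb and batch.shape[1] == 3:
        # Reverse the channel axis: B, G, R -> R, G, B.
        batch = batch[:, ::-1, :, :]
    mean = RGB_MEAN.reshape(1, 3, 1, 1)
    std = RGB_STD.reshape(1, 3, 1, 1)
    return (batch - mean) / std


# A batch whose every pixel equals the BGR mean from the config (bgr_mean).
bgr_mean = np.array([103.53, 116.28, 123.675], dtype=np.float32)
batch = np.broadcast_to(bgr_mean.reshape(1, 3, 1, 1), (2, 3, 4, 4)).copy()

# After the flip, the channels line up with RGB_MEAN, so the result is all zeros.
normalized = preprocess(batch)
```

Note that in this sketch the flip only affects the array handed to the model. The original `batch` is still in BGR order, so anything that draws it directly would need its own conversion.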

The system information is as follows:

System environment:
    sys.platform: win32
    Python: 3.8.18 (default, Sep 11 2023, 13:39:12) [MSC v.1916 64 bit (AMD64)]
    CUDA available: True
    numpy_random_seed: 2062937103
    GPU 0: NVIDIA GeForce GTX 1060 with Max-Q Design
    CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
    NVCC: Cuda compilation tools, release 11.8, V11.8.89
    MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.37.32822 for x64
    GCC: n/a
    PyTorch: 2.0.1
    PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 193431937
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.7
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=C:/cb/pytorch_1000000000000/work/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj /FS -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=OFF, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.15.2
    OpenCV: 4.8.0
    MMEngine: 0.8.4
------------------------------------------------------------

An example of an incorrect output image is as follows:

[image: CHEN_Long_CHOU_Tien_Chen_World_Tour_Finals_Group_Stage_frame_20825 jpg_0]

Environment

{'sys.platform': 'win32',
 'Python': '3.8.18 (default, Sep 11 2023, 13:39:12) [MSC v.1916 64 bit '
           '(AMD64)]',
 'CUDA available': True,
 'numpy_random_seed': 2147483648,
 'GPU 0': 'NVIDIA GeForce GTX 1060 with Max-Q Design',
 'CUDA_HOME': 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.8',
 'NVCC': 'Cuda compilation tools, release 11.8, V11.8.89',
 'MSVC': 'Microsoft (R) C/C++ Optimizing Compiler Version 19.37.32822 for x64',
 'GCC': 'n/a',
 'PyTorch': '2.0.1',
 'TorchVision': '0.15.2',
 'OpenCV': '4.8.0',
 'MMEngine': '0.8.4',
 'MMCV': '2.0.1',
 'MMPreTrain': '1.0.2+'}

Other information

  1. The mmpretrain code has not been modified.
  2. Probably OpenCV reads the image as BGR, but it is not converted to RGB before the test output image is produced.
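
The symptom described in this report (RGB files shown as BGR, BGR files shown as RGB) is what a single missing channel swap would look like. Below is a minimal NumPy sketch of that swap. The helper name `swap_channel_order` is hypothetical and is not part of mmpretrain; where such a conversion would belong in the test or visualization path is an assumption that has not been confirmed here.

```python
import numpy as np


def swap_channel_order(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: reverse the channel axis of an HxWx3 image (BGR <-> RGB)."""
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an HxWx3 image")
    # Reversing the last axis maps (B, G, R) to (R, G, B) and vice versa.
    return np.ascontiguousarray(image[..., ::-1])


# A 1x2 image stored in BGR order: one pure-blue pixel and one pure-red pixel.
bgr_image = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# The same pixels expressed in RGB order, which is what an RGB-based drawing
# routine expects to receive.
rgb_image = swap_channel_order(bgr_image)

# Applying the swap twice returns the original array, which matches the
# observation that BGR inputs come out looking like RGB.
round_trip = swap_channel_order(rgb_image)
```

If the result images are drawn from the BGR array without this swap, blue and red appear exchanged, as in the example image above.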

lmx1989219 commented 7 months ago

mark

3maoyap commented 4 months ago

I'm running into the same problem.