open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Model inference fails after training YOLOX with a changed input image scale #9378

Open EnzoLiang opened 1 year ago

EnzoLiang commented 1 year ago

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection

Environment

/home/gpu/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/__init__.py:21: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  'On January 1, 2023, MMCV will release v2.0.0, in which it will remove '
sys.platform: linux
Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1: Tesla P100-PCIE-16GB
CUDA_HOME: /usr/local/cuda-10.0
NVCC: Cuda compilation tools, release 10.0, V10.0.13
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:

TorchVision: 0.8.2
OpenCV: 4.6.0
MMCV: 1.7.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.25.3+e71b499

Reproduces the problem - code sample

_base_ = ['../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py']

# img_scale = (640, 640)  # height, width
img_scale = (640, 1280)  # height, width

# model settings
model = dict(
    type='YOLOX',
    input_size=img_scale,
    random_size_range=(18, 32),
    random_size_interval=10,
    backbone=dict(type='CSPDarknet', deepen_factor=1.33, widen_factor=1.25),
    neck=dict(
        type='YOLOXPAFPN',
        in_channels=[320, 640, 1280],
        out_channels=320,
        num_csp_blocks=4),
    bbox_head=dict(
        type='YOLOXHead', num_classes=2, in_channels=320, feat_channels=320),
    train_cfg=dict(assigner=dict(type='SimOTAAssigner', center_radius=2.5)),
    # In order to align the source code, the threshold of the val phase is
    # 0.01, and the threshold of the test phase is 0.001.
    test_cfg=dict(score_thr=0.01, nms=dict(type='nms', iou_threshold=0.65)))

train_pipeline = [
    dict(type='Mosaic', img_scale=img_scale, pad_val=114.0),
    dict(
        type='RandomAffine',
        scaling_ratio_range=(0.1, 2),
        border=(-img_scale[0] // 2, -img_scale[1] // 2)),
    dict(
        type='MixUp',
        img_scale=img_scale,
        ratio_range=(0.8, 1.6),
        pad_val=114.0),
    dict(type='YOLOXHSVRandomAug'),
    dict(type='RandomFlip', flip_ratio=0.5),
    # According to the official implementation, multi-scale
    # training is not considered here but in the
    # 'mmdet/models/detectors/yolox.py'.
    dict(type='Resize', img_scale=img_scale, keep_ratio=True),
    dict(
        type='Pad',
        pad_to_square=True,
        # If the image is three-channel, the pad value needs
        # to be set separately for each channel.
        pad_val=dict(img=(114.0, 114.0, 114.0))),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

# dataset settings
data_root = '/mnt/122.149-sr01/liangyzh/dataset/coco/exam_0409_0418/'
dataset_type = 'CocoDataset'
classes = ('sit', 'stand')

train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True)
        ],
        filter_empty_gt=False,
    ),
    pipeline=train_pipeline)

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_scale,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Pad',
                pad_to_square=True,
                pad_val=dict(img=(114.0, 114.0, 114.0))),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=3,
    workers_per_gpu=4,
    persistent_workers=True,
    train=train_dataset,
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline))

# optimizer
# default 8 gpu
optimizer = dict(
    type='SGD',
    lr=0.01,
    momentum=0.9,
    weight_decay=5e-4,
    nesterov=True,
    paramwise_cfg=dict(norm_decay_mult=0., bias_decay_mult=0.))
optimizer_config = dict(grad_clip=None)

max_epochs = 300
num_last_epochs = 15
resume_from = None
interval = 10

# learning policy
lr_config = dict(
    _delete_=True,
    policy='YOLOX',
    warmup='exp',
    by_epoch=False,
    warmup_by_epoch=True,
    warmup_ratio=1,
    warmup_iters=5,  # 5 epoch
    num_last_epochs=num_last_epochs,
    min_lr_ratio=0.05)

runner = dict(type='EpochBasedRunner', max_epochs=max_epochs)

custom_hooks = [
    dict(
        type='YOLOXModeSwitchHook',
        num_last_epochs=num_last_epochs,
        priority=48),
    dict(
        type='SyncNormHook',
        num_last_epochs=num_last_epochs,
        interval=interval,
        priority=48),
    dict(
        type='ExpMomentumEMAHook',
        resume_from=resume_from,
        momentum=0.0001,
        priority=49)
]
checkpoint_config = dict(interval=interval)
checkpoint_config = dict(create_symlink=False)  # works around an error when saving the model file
evaluation = dict(
    save_best='auto',
    # The evaluation interval is 'interval' when running epoch is
    # less than 'max_epochs - num_last_epochs'.
    # The evaluation interval is 1 when running epoch is greater than
    # or equal to 'max_epochs - num_last_epochs'.
    interval=interval,
    dynamic_intervals=[(max_epochs - num_last_epochs, 1)],
    metric='bbox')

log_config = dict(interval=50)

auto_scale_lr = dict(base_batch_size=64)

load_from = "/mnt/122.149-sr01/liangyzh/dataset/mmlab/mmdetection/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth"

work_dir = "/mnt/122.149-sr01/liangyzh/dataset/mmlab/mmdetection/output/20221124"

Reproduces the problem - command or script

python demo/image_demo.py 1.png configs/yolox/yolox_x_3*4_exam.py /mnt/122.149-sr01/epoch_2.pth --out ./output/1_1.png

Reproduces the problem - error message

  File "demo/image_demo.py", line 68, in <module>
    main(args)
  File "demo/image_demo.py", line 36, in main
    result = inference_detector(model, args.img)
  File "/mnt/122.149-sr01/liangyzh/projects/mmdetection/mmdet/apis/inference.py", line 151, in inference_detector
    results = model(return_loss=False, rescale=True, **data)
  File "/home/gpu/1-Program/miniconda3/envs/yolo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/1-Program/miniconda3/envs/yolo/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
  File "/mnt/122.149-sr01/liangyzh/projects/mmdetection/mmdet/models/detectors/base.py", line 174, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/mnt/122.149-sr01/liangyzh/projects/mmdetection/mmdet/models/detectors/base.py", line 147, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/mnt/122.149-sr01/liangyzh/projects/mmdetection/mmdet/models/detectors/single_stage.py", line 101, in simple_test
    feat = self.extract_feat(img)
  File "/mnt/122.149-sr01/liangyzh/projects/mmdetection/mmdet/models/detectors/single_stage.py", line 45, in extract_feat
    x = self.neck(x)
  File "/home/gpu/1-Program/miniconda3/envs/yolo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/122.149-sr01/liangyzh/projects/mmdetection/mmdet/models/necks/yolox_pafpn.py", line 139, in forward
    torch.cat([upsample_feat, feat_low], 1))
RuntimeError: Sizes of tensors must match except in dimension 2. Got 143 and 144 (The offending index is 0)

Additional information

I changed img_scale to (640, 1280) in the config file before training the YOLOX model. When I run image_demo.py with the trained model, the error above occurs.

RangiLyu commented 1 year ago

If you do not want to use a square input, you need to change the Pad transform in test_pipeline so that it pads to a multiple of 32 instead of padding to a square:

dict(
    type='Pad',
    pad_to_square=False,
    size_divisor=32,
    pad_val=dict(img=(114.0, 114.0, 114.0))),
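To see why the size divisor matters, here is a minimal sketch of the feature-map arithmetic behind the "Got 143 and 144" error. It assumes that every stride-2 stage in the backbone rounds the spatial size up, and that the neck upsamples by exactly 2 before concatenating. The helper names (`feature_sizes`, `pad_to_divisor`, `neck_shapes_match`) are hypothetical and not part of MMDetection, and the input side 1144 is only an illustrative value chosen because it reproduces the 143 vs 144 mismatch; the actual padded size of the image in this report is not known.

```python
import math


def feature_sizes(side, halvings=(3, 4, 5)):
    """Sizes of the stride-8, stride-16 and stride-32 feature maps.

    Assumption: each stride-2 stage maps a size n to ceil(n / 2).
    """
    sizes = []
    for count in halvings:
        size = side
        for _ in range(count):
            size = math.ceil(size / 2)
        sizes.append(size)
    return tuple(sizes)


def pad_to_divisor(side, divisor=32):
    """Round a side length up to the next multiple of `divisor`."""
    return math.ceil(side / divisor) * divisor


def neck_shapes_match(side):
    """True if 2x upsampling of each coarser map matches the next finer map."""
    s8, s16, s32 = feature_sizes(side)
    return 2 * s32 == s16 and 2 * s16 == s8


# Illustrative side length that is not a multiple of 32.
print(feature_sizes(1144))      # (143, 72, 36): 2 * 72 = 144, but the stride-8 map is 143
print(neck_shapes_match(1144))  # False

padded = pad_to_divisor(1144)   # 1152
print(feature_sizes(padded))    # (144, 72, 36)
print(neck_shapes_match(padded))  # True
```

With pad_to_square=True the padded side is simply the longer edge of the resized image, which is generally not a multiple of 32, so the concatenation in the neck can fail as shown. Padding with size_divisor=32 keeps all three feature maps consistent under this assumption.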