Closed sarmientoj24 closed 2 years ago
I am trying to use YOLOX as one of the architectures for object detection, but compared to the other architectures I am getting really bad scores. Any suggestions why?
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 351/351, 57.5 task/s, elapsed: 6s, ETA: 0s
---------------iou_thr: 0.5---------------
2022-03-10 10:12:17,974 - mmdet - INFO -
+--------+-----+------+--------+-------+
| class  | gts | dets | recall | ap    |
+--------+-----+------+--------+-------+
| cavity | 490 | 1792 | 0.080  | 0.002 |
| pa     | 172 | 535  | 0.151  | 0.007 |
+--------+-----+------+--------+-------+
| mAP    |     |      |        | 0.005 |
+--------+-----+------+--------+-------+
2022-03-10 10:12:17,975 - mmdet - INFO - Epoch(val) [97][351] AP50: 0.0050, mAP: 0.0047
2022-03-10 10:12:28,027 - mmdet - INFO - Epoch [98][50/208] lr: 6.250e-05, eta: 0:01:40, time: 0.201, data_time: 0.057, memory: 6214, loss_cls: 0.6502, loss_bbox: 2.8659, loss_obj: 2.9006, loss_l1: 0.9162, loss: 7.3329
2022-03-10 10:12:35,786 - mmdet - INFO - Epoch [98][100/208] lr: 6.250e-05, eta: 0:01:31, time: 0.155, data_time: 0.010, memory: 6214, loss_cls: 0.6269, loss_bbox: 2.7124, loss_obj: 1.7779, loss_l1: 0.7883, loss: 5.9056
2022-03-10 10:12:43,249 - mmdet - INFO - Epoch [98][150/208] lr: 6.250e-05, eta: 0:01:22, time: 0.149, data_time: 0.009, memory: 6214, loss_cls: 0.6364, loss_bbox: 2.8063, loss_obj: 2.0539, loss_l1: 0.8438, loss: 6.3405
2022-03-10 10:12:50,357 - mmdet - INFO - Epoch [98][200/208] lr: 6.250e-05, eta: 0:01:13, time: 0.142, data_time: 0.009, memory: 6214, loss_cls: 0.6495, loss_bbox: 2.8711, loss_obj: 2.4804, loss_l1: 0.8849, loss: 6.8859
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 351/351, 57.5 task/s, elapsed: 6s, ETA: 0s
---------------iou_thr: 0.5---------------
2022-03-10 10:12:58,097 - mmdet - INFO -
+--------+-----+------+--------+-------+
| class  | gts | dets | recall | ap    |
+--------+-----+------+--------+-------+
| cavity | 490 | 1771 | 0.080  | 0.002 |
| pa     | 172 | 525  | 0.151  | 0.008 |
+--------+-----+------+--------+-------+
| mAP    |     |      |        | 0.005 |
+--------+-----+------+--------+-------+
2022-03-10 10:12:58,099 - mmdet - INFO - Epoch(val) [98][351] AP50: 0.0050, mAP: 0.0048
2022-03-10 10:13:08,456 - mmdet - INFO - Epoch [99][50/208] lr: 6.250e-05, eta: 0:01:03, time: 0.207, data_time: 0.056, memory: 6214, loss_cls: 0.6240, loss_bbox: 2.6947, loss_obj: 1.9800, loss_l1: 0.8009, loss: 6.0996
2022-03-10 10:13:16,377 - mmdet - INFO - Epoch [99][100/208] lr: 6.250e-05, eta: 0:00:55, time: 0.158, data_time: 0.009, memory: 6214, loss_cls: 0.6312, loss_bbox: 2.7407, loss_obj: 1.6332, loss_l1: 0.8054, loss: 5.8105
2022-03-10 10:13:24,525 - mmdet - INFO - Epoch [99][150/208] lr: 6.250e-05, eta: 0:00:46, time: 0.163, data_time: 0.009, memory: 6214, loss_cls: 0.6387, loss_bbox: 2.7241, loss_obj: 2.0790, loss_l1: 0.8505, loss: 6.2923
2022-03-10 10:13:32,176 - mmdet - INFO - Epoch [99][200/208] lr: 6.250e-05, eta: 0:00:37, time: 0.153, data_time: 0.009, memory: 6214, loss_cls: 0.6375, loss_bbox: 2.7981, loss_obj: 2.1879, loss_l1: 0.8473, loss: 6.4708
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 351/351, 57.2 task/s, elapsed: 6s, ETA: 0s
---------------iou_thr: 0.5---------------
2022-03-10 10:13:39,909 - mmdet - INFO -
+--------+-----+------+--------+-------+
| class  | gts | dets | recall | ap    |
+--------+-----+------+--------+-------+
| cavity | 490 | 1751 | 0.084  | 0.002 |
| pa     | 172 | 514  | 0.151  | 0.008 |
+--------+-----+------+--------+-------+
| mAP    |     |      |        | 0.005 |
+--------+-----+------+--------+-------+
2022-03-10 10:13:39,910 - mmdet - INFO - Epoch(val) [99][351] AP50: 0.0050, mAP: 0.0050
2022-03-10 10:13:50,469 - mmdet - INFO - Epoch [100][50/208] lr: 6.250e-05, eta: 0:00:27, time: 0.211, data_time: 0.057, memory: 6214, loss_cls: 0.6431, loss_bbox: 2.8128, loss_obj: 2.2370, loss_l1: 0.8816, loss: 6.5745
2022-03-10 10:13:58,238 - mmdet - INFO - Epoch [100][100/208] lr: 6.250e-05, eta: 0:00:18, time: 0.155, data_time: 0.009, memory: 6214, loss_cls: 0.6229, loss_bbox: 2.6683, loss_obj: 1.8179, loss_l1: 0.7822, loss: 5.8914
2022-03-10 10:14:05,879 - mmdet - INFO - Epoch [100][150/208] lr: 6.250e-05, eta: 0:00:10, time: 0.153, data_time: 0.009, memory: 6214, loss_cls: 0.6315, loss_bbox: 2.7117, loss_obj: 1.8303, loss_l1: 0.7885, loss: 5.9621
2022-03-10 10:14:12,759 - mmdet - INFO - Epoch [100][200/208] lr: 6.250e-05, eta: 0:00:01, time: 0.138, data_time: 0.009, memory: 6214, loss_cls: 0.6424, loss_bbox: 2.8237, loss_obj: 2.3767, loss_l1: 0.8779, loss: 6.7207
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 351/351, 40.0 task/s, elapsed: 9s, ETA: 0s
---------------iou_thr: 0.5---------------
2022-03-10 08:00:48,057 - mmdet - INFO -
+--------+-----+------+--------+-------+
| class  | gts | dets | recall | ap    |
+--------+-----+------+--------+-------+
| cavity | 490 | 3368 | 0.633  | 0.361 |
| pa     | 172 | 514  | 0.657  | 0.463 |
+--------+-----+------+--------+-------+
| mAP    |     |      |        | 0.412 |
+--------+-----+------+--------+-------+
2022-03-10 08:00:48,058 - mmdet - INFO - Epoch(val) [99][351] AP50: 0.4120, mAP: 0.4119
2022-03-10 08:01:12,034 - mmdet - INFO - Epoch [100][50/104] lr: 1.250e-05, eta: 0:00:23, time: 0.479, data_time: 0.095, memory: 5343, loss_cls: 0.1659, loss_bbox: 0.2610, loss: 0.4270, grad_norm: 5.2169
2022-03-10 08:01:34,911 - mmdet - INFO - Epoch [100][100/104] lr: 1.250e-05, eta: 0:00:01, time: 0.458, data_time: 0.035, memory: 5343, loss_cls: 0.1818, loss_bbox: 0.2742, loss: 0.4560, grad_norm: 5.6148
2022-03-10 08:01:36,928 - mmdet - INFO - Saving checkpoint at 100 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 351/351, 25.2 task/s, elapsed: 14s, ETA: 0s
---------------iou_thr: 0.5---------------
2022-03-10 08:01:52,669 - mmdet - INFO -
+--------+-----+------+--------+-------+
| class  | gts | dets | recall | ap    |
+--------+-----+------+--------+-------+
| cavity | 490 | 3298 | 0.635  | 0.363 |
| pa     | 172 | 495  | 0.680  | 0.496 |
+--------+-----+------+--------+-------+
| mAP    |     |      |        | 0.429 |
+--------+-----+------+--------+-------+
2022-03-10 08:01:52,672 - mmdet - INFO - Epoch(val) [100][351] AP50: 0.4290, mAP: 0.4294
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 351/351, 20.2 task/s, elapsed: 17s, ETA: 0s
---------------iou_thr: 0.5---------------
2022-03-10 08:20:16,910 - mmdet - INFO -
+--------+-----+------+--------+-------+
| class  | gts | dets | recall | ap    |
+--------+-----+------+--------+-------+
| cavity | 490 | 898  | 0.504  | 0.326 |
| pa     | 172 | 264  | 0.610  | 0.464 |
+--------+-----+------+--------+-------+
| mAP    |     |      |        | 0.395 |
+--------+-----+------+--------+-------+
2022-03-10 08:20:16,913 - mmdet - INFO - Epoch(val) [99][351] AP50: 0.3950, mAP: 0.3950
2022-03-10 08:21:14,781 - mmdet - INFO - Epoch [100][50/104] lr: 2.500e-05, eta: 0:00:56, time: 1.155, data_time: 0.108, memory: 7542, loss_rpn_cls: 0.0114, loss_rpn_bbox: 0.0054, s0.loss_cls: 0.0579, s0.acc: 97.7305, s0.loss_bbox: 0.0477, s1.loss_cls: 0.0298, s1.acc: 97.6778, s1.loss_bbox: 0.0626, s2.loss_cls: 0.0156, s2.acc: 97.4595, s2.loss_bbox: 0.0350, loss: 0.2655
2022-03-10 08:22:07,842 - mmdet - INFO - Epoch [100][100/104] lr: 2.500e-05, eta: 0:00:04, time: 1.061, data_time: 0.033, memory: 7542, loss_rpn_cls: 0.0145, loss_rpn_bbox: 0.0062, s0.loss_cls: 0.0560, s0.acc: 97.8596, s0.loss_bbox: 0.0450, s1.loss_cls: 0.0293, s1.acc: 97.7130, s1.loss_bbox: 0.0600, s2.loss_cls: 0.0152, s2.acc: 97.4996, s2.loss_bbox: 0.0330, loss: 0.2593
2022-03-10 08:22:12,175 - mmdet - INFO - Saving checkpoint at 100 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 351/351, 24.6 task/s, elapsed: 14s, ETA: 0s
---------------iou_thr: 0.5---------------
2022-03-10 08:22:28,972 - mmdet - INFO -
+--------+-----+------+--------+-------+
| class  | gts | dets | recall | ap    |
+--------+-----+------+--------+-------+
| cavity | 490 | 879  | 0.496  | 0.322 |
| pa     | 172 | 262  | 0.616  | 0.466 |
+--------+-----+------+--------+-------+
| mAP    |     |      |        | 0.394 |
+--------+-----+------+--------+-------+
2022-03-10 08:22:28,974 - mmdet - INFO - Epoch(val) [100][351] AP50: 0.3940, mAP: 0.3940
2022-03-10 07:06:27,541 - mmdet - INFO -
+--------+-----+------+--------+-------+
| class  | gts | dets | recall | ap    |
+--------+-----+------+--------+-------+
| cavity | 490 | 989  | 0.551  | 0.349 |
| pa     | 172 | 306  | 0.587  | 0.439 |
+--------+-----+------+--------+-------+
| mAP    |     |      |        | 0.394 |
+--------+-----+------+--------+-------+
2022-03-10 07:06:27,543 - mmdet - INFO - Epoch(val) [99][351] AP50: 0.3940, mAP: 0.3941
2022-03-10 07:07:07,670 - mmdet - INFO - Epoch [100][50/104] lr: 2.500e-05, eta: 0:00:36, time: 0.800, data_time: 0.104, memory: 6896, loss_rpn_cls: 0.0099, loss_rpn_bbox: 0.0093, loss_cls: 0.0730, acc: 97.1970, loss_bbox: 0.1103, loss: 0.2025
2022-03-10 07:07:44,571 - mmdet - INFO - Epoch [100][100/104] lr: 2.500e-05, eta: 0:00:02, time: 0.738, data_time: 0.031, memory: 6896, loss_rpn_cls: 0.0125, loss_rpn_bbox: 0.0102, loss_cls: 0.0710, acc: 97.3193, loss_bbox: 0.1049, loss: 0.1986
2022-03-10 07:07:47,327 - mmdet - INFO - Saving checkpoint at 100 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 351/351, 31.4 task/s, elapsed: 11s, ETA: 0s
---------------iou_thr: 0.5---------------
2022-03-10 07:08:01,956 - mmdet - INFO -
+--------+-----+------+--------+-------+
| class  | gts | dets | recall | ap    |
+--------+-----+------+--------+-------+
| cavity | 490 | 966  | 0.555  | 0.352 |
| pa     | 172 | 304  | 0.587  | 0.438 |
+--------+-----+------+--------+-------+
| mAP    |     |      |        | 0.395 |
+--------+-----+------+--------+-------+
2022-03-10 07:08:01,957 - mmdet - INFO - Epoch(val) [100][351] AP50: 0.3950, mAP: 0.3947
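To compare the runs side by side, the `Epoch(val)` lines can be pulled out of each text log with a regular expression. The sketch below is only an illustration: the helper name `final_val_scores` and the `logs` dict are made up here, and the sample strings are the last validation lines visible in the logs above (the YOLOX excerpt stops before the epoch 100 validation, so epoch 99 is its last visible one).

```python
import re

# Matches mmdet-style validation summary lines such as
# "Epoch(val) [100][351] AP50: 0.4290, mAP: 0.4294".
VAL_LINE = re.compile(
    r"Epoch\(val\) \[(\d+)\]\[\d+\]\s+AP50: ([\d.]+), mAP: ([\d.]+)")


def final_val_scores(log_text):
    """Return (epoch, AP50, mAP) from the last validation line, or None."""
    matches = VAL_LINE.findall(log_text)
    if not matches:
        return None
    epoch, ap50, mean_ap = matches[-1]
    return int(epoch), float(ap50), float(mean_ap)


# Last visible validation line of each run, copied from the logs above.
logs = {
    "yolox": "Epoch(val) [99][351] AP50: 0.0050, mAP: 0.0050",
    "retinanet": "Epoch(val) [100][351] AP50: 0.4290, mAP: 0.4294",
    "cascade_rcnn": "Epoch(val) [100][351] AP50: 0.3940, mAP: 0.3940",
    "faster_rcnn": "Epoch(val) [100][351] AP50: 0.3950, mAP: 0.3947",
}

summary = {name: final_val_scores(text) for name, text in logs.items()}
for name, (epoch, ap50, mean_ap) in summary.items():
    print(f"{name:13s} epoch {epoch:3d}  AP50 {ap50:.4f}  mAP {mean_ap:.4f}")
```

In practice the full log file contents would be passed in instead of the single sample lines.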
Config:
optimizer = dict(
    type='SGD',
    lr=0.00125,
    momentum=0.9,
    weight_decay=0.0005,
    nesterov=True,
    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='YOLOX',
    warmup='exp',
    by_epoch=False,
    warmup_by_epoch=True,
    warmup_ratio=1,
    warmup_iters=5,
    num_last_epochs=15,
    min_lr_ratio=0.05)
runner = dict(type='EpochBasedRunner', max_epochs=100)
checkpoint_config = dict(interval=20)
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(
            type='WandbLoggerHook',
            init_kwargs=dict(
                project='adra_bipa_mmdet',
                name='yolox-512-conf-001-noaugs',
                config=dict(work_dirs='./logs_bipa_yoloxv1')),
            by_epoch=True)
    ])
custom_hooks = [
    dict(type='YOLOXModeSwitchHook', num_last_epochs=15, priority=48),
    dict(type='SyncNormHook', num_last_epochs=15, interval=10, priority=48),
    dict(
        type='ExpMomentumEMAHook',
        resume_from=None,
        momentum=0.0001,
        priority=49)
]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'checkpoints/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth'
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
img_scale = (512, 512)
model = dict(
    type='YOLOX',
    input_size=(640, 640),
    random_size_range=(15, 25),
    random_size_interval=10,
    backbone=dict(type='CSPDarknet', deepen_factor=0.33, widen_factor=0.5),
    neck=dict(
        type='YOLOXPAFPN',
        in_channels=[128, 256, 512],
        out_channels=128,
        num_csp_blocks=1),
    bbox_head=dict(
        type='YOLOXHead', num_classes=2, in_channels=128, feat_channels=128),
    train_cfg=dict(assigner=dict(type='SimOTAAssigner', center_radius=2.5)),
    test_cfg=dict(score_thr=0.01, nms=dict(type='nms', iou_threshold=0.4)))
data_root = 'dataset/bipa'
dataset_type = 'CocoDataset'
train_pipeline = [
    dict(type='Mosaic', img_scale=(512, 512), pad_val=114.0),
    dict(
        type='RandomAffine', scaling_ratio_range=(0.1, 2),
        border=(-256, -256)),
    dict(
        type='MixUp',
        img_scale=(512, 512),
        ratio_range=(0.8, 1.6),
        pad_val=114.0),
    dict(type='YOLOXHSVRandomAug'),
    dict(type='RandomFlip', flip_ratio=0.0),
    dict(type='Resize', img_scale=(512, 512), keep_ratio=True),
    dict(
        type='Pad',
        pad_to_square=True,
        pad_val=dict(img=(114.0, 114.0, 114.0))),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='BIPADataset',
        ann_file='dataset/bipa/annotations/train.json',
        img_prefix='dataset/bipa/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True)
        ],
        filter_empty_gt=False),
    pipeline=[
        dict(type='Mosaic', img_scale=(512, 512), pad_val=114.0),
        dict(
            type='RandomAffine',
            scaling_ratio_range=(0.1, 2),
            border=(-256, -256)),
        dict(
            type='MixUp',
            img_scale=(512, 512),
            ratio_range=(0.8, 1.6),
            pad_val=114.0),
        dict(type='YOLOXHSVRandomAug'),
        dict(type='RandomFlip', flip_ratio=0.0),
        dict(type='Resize', img_scale=(512, 512), keep_ratio=True),
        dict(
            type='Pad',
            pad_to_square=True,
            pad_val=dict(img=(114.0, 114.0, 114.0))),
        dict(
            type='FilterAnnotations',
            min_gt_bbox_wh=(1, 1),
            keep_empty=False),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
    ])
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(512, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Pad',
                pad_to_square=True,
                pad_val=dict(img=(114.0, 114.0, 114.0))),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    persistent_workers=True,
    train=dict(
        type='MultiImageMixDataset',
        dataset=dict(
            type='BIPADataset',
            ann_file='dataset/bipa/annotations/train.json',
            img_prefix='dataset/bipa/train',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True)
            ],
            filter_empty_gt=False),
        pipeline=[
            dict(type='Mosaic', img_scale=(512, 512), pad_val=114.0),
            dict(
                type='RandomAffine',
                scaling_ratio_range=(0.1, 2),
                border=(-256, -256)),
            dict(
                type='MixUp',
                img_scale=(512, 512),
                ratio_range=(0.8, 1.6),
                pad_val=114.0),
            dict(type='YOLOXHSVRandomAug'),
            dict(type='RandomFlip', flip_ratio=0.0),
            dict(type='Resize', img_scale=(512, 512), keep_ratio=True),
            dict(
                type='Pad',
                pad_to_square=True,
                pad_val=dict(img=(114.0, 114.0, 114.0))),
            dict(
                type='FilterAnnotations',
                min_gt_bbox_wh=(1, 1),
                keep_empty=False),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='BIPADataset',
        ann_file='dataset/bipa/annotations/val.json',
        img_prefix='dataset/bipa/val',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(512, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Pad',
                        pad_to_square=True,
                        pad_val=dict(img=(114.0, 114.0, 114.0))),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='BIPADataset',
        ann_file='dataset/bipa/annotations/val.json',
        img_prefix='dataset/bipa/val',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(512, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Pad',
                        pad_to_square=True,
                        pad_val=dict(img=(114.0, 114.0, 114.0))),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
max_epochs = 100
num_last_epochs = 5
interval = 10
evaluation = dict(interval=1, dynamic_intervals=[(285, 1)], metric='mAP')
albu_train_transforms = [
    dict(
        type='ShiftScaleRotate',
        shift_limit=0.0625,
        scale_limit=0.2,
        rotate_limit=45,
        interpolation=1,
        p=0.1),
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=[-0.15, 0.15],
        contrast_limit=[-0.15, 0.15],
        p=0.7),
    dict(type='ImageCompression', quality_lower=85, quality_upper=95, p=0.2),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0),
            dict(type='GaussianBlur', blur_limit=3, p=1.0)
        ],
        p=0.25),
    dict(
        type='OneOf',
        transforms=[dict(type='Sharpen', p=1.0),
                    dict(type='Emboss', p=1.0)],
        p=0.25),
    dict(type='GaussNoise', var_limit=[20.0, 80.0], per_channel=False, p=0.3),
    dict(type='HorizontalFlip', p=0.5),
    dict(type='RandomRotate90', p=0.3),
    dict(type='Transpose', p=0.2)
]
classes = ('cavity', 'pa')
CLASSES = ('cavity', 'pa')
work_dir = './logs_bipa_yoloxv1'
seed = 0
gpu_ids = [3]
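A few values in this config do not line up with each other, and a small script makes them easy to spot. The sketch below is not a diagnosis: the numbers are copied from the config above, while the size multiplier of 32 and the reference learning rate of 0.01 at a total batch size of 64 are assumptions taken from what I understand to be the MMDetection 2.x YOLOX-s defaults. The helper names are made up for illustration.

```python
import math

# Values copied from the config above.
cfg = {
    "img_scale": (512, 512),
    "input_size": (640, 640),
    "random_size_range": (15, 25),
    "max_epochs": 100,
    "hook_num_last_epochs": 15,  # lr_config and custom_hooks
    "var_num_last_epochs": 5,    # top-level num_last_epochs variable
    "dynamic_intervals": [(285, 1)],
    "lr": 0.00125,
    "samples_per_gpu": 8,
    "num_gpus": 1,               # gpu_ids = [3]
}


def multiscale_sides(random_size_range, multiplier=32):
    """Smallest and largest training side; multiplier=32 is an assumption."""
    low, high = random_size_range
    return low * multiplier, high * multiplier


def scaled_lr(base_lr, base_batch, batch):
    """Linear learning-rate scaling with total batch size."""
    return base_lr * batch / base_batch


def sanity_notes(cfg):
    """Collect human-readable notes about values that disagree."""
    notes = []
    if cfg["input_size"] != cfg["img_scale"]:
        notes.append("model.input_size differs from img_scale")
    if cfg["var_num_last_epochs"] != cfg["hook_num_last_epochs"]:
        notes.append("num_last_epochs variable differs from the hooks")
    for start_epoch, _ in cfg["dynamic_intervals"]:
        if start_epoch > cfg["max_epochs"]:
            notes.append("dynamic_intervals starts after max_epochs")
    return notes


batch = cfg["samples_per_gpu"] * cfg["num_gpus"]
print("multi-scale training sides:", multiscale_sides(cfg["random_size_range"]))
print("lr follows linear scaling:",
      math.isclose(cfg["lr"], scaled_lr(0.01, 64, batch)))
for note in sanity_notes(cfg):
    print("note:", note)
```

Under these assumptions the learning rate is consistent with the batch size, while training images would be resized to sides between 480 and 800 even though evaluation runs at 512. Whether any of these mismatches contributes to the low scores is not established in this thread.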
YOLOX uses extremely heavy data augmentation (Mosaic, RandomAffine, and MixUp in the config above) that was tuned on the COCO dataset, so its default training pipeline may not be suitable for other datasets.
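One way to test this suggestion is to train once with the heavy, multi-image augmentations removed and compare the validation scores. The sketch below filters them out of the pipeline from the config above. The helper `strip_heavy_augs` and the choice of which steps count as "heavy" are assumptions for illustration, not something the thread or MMDetection prescribes, and the thread does not confirm that this change fixes the scores.

```python
# Steps treated as "heavy" here; this grouping is an assumption.
HEAVY_AUGS = ("Mosaic", "RandomAffine", "MixUp")


def strip_heavy_augs(pipeline, heavy=HEAVY_AUGS):
    """Return a copy of the pipeline without the listed augmentation types."""
    return [dict(step) for step in pipeline if step["type"] not in heavy]


# Training pipeline copied from the config above.
train_pipeline = [
    dict(type='Mosaic', img_scale=(512, 512), pad_val=114.0),
    dict(type='RandomAffine', scaling_ratio_range=(0.1, 2),
         border=(-256, -256)),
    dict(type='MixUp', img_scale=(512, 512), ratio_range=(0.8, 1.6),
         pad_val=114.0),
    dict(type='YOLOXHSVRandomAug'),
    dict(type='RandomFlip', flip_ratio=0.0),
    dict(type='Resize', img_scale=(512, 512), keep_ratio=True),
    dict(type='Pad', pad_to_square=True,
         pad_val=dict(img=(114.0, 114.0, 114.0))),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]

light_pipeline = strip_heavy_augs(train_pipeline)
print([step['type'] for step in light_pipeline])
```

The resulting list would replace the `pipeline` argument of the `MultiImageMixDataset` entries. If the lighter run scores much closer to the other detectors, that points to the augmentation settings rather than the model itself.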
The logs above are, in order: YOLOX (last 2 epochs), RetinaNet, Cascade RCNN, Faster RCNN, followed by the YOLOX config.