Closed: tikitong closed this issue 2 years ago.
@tikitong The setting should be evaluation = dict(interval=1, classwise=True, metric='bbox', save_best='auto').
If the output is empty, the program will currently still report an error; we will fix it as soon as possible. Thank you.
@hhaAndroid I made your change, but now the run errors out during validation after the epoch, whereas validation succeeded before 🤔
Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
    main()
  File "tools/train.py", line 183, in main
    meta=meta)
  File "/kaggle/working/Swin-Transformer-Object-Detection/mmdet/apis/train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/kaggle/working/Swin-Transformer-Object-Detection/mmdet/core/evaluation/eval_hooks.py", line 149, in after_train_epoch
    self.save_best_checkpoint(runner, key_score)
  File "/kaggle/working/Swin-Transformer-Object-Detection/mmdet/core/evaluation/eval_hooks.py", line 166, in save_best_checkpoint
    last_ckpt = runner.meta['hook_msgs']['last_ckpt']
KeyError: 'last_ckpt'
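In the meantime, the crash can be sidestepped by guarding the runner.meta lookup instead of indexing it directly, so a missing 'last_ckpt' entry returns None rather than raising. A minimal sketch of such a guard (a standalone helper for illustration, not the actual eval_hooks.py code):

```python
# Sketch of a guarded lookup for runner.meta['hook_msgs']['last_ckpt'].
# runner.meta is a plain dict in mmcv, so .get() with defaults avoids
# the KeyError when the checkpoint hook has not recorded a path yet.

def get_last_ckpt(meta):
    """Return the last checkpoint path, or None if none was recorded."""
    if meta is None:
        return None
    return meta.get('hook_msgs', {}).get('last_ckpt')

# Reproduces the failing case: no 'last_ckpt' recorded yet.
print(get_last_ckpt({'hook_msgs': {}}))  # None instead of KeyError
print(get_last_ckpt({'hook_msgs': {'last_ckpt': 'epoch_1.pth'}}))
```

The caller would then skip the best-checkpoint bookkeeping when the helper returns None instead of crashing the whole epoch.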
@tikitong
classes = ('circle', 'triangle', 'rectangle')
data = dict(
    train=dict(dataset=dict(classes=classes)),
    val=dict(classes=classes),
    test=dict(classes=classes))
If loss_cls is 0, the reason may be that you are not passing the classes to the dataset.
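As a quick sanity check, the classes tuple should also match the category names in the COCO annotation file, in category-id order; a mismatch can silently break the label mapping. A minimal sketch of such a check (the annotation dict is illustrative; in practice you would json.load your real annotation file):

```python
def check_classes(coco, classes):
    """True if config classes match the COCO category names in id order."""
    cat_names = [c['name'] for c in sorted(coco['categories'],
                                           key=lambda c: c['id'])]
    return list(classes) == cat_names

# Illustrative stand-in for a loaded COCO annotation file.
ann = {'categories': [{'id': 1, 'name': 'circle'},
                      {'id': 2, 'name': 'triangle'},
                      {'id': 3, 'name': 'rectangle'}]}

print(check_classes(ann, ('circle', 'triangle', 'rectangle')))  # True
print(check_classes(ann, ('triangle', 'circle', 'rectangle')))  # False
```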
I also encountered this problem. How did you solve it?
The issue can be resolved by https://github.com/open-mmlab/mmcv/pull/1398
I got the same problem running Mask R-CNN with a custom dataset annotated with VGG VIA and exported in COCO style:
sys.platform: linux
Python: 3.8.10 (default, Jun 16 2021, 14:20:20) [GCC 9.3.0]
CUDA available: True
GPU 0: Tesla P100-PCIE-12GB
CUDA_HOME: /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.1.1
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (GCC) 9.3.0
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:
2022-02-09 17:36:56,669 - mmdet - INFO - Distributed training: False
2022-02-09 17:36:57,771 - mmdet - INFO - Config:
model = dict( type='MaskRCNN', backbone=dict( type='ResNet', depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=True), norm_eval=True, style='pytorch', init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), neck=dict( type='FPN', in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( type='RPNHead', in_channels=256, feat_channels=256, anchor_generator=dict( type='AnchorGenerator', scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=1.0)), roi_head=dict( type='StandardRoIHead', bbox_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), bbox_head=dict( type='Shared2FCBBoxHead', in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=1.0)), mask_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), mask_head=dict( type='FCNMaskHead', num_convs=4, in_channels=256, conv_out_channels=256, num_classes=1, loss_mask=dict( type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), train_cfg=dict( rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, debug=False), rpn_proposal=dict( nms_pre=2000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False)), test_cfg=dict( rpn=dict( nms_pre=1000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5)))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']) ]
test_pipeline = [ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ]
data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( type='CocoDataset', ann_file='/localscratch/felipe.26491647.0/datasets/salmons/COCO_annotations/Train_1.json', img_prefix='/localscratch/felipe.26491647.0/datasets/salmons/images/All', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict( type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']) ], classes=['salmon']), val=dict( type='CocoDataset', ann_file='/localscratch/felipe.26491647.0/datasets/salmons/COCO_annotations/Val_3.json', img_prefix='/localscratch/felipe.26491647.0/datasets/salmons/images/All', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ], classes=['salmon']), test=dict( type='CocoDataset', ann_file='/localscratch/felipe.26491647.0/datasets/salmons/COCO_annotations/Val_3.json', img_prefix='/localscratch/felipe.26491647.0/datasets/salmons/images/All', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ], classes=['salmon']))
evaluation = dict(interval=1, metric=['bbox', 'segm'])
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = None
runner = dict(type='EpochBasedRunner', max_epochs=6)
checkpoint_config = dict(interval=1)
log_config = dict(interval=1, hooks=[dict(type='TensorboardLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = '/home/felipe/projects/def-akhloufi/felipe/thesis/probando-funcionalidad/checkpoints/mask_rcnn_r50_fpn_1x_coco_pretrained.pth'
resume_from = None
workflow = [('train', 1), ('val', 1)]
work_dir = '/localscratch/felipe.26491647.0/datasets/Coco/checkpoints/Experiment 1/MaskRCNN/Salmons/Resnet50/hyper_1/Checkpoint 2022-02-09@17:36:53'
gpu_ids = range(0, 1)
2022-02-09 17:36:57,786 - mmdet - INFO - Set random seed to 1950499736, deterministic: False
2022-02-09 17:36:58,673 - mmdet - INFO - initialize ResNet with init_cfg {'type': 'Pretrained', 'checkpoint': 'torchvision://resnet50'}
2022-02-09 17:36:58,674 - mmcv - INFO - load model from: torchvision://resnet50
2022-02-09 17:36:58,674 - mmcv - INFO - load checkpoint from torchvision path: torchvision://resnet50
2022-02-09 17:36:59,232 - mmcv - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
2022-02-09 17:36:59,266 - mmdet - INFO - initialize FPN with init_cfg {'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'}
2022-02-09 17:36:59,291 - mmdet - INFO - initialize RPNHead with init_cfg {'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01}
2022-02-09 17:36:59,296 - mmdet - INFO - initialize Shared2FCBBoxHead with init_cfg [{'type': 'Normal', 'std': 0.01, 'override': {'name': 'fc_cls'}}, {'type': 'Normal', 'std': 0.001, 'override': {'name': 'fc_reg'}}, {'type': 'Xavier', 'distribution': 'uniform', 'override': [{'name': 'shared_fcs'}, {'name': 'cls_fcs'}, {'name': 'reg_fcs'}]}]
loading annotations into memory... Done (t=0.00s) creating index... index created!
loading annotations into memory... Done (t=0.00s) creating index... index created!
fatal: not a git repository (or any parent up to mount point /) Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
/project/6005433/felipe/thesis/env-mmdet/lib/python3.8/site-packages/torch/utils/data/dataloader.py:478: UserWarning: This DataLoader will create 2 worker processes in total. Our suggested max number of worker in current system is 1, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary. warnings.warn(_create_warning_msg(
loading annotations into memory... Done (t=0.01s) creating index... index created!
2022-02-09 17:37:11,856 - mmdet - INFO - load checkpoint from local path: /home/felipe/projects/def-akhloufi/felipe/thesis/probando-funcionalidad/checkpoints/mask_rcnn_r50_fpn_1x_coco_pretrained.pth
2022-02-09 17:37:12,296 - mmdet - WARNING - The model and loaded state dict do not match exactly
2022-02-09 17:37:12,306 - mmdet - INFO - workflow: [('train', 1), ('val', 1)], max: 6 epochs
2022-02-09 17:37:12,306 - mmdet - INFO - Checkpoints will be saved to /localscratch/felipe.26491647.0/datasets/Coco/checkpoints/Experiment 1/MaskRCNN/Salmons/Resnet50/hyper_1/Checkpoint 2022-02-09@17:36:53 by HardDiskBackend.
/project/6005433/felipe/thesis/env-mmdet/lib/python3.8/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.) return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
2022-02-09 17:37:19,526 - mmdet - INFO - Saving checkpoint at 1 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 86/86, 10.1 task/s, elapsed: 8s, ETA: 0s
2022-02-09 17:37:28,753 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
2022-02-09 17:37:28,754 - mmdet - ERROR - The testing results of the whole dataset is empty.
2022-02-09 17:37:45,671 - mmdet - INFO - Saving checkpoint at 2 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 86/86, 9.8 task/s, elapsed: 9s, ETA: 0s
2022-02-09 17:37:55,223 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
2022-02-09 17:37:55,224 - mmdet - ERROR - The testing results of the whole dataset is empty.
2022-02-09 17:38:12,625 - mmdet - INFO - Saving checkpoint at 3 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 86/86, 10.2 task/s, elapsed: 8s, ETA: 0s
2022-02-09 17:38:21,799 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
2022-02-09 17:38:21,800 - mmdet - ERROR - The testing results of the whole dataset is empty.
2022-02-09 17:38:38,849 - mmdet - INFO - Saving checkpoint at 4 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 86/86, 10.4 task/s, elapsed: 8s, ETA: 0s
2022-02-09 17:38:47,868 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
2022-02-09 17:38:47,869 - mmdet - ERROR - The testing results of the whole dataset is empty.
2022-02-09 17:39:05,898 - mmdet - INFO - Saving checkpoint at 5 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 86/86, 10.3 task/s, elapsed: 8s, ETA: 0s
2022-02-09 17:39:15,076 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
2022-02-09 17:39:15,078 - mmdet - ERROR - The testing results of the whole dataset is empty.
2022-02-09 17:39:32,140 - mmdet - INFO - Saving checkpoint at 6 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 86/86, 10.3 task/s, elapsed: 8s, ETA: 0s
2022-02-09 17:39:41,219 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
2022-02-09 17:39:41,220 - mmdet - ERROR - The testing results of the whole dataset is empty.
Evaluating bbox... Loading and preparing results... The testing results of the whole dataset is empty.
How can I solve it?
Hello, has this problem been solved?
Have you solved it yet?
The evaluation of the results never takes place. It seems to come from this setting:
evaluation = dict(interval=1, metric='bbox', classwise=True)
An error is also raised when I set the save_best argument like this:
evaluation = dict(interval=1, classwise=True, metric='bbox', save_best='bbox_mAP')
I'm using the same dataset, with the same annotations in COCO format, for models like VFNet and Faster R-CNN, and it works fine. I don't see the error in my YOLOX config; do you have an idea? Thanks a lot!
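For what it's worth, "The testing results of the whole dataset is empty" just means that no detection on any image made it into the results handed to COCO evaluation; early in training, or with a broken label mapping, every prediction can fall below the score threshold. A toy illustration of that filtering effect (simplified structures, not mmdet's actual code):

```python
def filter_by_score(dets, score_thr=0.05):
    """Keep only detections whose confidence exceeds the threshold."""
    return [d for d in dets if d['score'] > score_thr]

# Early-epoch predictions with near-zero confidence are all filtered
# out, leaving an empty result set for the whole dataset.
preds = [{'bbox': [0, 0, 10, 10], 'score': 0.01},
         {'bbox': [5, 5, 20, 20], 'score': 0.03}]
print(filter_by_score(preds))  # []
```

Since the other models work on the same data, comparing their predicted scores on a few validation images against YOLOX's is a cheap way to tell whether the model is simply not learning yet or the labels are mapped wrongly.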