shinya7y / UniverseNet

USB: Universal-Scale Object Detection Benchmark (BMVC 2022)
Apache License 2.0
422 stars 54 forks source link

Train other datasets #27

Closed sure7018 closed 2 years ago

sure7018 commented 3 years ago

Hello, when I was training my own dataset in COCO format, because my dataset does not have masks, I changed `if ann['area'] <= 0 or w < 1 or h < 1` in line 133 of mmdet/datasets/coco.py to `if w < 1 or h < 1`, but I got the following errors:

2021-10-03 16:08:17,352 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2021-10-03 16:11:31,063 - mmdet - INFO - Epoch [1][50/367] lr: 9.890e-04, eta: 8:37:53, time: 4.262, data_time: 0.331, memory: 9512, kpt_loss_point_cls: 1.1301, kpt_loss_point_offset: 0.0854, bbox_loss_cls: 1.1355, bbox_loss_bbox: 0.6738, loss: 3.0248
2021-10-03 16:14:54,468 - mmdet - INFO - Epoch [1][100/367] lr: 1.988e-03, eta: 8:22:37, time: 4.068, data_time: 0.006, memory: 9512, kpt_loss_point_cls: 0.9700, kpt_loss_point_offset: 0.0889, bbox_loss_cls: 1.1190, bbox_loss_bbox: 0.6012, loss: 2.7791
2021-10-03 16:18:09,565 - mmdet - INFO - Epoch [1][150/367] lr: 2.987e-03, eta: 8:08:37, time: 3.902, data_time: 0.007, memory: 9512, kpt_loss_point_cls: 0.8648, kpt_loss_point_offset: 0.0878, bbox_loss_cls: 1.1331, bbox_loss_bbox: 0.5795, loss: 2.6652
2021-10-03 16:21:27,164 - mmdet - INFO - Epoch [1][200/367] lr: 3.986e-03, eta: 8:01:29, time: 3.952, data_time: 0.007, memory: 9541, kpt_loss_point_cls: 0.7960, kpt_loss_point_offset: 0.0863, bbox_loss_cls: 1.1200, bbox_loss_bbox: 0.5794, loss: 2.5817
2021-10-03 16:24:47,569 - mmdet - INFO - Epoch [1][250/367] lr: 4.985e-03, eta: 7:57:13, time: 4.008, data_time: 0.007, memory: 9541, kpt_loss_point_cls: 0.8372, kpt_loss_point_offset: 0.0843, bbox_loss_cls: 1.1260, bbox_loss_bbox: 0.5690, loss: 2.6165
2021-10-03 16:28:09,630 - mmdet - INFO - Epoch [1][300/367] lr: 5.984e-03, eta: 7:53:54, time: 4.041, data_time: 0.007, memory: 9541, kpt_loss_point_cls: 0.6968, kpt_loss_point_offset: 0.0834, bbox_loss_cls: 1.1199, bbox_loss_bbox: 0.5709, loss: 2.4710
2021-10-03 16:31:30,102 - mmdet - INFO - Epoch [1][350/367] lr: 6.983e-03, eta: 7:50:03, time: 4.009, data_time: 0.006, memory: 10288, kpt_loss_point_cls: 0.8081, kpt_loss_point_offset: 0.0854, bbox_loss_cls: 1.1074, bbox_loss_bbox: inf, loss: inf
2021-10-03 16:32:38,249 - mmdet - INFO - Saving checkpoint at 1 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 420/420, 5.1 task/s, elapsed: 83s, ETA: 0s

2021-10-03 16:34:11,594 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=0.14s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=2.98s).
Accumulating evaluation results...
DONE (t=1.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100  ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100  ] = 0.002
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300  ] = 0.002
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.002
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.004
2021-10-03 16:34:15,975 - mmdet - INFO - Exp name: bvr_retinanet_x101_fpn_dcn_mstrain_400_1200_20e_coco.py
2021-10-03 16:34:15,975 - mmdet - INFO - Epoch(val) [1][210] bbox_mAP: 0.0000, bbox_mAP_50: 0.0000, bbox_mAP_75: 0.0000, bbox_mAP_s: 0.0000, bbox_mAP_m: 0.0000, bbox_mAP_l: 0.0000, bbox_mAP_copypaste: 0.000 0.000 0.000 0.000 0.000 0.000

How should I modify it? My training config is configs/bvr/bvr_retinanet_x101_fpn_dcn_mstrain_400_1200_20e_coco.py.

sure7018 commented 3 years ago

If I do not make the above modification, an error is reported:

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 5.1 task/s, elapsed: 978s, ETA: 0s
Evaluating bbox...
Loading and preparing results...
The testing results of the whole dataset is empty.

shinya7y commented 3 years ago

Object instance annotations in COCO format should contain the area field. Please modify your json files by calculating width * height and assigning the result to area.
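A minimal sketch of that fix, assuming standard COCO json structure with `[x, y, w, h]` bboxes (the function name and file paths are illustrative, not part of this repo):

```python
import json


def add_area_fields(coco):
    """Fill in the COCO 'area' field for annotations that lack it,
    using bbox width * height (COCO bboxes are [x, y, w, h])."""
    for ann in coco['annotations']:
        if 'area' not in ann:
            _, _, w, h = ann['bbox']
            ann['area'] = w * h
    return coco


# Example usage (replace the paths with your own annotation files):
# with open('annotations/train.json') as f:
#     coco = json.load(f)
# with open('annotations/train_fixed.json', 'w') as f:
#     json.dump(add_area_fields(coco), f)
```

With the real area values in place, the original `ann['area'] <= 0` filter in mmdet/datasets/coco.py can be left unmodified.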

> the following errors

What do the errors mean? Low AP and inf/nan losses (loss: inf) may be caused by training instability (e.g., a too-high learning rate).
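In mmdetection-style configs, one way to counter such instability is to override the optimizer and learning-rate schedule in your config; the values below are illustrative starting points, not tuned settings, and should be scaled to your batch size:

```python
# Illustrative stability overrides for an mmdetection config (assumed
# values; tune lr to your GPU count and samples_per_gpu).
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
# Gradient clipping often tames exploding (inf) box losses.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,   # a longer warmup helps early-epoch stability
    warmup_ratio=0.001,
    step=[16, 19])
```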

I recommend using simple detectors (e.g., retinanet_r50_fpn_1x_coco.py) to debug your dataset before trying recent detectors. In particular, detectors not supported in the original mmdetection have not been verified by many users.

sure7018 commented 3 years ago

Hello, if my dataset is annotated in Chinese, how should I modify the dataset so that the Chinese text is read correctly?

shinya7y commented 3 years ago

I haven't tried category names written in Chinese. Do they cause any errors?

sure7018 commented 3 years ago

I just changed the Chinese into English, but the same error is still reported, as shown below. Could it be a problem with my dataset? What do you think?

2021-10-09 16:31:16,491 - mmdet - INFO - Saving checkpoint at 1 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 7320/7313, 65.4 task/s, elapsed: 112s, ETA: 0s

2021-10-09 16:37:00,650 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=0.52s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
Traceback (most recent call last):
  File "tools/train.py", line 189, in <module>
    main()
  File "tools/train.py", line 185, in main
    meta=meta)
  File "/root/lws/UniverseNet/mmdet/apis/train.py", line 212, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/evaluation.py", line 237, in after_train_epoch
    self._do_evaluate(runner)
  File "/root/lws/UniverseNet/mmdet/core/evaluation/eval_hooks.py", line 58, in _do_evaluate
    key_score = self.evaluate(runner, results)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/evaluation.py", line 325, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/root/lws/UniverseNet/mmdet/datasets/coco.py", line 497, in evaluate
    cocoEval.evaluate()
  File "/opt/conda/lib/python3.7/site-packages/pycocotools/cocoeval.py", line 149, in evaluate
    for imgId in p.imgIds
  File "/opt/conda/lib/python3.7/site-packages/pycocotools/cocoeval.py", line 150, in <dictcomp>
    for catId in catIds}
  File "/opt/conda/lib/python3.7/site-packages/pycocotools/cocoeval.py", line 188, in computeIoU
    iscrowd = [int(o['iscrowd']) for o in gt]
  File "/opt/conda/lib/python3.7/site-packages/pycocotools/cocoeval.py", line 188, in <listcomp>
    iscrowd = [int(o['iscrowd']) for o in gt]
KeyError: 'iscrowd'
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'tools/train.py', '--local_rank=7', 'configs/faster_rcnn/faster_rcnn_r101_fpn_1x_coco.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
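The KeyError above indicates the annotations also lack the iscrowd field, which pycocotools' evaluator reads for every ground-truth instance. A minimal sketch of a fix, assuming standard COCO json structure (the function name and paths are illustrative):

```python
import json


def add_iscrowd_fields(coco):
    """Set 'iscrowd' to 0 (an ordinary instance, not a crowd region)
    for annotations that lack it; pycocotools' computeIoU requires it."""
    for ann in coco['annotations']:
        ann.setdefault('iscrowd', 0)
    return coco


# Example usage (replace the paths with your own annotation files):
# with open('annotations/train.json') as f:
#     coco = json.load(f)
# with open('annotations/train_fixed.json', 'w') as f:
#     json.dump(add_iscrowd_fields(coco), f)
```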

sure7018 commented 3 years ago

Thank you for your reply. I think I have found the error.