open-mmlab / mmyolo

OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, RTMDet-Rotated, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.
https://mmyolo.readthedocs.io/zh_CN/dev/
GNU General Public License v3.0

for loop can not break out when enumerating data_list in dataset_analysis.py #435

Closed VoyagerXvoyagerx closed 1 year ago

VoyagerXvoyagerx commented 1 year ago

🐞 Describe the bug

This for loop does not terminate after all elements have been iterated. I modified the code here as follows:

    progress_bar = ProgressBar(len(dataset))
    cnt = 0
    print(len(data_list))
    for img in data_list:
        for instance in img['instances']:
            if instance['bbox_label'] in classes_idx and args.class_name is None:
                class_num[instance['bbox_label']] += 1
                class_bbox[instance['bbox_label']].append(instance['bbox'])
            elif instance['bbox_label'] in classes_idx and args.class_name:
                class_num[0] += 1
                class_bbox[0].append(instance['bbox'])
        progress_bar.update()
        cnt += 1
        if cnt == len(data_list):
            print('enumerate over!', '\n'*10)

Run the command

python tools/analysis_tools/dataset_analysis.py configs/custom_dataset/yolov5_s-v61_syncbn_fast_1xb32-50e_ionogram.py --output-dir output

I got output on the terminal:

loading annotations into memory...
Done (t=0.12s)
creating index...
index created!

Print current running information:
+--------------------------------------------------------------------+
|                        Dataset information                         |
+---------------+-------------+--------------+-----------------------+
|  Dataset type |  Class name |   Function   |       Area rule       |
+---------------+-------------+--------------+-----------------------+
| train_dataset | All classes | All function | [0, 32, 96, 100000.0] |
+---------------+-------------+--------------+-----------------------+

Read the information of each picture in the dataset:
[                                                  ] 0/3019, elapsed: 0s, ETA:3019
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 3019/3019, 11762.0 task/s, elapsed: 0s, ETA:     0s
enumerate over!

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 7955/3019, 9119.1 task/s, elapsed: 1s, ETA:     0s
Traceback (most recent call last):
  File "tools/analysis_tools/dataset_analysis.py", line 508, in <module>
    main()
  File "tools/analysis_tools/dataset_analysis.py", line 454, in main
    progress_bar.update()
  File "/home/ubuntu/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/utils/progressbar.py", line 56, in update
    self.file.write(msg.format(bar_chars))
KeyboardInterrupt

I don't think my dataset is the problem, because if I add a break:

        if cnt == len(data_list):
            print('enumerate over!', '\n'*10)
            break

the rest of the code runs successfully. I can also train on my custom dataset.
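For reference, the same early-exit workaround can be written with Python's built-in `enumerate` instead of a manual counter. This is a pure-Python sketch with a placeholder `data_list`, not the real annotation list loaded by `dataset_analysis.py`:

```python
# Pure-Python sketch of the early-exit workaround; data_list is a placeholder
# standing in for the real annotation list loaded by dataset_analysis.py.
data_list = [{'instances': []} for _ in range(3)]

for cnt, img in enumerate(data_list, start=1):
    # per-image statistics would be collected here
    if cnt == len(data_list):
        print('enumerate over!')
        break  # guard against the loop running past the end of data_list
```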

Environment

sys.platform: linux
Python: 3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: Tesla V100-SXM2-32GB
CUDA_HOME: :/usr/local/cuda-11.4:/usr/local/cuda-11.4
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:

TorchVision: 0.13.1
OpenCV: 4.6.0
MMEngine: 0.3.2
MMCV: 2.0.0rc3
MMDetection: 3.0.0rc4
MMYOLO: 0.2.0+27487fd

Additional information

my config file mmyolo/configs/custom_dataset/yolov5_s-v61_syncbn_fast_1xb32-50e_ionogram.py

_base_ = '../yolov5/yolov5_s-v61_syncbn_fast_8xb16-300e_coco.py'

max_epochs = 50  # maximum number of training epochs
data_root = './Iono4311/'  # absolute path to the dataset directory

work_dir = './work_dirs/yolov5_s_50e'

# Since this tutorial fine-tunes on the cat dataset, we use `load_from` to load the pretrained MMYOLO model, which speeds up convergence while preserving accuracy
load_from = './work_dirs/yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth'  # noqa

# Adjust the batch size according to your GPU; the YOLOv5-s default is 8 GPUs x 16 images per GPU
train_batch_size_per_gpu = 32
train_num_workers = 4  # recommended: train_num_workers = nGPU x 4

save_epoch_intervals = 5  # save a checkpoint every `interval` epochs

# Adjust base_lr according to your GPU setup, scaled as base_lr_default * (your_bs 32 / default_bs (8x16))
base_lr = _base_.base_lr / 4

# anchors = [  # these anchors were updated for the dataset's characteristics; anchor generation is covered in a later section
#     [(68, 69), (154, 91), (143, 162)],  # P3/8
#     [(242, 160), (189, 287), (391, 207)],  # P4/16
#     [(353, 337), (539, 341), (443, 432)]  # P5/32
# ]

anchors = [
    [[8, 6], [24, 4], [19, 9]],
    [[22, 19], [17, 49], [29, 45]],
    [[44, 66], [96, 76], [126, 59]]
]

class_name = ('E', 'Es-l', 'Es-c', 'F1', 'F2', 'Spread-F')  # set class_name according to the class information in class_with_id.txt
num_classes = len(class_name)

metainfo = dict(
    CLASSES = class_name,
    PALETTE = [(250, 165, 30), (120, 69, 125), (53, 125, 34), (0, 11, 123), (130, 20, 12), (120, 121, 80)]  # colors used when plotting; arbitrary values are fine
)

train_cfg = dict(
    max_epochs=max_epochs,
    val_begin=10,  # epoch after which to start validation; the tutorial sets 20 because accuracy is low in the first 20 epochs, so evaluating them is not very meaningful and they are skipped
    val_interval=save_epoch_intervals  # run evaluation every val_interval epochs
)

model = dict(
    bbox_head=dict(
        head_module=dict(num_classes=num_classes),
        prior_generator=dict(base_sizes=anchors),

        # loss_cls is scaled dynamically with num_classes, but when num_classes = 1, loss_cls is always 0
        loss_cls=dict(loss_weight=0.5 *
                      (num_classes / 80 * 3 / _base_.num_det_layers))))

train_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    dataset=dict(
        _delete_=True,
        type='RepeatDataset',
        # If the dataset is small, RepeatDataset can repeat the current dataset n times per epoch; setting 5 would repeat it 5 times
        times=1,
        dataset=dict(
            type=_base_.dataset_type,
            data_root=data_root,
            metainfo=metainfo,
            ann_file='annotations/train.json',
            data_prefix=dict(img='images/'),
            filter_cfg=dict(filter_empty_gt=False, min_size=32),
            pipeline=_base_.train_pipeline)))

val_dataloader = dict(
    dataset=dict(
        metainfo=metainfo,
        data_root=data_root,
        ann_file='annotations/val.json',
        data_prefix=dict(img='images/')))

test_dataloader = val_dataloader

val_evaluator = dict(ann_file=data_root + 'annotations/val.json')
test_evaluator = val_evaluator

optim_wrapper = dict(optimizer=dict(lr=base_lr))

default_hooks = dict(
    # how often (in epochs) to save the model and how many checkpoints to keep at most; `save_best` additionally saves the best model (recommended)
    checkpoint=dict(
        type='CheckpointHook',
        interval=save_epoch_intervals,
        max_keep_ckpts=5,
        save_best='auto'),
    param_scheduler=dict(max_epochs=max_epochs),
    # logger output interval (in batches)
    logger=dict(type='LoggerHook', interval=50))

visualizer = dict(vis_backends=[dict(type='LocalVisBackend'), dict(type='WandbVisBackend')])
# visualizer = dict(vis_backends=[dict(type='LocalVisBackend'),dict(type='TensorboardVisBackend')])
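As a sanity check on the linear LR scaling comment in the config above, the arithmetic can be sketched as follows (0.01 as the default YOLOv5 base_lr in MMYOLO is an assumption here, standing in for `_base_.base_lr`):

```python
# Sketch of the linear LR scaling rule: base_lr_default * (your total bs / default total bs).
base_lr_default = 0.01      # assumed MMYOLO YOLOv5 default base_lr (stand-in for _base_.base_lr)
default_total_bs = 8 * 16   # default: 8 GPUs x 16 images per GPU
actual_total_bs = 1 * 32    # this config: 1 GPU x 32 images per GPU

base_lr = base_lr_default * actual_total_bs / default_total_bs
print(base_lr)  # 0.0025, i.e. base_lr_default / 4, matching `base_lr = _base_.base_lr / 4`
```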
hhaAndroid commented 1 year ago

@VoyagerXvoyagerx Sorry. This bug has been fixed in the dev branch, and we will release v0.3.0 soon

https://github.com/open-mmlab/mmyolo/blob/dev/tools/analysis_tools/dataset_analysis.py#L436

VoyagerXvoyagerx commented 1 year ago

Looking forward : D