open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Load train_dataloader and val_dataloader #171

Closed whikwon closed 4 years ago

whikwon commented 5 years ago

How can I monitor validation progress during training?

I added workflow = [('train', 1), ('val', 1)] to the config file and ran python tools/train.py configs/faster_rcnn_r50_fpn_1x.py

The following assertion failed: assert len(data_loaders) == len(workflow)

I think this error is caused by the tools/train.py script building only train_dataset.

How can I use validation dataset?
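The failing assertion compares the number of data loaders with the number of workflow phases. A minimal sketch of that check, using a hypothetical helper `check_workflow` (not an mmdetection function), shows why a single training loader cannot satisfy a two-phase workflow:

```python
def check_workflow(data_loaders, workflow):
    """Hypothetical helper mirroring `assert len(data_loaders) == len(workflow)`."""
    # Each (mode, epochs) phase in the workflow needs its own data loader.
    if len(data_loaders) != len(workflow):
        raise AssertionError(
            "got {} data loader(s) for {} workflow phase(s)".format(
                len(data_loaders), len(workflow)))
    # Pair each loader with the phase that will consume it.
    return list(zip(data_loaders, workflow))


workflow = [('train', 1), ('val', 1)]

# Only a training loader is built, so the check fails.
try:
    check_workflow(['train_loader'], workflow)
except AssertionError as err:
    print('AssertionError:', err)

# One loader per phase passes the check.
pairs = check_workflow(['train_loader', 'val_loader'], workflow)
print(pairs)
```

The loader names are plain strings standing in for real `DataLoader` objects.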

thangvubk commented 5 years ago

You should use the --validate argument instead. Please refer to the README for details.

hellock commented 5 years ago

For validation there are two options. One is to show the loss on validation set, another is to evaluate the mAP on validation set. In mmdetection we adopt the second one and use eval hooks to implement it, see here for details.
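A rough sketch of the eval-hook idea, using hypothetical `ToyRunner` and `EvalHook` classes that imitate the pattern only (they are not the mmcv/mmdetection API): training runs epoch by epoch, and after selected epochs a hook computes a metric on the validation set, so no separate 'val' workflow phase or second dataloader is needed.

```python
class EvalHook:
    """Hypothetical hook: evaluate a metric every `interval` training epochs."""

    def __init__(self, evaluate_fn, interval=1):
        self.evaluate_fn = evaluate_fn  # e.g. a function returning mAP on the val set
        self.interval = interval
        self.history = []

    def after_train_epoch(self, runner):
        finished_epochs = runner.epoch + 1
        if finished_epochs % self.interval == 0:
            self.history.append((finished_epochs, self.evaluate_fn(runner.model)))


class ToyRunner:
    """Hypothetical runner that calls every hook after each training epoch."""

    def __init__(self, model):
        self.model = model
        self.epoch = 0
        self.hooks = []

    def register_hook(self, hook):
        self.hooks.append(hook)

    def train(self, max_epochs):
        for epoch in range(max_epochs):
            self.epoch = epoch
            self.model['steps'] += 1  # stand-in for one epoch of training
            for hook in self.hooks:
                hook.after_train_epoch(self)


model = {'steps': 0}
# Stand-in metric: a real hook would run inference on the val set and compute mAP.
hook = EvalHook(evaluate_fn=lambda m: m['steps'] * 10, interval=2)
runner = ToyRunner(model)
runner.register_hook(hook)
runner.train(max_epochs=4)
print(hook.history)  # → [(2, 20), (4, 40)]
```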

If you want to adopt the first option and specify two phases in the workflow, two dataloaders are needed, which takes two minor modifications.

  1. tools/train.py L70-L77 as follows.
    train_dataset = get_dataset(cfg.data.train)
    val_dataset = get_dataset(cfg.data.val)
    train_detector(
        model,
        [train_dataset, val_dataset],
        cfg,
        distributed=distributed,
        validate=args.validate,
        logger=logger)
  2. mmdet/apis/train.py L64-L70
    data_loaders = [
        build_dataloader(
            ds,
            cfg.data.imgs_per_gpu,
            cfg.data.workers_per_gpu,
            dist=True)
        for ds in dataset
    ]
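With one loader per dataset, the runner can alternate between phases. A simplified sketch of that loop, assuming phase i of `workflow` consumes `data_loaders[i]` and that only training epochs advance the epoch counter; `run_workflow` and the phase functions are hypothetical stand-ins, not mmcv's `Runner.run`:

```python
def run_workflow(data_loaders, workflow, max_epochs, phases):
    """Hypothetical loop: phase i of `workflow` consumes data_loaders[i]."""
    assert len(data_loaders) == len(workflow)
    log = []
    epoch = 0
    while epoch < max_epochs:
        for loader, (mode, epochs) in zip(data_loaders, workflow):
            for _ in range(epochs):
                # Stop once the requested number of training epochs is done.
                if mode == 'train' and epoch >= max_epochs:
                    return log
                log.append((mode, phases[mode](loader)))
                if mode == 'train':
                    epoch += 1  # only training epochs advance the counter
    return log


# Stand-ins for "train one epoch" and "compute the loss on the val set".
phases = {
    'train': lambda loader: 'train loss from ' + loader,
    'val': lambda loader: 'val loss from ' + loader,
}
log = run_workflow(['train_loader', 'val_loader'],
                   [('train', 1), ('val', 1)], max_epochs=2, phases=phases)
for entry in log:
    print(entry)
```

The design point is simply that the workflow list and the loader list are indexed in parallel, which is why both lists must have the same length.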
whikwon commented 5 years ago

@hellock I tried the first option and got this error:


  File "tools/train.py", line 82, in <module>
    main()
  File "tools/train.py", line 78, in main
    logger=logger)
  File "/home/whikwon/anaconda3/envs/biorobot/lib/python3.6/site-packages/mmdet-0.5.4+c95c637-py3.6.egg/mmdet/apis/train.py", line 59, in train_detector
    _non_dist_train(model, dataset, cfg, validate=validate)
  File "/home/whikwon/anaconda3/envs/biorobot/lib/python3.6/site-packages/mmdet-0.5.4+c95c637-py3.6.egg/mmdet/apis/train.py", line 107, in _non_dist_train
    dist=False)
  File "/home/whikwon/anaconda3/envs/biorobot/lib/python3.6/site-packages/mmdet-0.5.4+c95c637-py3.6.egg/mmdet/datasets/loader/build_loader.py", line 31, in build_dataloader
    sampler = GroupSampler(dataset, imgs_per_gpu)
  File
hellock commented 5 years ago

I updated the code snippets above.

whikwon commented 5 years ago

I've followed your instructions and got an error:

  1. Modify tools/train.py L70-L77 and configs/faster_rcnn_r50_fpn_1x.py L156 in the local repository:
    train_dataset = get_dataset(cfg.data.train)
    val_dataset = get_dataset(cfg.data.val)
    train_detector(
        model,
        [train_dataset, val_dataset],
        cfg,
        distributed=distributed,
        validate=args.validate,
        logger=logger)
    # in configs/faster_rcnn_r50_fpn_1x.py
    workflow = [('train', 1), ('val', 1)]
  2. Modify mmdet/apis/train.py L64-L70 in the installed Anaconda package:
    data_loaders = [
        build_dataloader(
            ds,
            cfg.data.imgs_per_gpu,
            cfg.data.workers_per_gpu,
            dist=True)
        for ds in dataset
    ]

Error:

Traceback (most recent call last):
  File "tools/train.py", line 81, in <module>
    main()
  File "tools/train.py", line 77, in main
    logger=logger)
  File "/home/whikwon/anaconda3/envs/biorobot/lib/python3.6/site-packages/mmdet-0.5.4+c95c637-py3.6.egg/mmdet/apis/train.py", line 59, in train_detector
    _non_dist_train(model, dataset, cfg, validate=validate)
  File "/home/whikwon/anaconda3/envs/biorobot/lib/python3.6/site-packages/mmdet-0.5.4+c95c637-py3.6.egg/mmdet/apis/train.py", line 109, in _non_dist_train
    for ds in dataset
  File "/home/whikwon/anaconda3/envs/biorobot/lib/python3.6/site-packages/mmdet-0.5.4+c95c637-py3.6.egg/mmdet/apis/train.py", line 109, in <listcomp>
    for ds in dataset
  File "/home/whikwon/anaconda3/envs/biorobot/lib/python3.6/site-packages/mmdet-0.5.4+c95c637-py3.6.egg/mmdet/datasets/loader/build_loader.py", line 31, in build_dataloader
    sampler = GroupSampler(dataset, imgs_per_gpu)
  File "/home/whikwon/anaconda3/envs/biorobot/lib/python3.6/site-packages/mmdet-0.5.4+c95c637-py3.6.egg/mmdet/datasets/loader/sampler.py", line 14, in __init__
    assert hasattr(dataset, 'flag')
AssertionError

I've run training separately with the train dataset and with the val dataset, and both worked, so the val dataset itself is not the problem.

hellock commented 5 years ago

How about your config file? Are you setting test_mode=True for val dataset? It needs to be False just like train dataset.
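For context, as far as I can tell from the mmdetection v0.x source, the `flag` attribute checked by `GroupSampler` is only set on datasets built with `test_mode=False`: it marks each image as landscape (1) or portrait (0) so that each batch holds images of similar aspect ratio. The sketch below imitates that behaviour with hypothetical `ToyDataset` and `toy_group_batches` names; it is a simplification, not the library code (the real sampler also shuffles and pads groups).

```python
import numpy as np


class ToyDataset:
    """Hypothetical dataset: sets `flag` only when test_mode is False."""

    def __init__(self, img_infos, test_mode=False):
        self.img_infos = img_infos
        self.test_mode = test_mode
        if not self.test_mode:
            self._set_group_flag()

    def __len__(self):
        return len(self.img_infos)

    def _set_group_flag(self):
        # 1 for landscape images (width / height > 1), 0 otherwise.
        self.flag = np.zeros(len(self), dtype=np.uint8)
        for i, info in enumerate(self.img_infos):
            if info['width'] / info['height'] > 1:
                self.flag[i] = 1


def toy_group_batches(dataset, imgs_per_gpu):
    """Hypothetical sampler: batch indices so each batch shares one flag value."""
    assert hasattr(dataset, 'flag')  # the same check that fails in the traceback
    batches = []
    for group in np.unique(dataset.flag):
        indices = np.where(dataset.flag == group)[0].tolist()
        for start in range(0, len(indices), imgs_per_gpu):
            batches.append(indices[start:start + imgs_per_gpu])
    return batches


infos = [{'width': 500, 'height': 375}, {'width': 375, 'height': 500},
         {'width': 640, 'height': 480}, {'width': 333, 'height': 500}]
print(toy_group_batches(ToyDataset(infos), imgs_per_gpu=2))  # → [[1, 3], [0, 2]]
print(hasattr(ToyDataset(infos, test_mode=True), 'flag'))     # → False
```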

whikwon commented 5 years ago

My config file.

# model settings
model = dict(
    type='FasterRCNN',
    pretrained='modelzoo://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        use_sigmoid_cls=True),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_fcs=2,
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        num_classes=81,
        target_means=[0., 0., 0., 0.],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        reg_class_agnostic=False))
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        smoothl1_beta=1 / 9.0,
        debug=False),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.05)
)
# dataset settings
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=[
            data_root + 'VOC2012/ImageSets/Main/train.txt',
        ],
        img_prefix=[data_root + 'VOC2012/'],
        img_scale=(1000, 600),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=True,
        with_label=True),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2012/ImageSets/Main/val.txt',
        img_prefix=data_root + 'VOC2012/',
        img_scale=(1000, 600),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=True,
        with_label=True),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        img_scale=(1000, 600),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_label=False,
        test_mode=False))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 2
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/faster_rcnn_r50_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1), ('val', 1)]

The val dataset in my config has no test_mode key.

dhananjaisharma10 commented 5 years ago

How about your config file? Are you setting test_mode=True for val dataset? It needs to be False just like train dataset.

Hi!

I'm using the config file for retinanet_r50_fpn_1x. Isn't test_mode automatically set as True on line 25 in mmdet/core/evaluation/eval_hooks.py for validation? Also, if test_mode=False, it throws an error, because the image is treated as a tensor, whereas a list of images is expected:

TypeError: imgs must be a list, but got <class 'torch.Tensor'>

I'm not sure what difference it makes whether or not I explicitly set test_mode=True in the validation section of my config file.

Please let me know. Thanks!
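The `TypeError` comes from the detector's test-time path, which expects lists of images (one entry per test-time augmentation), whereas a training-mode dataset yields a single batched tensor. A hypothetical `ToyDetector` imitating that train/test dispatch (the class, its return values, and the tuple standing in for a tensor are all assumptions, not mmdetection's `BaseDetector`) shows how the same input works on the loss path and fails on the test path:

```python
class ToyDetector:
    """Hypothetical detector imitating a train/test forward dispatch."""

    def forward(self, img, img_meta, return_loss=True):
        if return_loss:
            return self.forward_train(img, img_meta)
        return self.forward_test(img, img_meta)

    def forward_train(self, img, img_meta):
        # Training path: one batched input, returns a dict of losses.
        return {'loss': 0.0}

    def forward_test(self, imgs, img_metas):
        # Test path: lists with one element per augmentation (e.g. original + flipped).
        for var, name in [(imgs, 'imgs'), (img_metas, 'img_metas')]:
            if not isinstance(var, list):
                raise TypeError('{} must be a list, but got {}'.format(name, type(var)))
        return ['detections'] * len(imgs)


batched_img = (1, 3, 600, 1000)  # stand-in for a tensor of shape N x C x H x W
det = ToyDetector()
print(det.forward(batched_img, {}, return_loss=True))  # → {'loss': 0.0}
try:
    det.forward(batched_img, {}, return_loss=False)
except TypeError as err:
    print('TypeError:', err)
print(det.forward([batched_img], [{}], return_loss=False))  # → ['detections']
```

In other words, computing a validation loss needs training-style data (test_mode=False, return_loss=True), while mAP evaluation needs test-style data (test_mode=True, return_loss=False).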

Schneey commented 4 years ago

@dhananjaisharma10 I have the same problem as you. Have you solved it?

hellock commented 4 years ago

#1093

bnumaomei commented 4 years ago

@hellock

For validation there are two options. One is to show the loss on validation set, another is to evaluate the mAP on validation set. In mmdetection we adopt the second one and use eval hooks to implement it, see here for details.

Currently mmdetection shows the loss on the validation set for me. Can I evaluate mAP on the validation set instead? What should I modify?