open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

How to optimize hyper parameters using some frame works such as optuna? #6158

Closed AstroYuta closed 3 years ago

AstroYuta commented 3 years ago

Hi folks! Thanks for developing this inspiring project!

I am currently working on instance segmentation using Mask R-CNN or Cascade R-CNN, and I am very new to this field. Note that my project aims to identify thousands (>2000) of rocks in an image.

I would like to optimize hyperparameters using a framework such as optuna (https://github.com/optuna/optuna), but I cannot figure out how to implement it. In particular, how do you get losses (e.g. val loss) or accuracy for each epoch?

My provisional plan, based on some documents (e.g. https://medium.com/pytorch/using-optuna-to-optimize-pytorch-hyperparameters-990607385e36), is:

1. Edit hyperparameters in the config files using optuna Trial objects, in /mmdetection/configs/_base_/models/mask_rcnn_r50_fpn.py:

before
...
rpn_proposal=dict(
            nms_pre=4000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0)
...
after
...
rpn_proposal=dict(
            nms_pre=trial.suggest_int('nms_pre', 1000, 4000), <= Changed (suggest_int: these are integer counts)
            max_per_img=trial.suggest_int('max_per_img', 1000, 4000), <= Changed
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0)
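As an aside, nms_pre and max_per_img are integer counts, so optuna's suggest_int is a better fit than suggest_uniform (which samples floats, and is deprecated in newer optuna releases in favor of suggest_float). A minimal sketch of that sampling pattern, with a stub class standing in for optuna's Trial so it runs without optuna installed:

```python
import random


class StubTrial:
    """Stand-in for optuna.trial.Trial, for illustration only."""

    def suggest_int(self, name, low, high):
        # The real Trial records the suggestion in the study storage;
        # here we just sample uniformly from the same integer range.
        return random.randint(low, high)


def suggest_rpn_proposal(trial):
    """Build the rpn_proposal dict with trial-suggested integer counts."""
    return dict(
        nms_pre=trial.suggest_int('nms_pre', 1000, 4000),
        max_per_img=trial.suggest_int('max_per_img', 1000, 4000),
        nms=dict(type='nms', iou_threshold=0.7),
        min_bbox_size=0,
    )


proposal_cfg = suggest_rpn_proposal(StubTrial())
```

With the real library, the same function body works unchanged when optuna passes an actual Trial into the objective.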

2. Return loss values or accuracy for each epoch (how??), in tools/train.py:

def objective(trial): <= Changed from main()
...
    train_detector(
            model,
            datasets,
            cfg,
            distributed=distributed,
            validate=(not args.no_validate),
            timestamp=timestamp,
            meta=meta)
    return accuracy <= Added (but how to obtain this value?)
...
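One way to answer the "how??" above without touching the runner: MMDetection's logger also writes a &lt;timestamp&gt;.log.json file in the work dir, with one JSON dict per line (mode, epoch, and metrics such as bbox_mAP after each validation run). A sketch of pulling the last validation metric out of those lines; the exact keys depend on your config and evaluation metric, so treat the names below as assumptions:

```python
import json


def last_val_metric(log_json_lines, key='bbox_mAP'):
    """Return the most recent value of `key` among 'val'-mode log entries."""
    value = None
    for line in log_json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get('mode') == 'val' and key in entry:
            value = entry[key]
    return value


# Example with the kind of lines MMDetection writes to <timestamp>.log.json:
sample = [
    '{"mode": "train", "epoch": 1, "iter": 50, "loss": 1.23}',
    '{"mode": "val", "epoch": 1, "bbox_mAP": 0.281}',
    '{"mode": "val", "epoch": 2, "bbox_mAP": 0.305}',
]
print(last_val_metric(sample))  # -> 0.305
```

The objective could then open the log file after train_detector returns, feed its lines to this helper, and return the result to optuna.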

3. Run the trials and obtain the best hyperparameters, in tools/train.py:

import optuna
...
if __name__ == '__main__':
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=100)

Finally, my questions are:

  1. Is it possible to get loss values or accuracy for each epoch? Where can I get them?
  2. If you have any advice on the usage of optuna, please let me know.
  3. If you know of any hyperparameter optimization frameworks that work better with MMDetection than optuna, please let me know.

Thanks!

BIGWangYuDong commented 3 years ago

I haven't tried optuna or other such frameworks before. In my opinion, changing the config file is not the right approach: the config file is used to build models and other important things. Maybe you can instead set the values via cfg.xxx in tools/train.py.
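A sketch of what overriding cfg.xxx inside the objective could look like, using plain nested dicts in place of the real mmcv Config object; apply_overrides is a hypothetical helper, and the dotted keys are only examples:

```python
def apply_overrides(cfg, overrides):
    """Write dotted-key overrides into a nested dict config, in place.

    Hypothetical helper: with a real mmcv Config you could instead
    assign attributes directly, e.g. cfg.optimizer.lr = ...
    """
    for dotted_key, value in overrides.items():
        node = cfg
        *parents, leaf = dotted_key.split('.')
        for part in parents:
            node = node[part]
        node[leaf] = value


cfg = {
    'optimizer': {'type': 'SGD', 'lr': 0.02},
    'model': {'train_cfg': {'rpn_proposal': {'nms_pre': 4000}}},
}
apply_overrides(cfg, {
    'optimizer.lr': 0.005,                         # would come from trial.suggest_float
    'model.train_cfg.rpn_proposal.nms_pre': 2000,  # would come from trial.suggest_int
})
print(cfg['optimizer']['lr'])  # -> 0.005
```

This keeps the config file itself untouched; each optuna trial only mutates the loaded cfg before calling train_detector.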

AstroYuta commented 3 years ago

@BIGWangYuDong Many thanks for the reply! It took me several days to understand the scripts, but I finally found a solution: edit train_detector in mmdet/apis/train.py to return the runner, so that I can get the dict of "loss" and "log_vars" via runner.outputs.

With this change, my modifications in tools/train.py are the following (Cascade R-CNN):

def objective(trial): # edited from main()
...
    cfg.optimizer.lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
...
    # add an attribute for visualization convenience
    model.CLASSES = datasets[0].CLASSES
    runner = train_detector(
        model,
        datasets,
        cfg,
        distributed=distributed,
        validate=(not args.no_validate),
        timestamp=timestamp,
        meta=meta)
    outputs = runner.outputs <-- HERE
    accuracy = (outputs["log_vars"]["s0.acc"]
                + outputs["log_vars"]["s1.acc"]
                + outputs["log_vars"]["s2.acc"]) / 3  # mean accuracy over the three cascade stages
    return accuracy

if __name__ == '__main__':
    # main()
    study_name = "study_name"
    study = optuna.create_study(
        direction="maximize",
        study_name=study_name,
        storage=f"sqlite:///{study_name}.db",
        load_if_exists=True,
        pruner=optuna.pruners.PercentilePruner(60))
    study.optimize(objective, n_trials=100)

I'm still experimenting, but it seems to work well for hyperparameter optimization so far.

Anyway, I would like to ask one additional question. When I get the dict of "loss" and "log_vars" via runner.outputs at the line marked HERE, are these losses for the training data? Specifically, is it possible to obtain losses for the val data?

I guess runner.outputs at HERE returns losses for the training data, but I am not sure.
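For reference, runner.outputs holds the outputs of the last training iteration, so those log_vars are indeed training losses. With MMCV's EpochBasedRunner, one way to also compute losses on the val split is to add a val phase to the workflow; during that phase the model is still called with loss computation, so the logged values are losses on the val data. A config sketch (assuming MMDetection 2.x defaults):

```python
# In the config: run one train epoch, then one val epoch.
# During the val phase the model is still forwarded in train mode,
# so the logged log_vars are losses on the val data.
workflow = [('train', 1), ('val', 1)]

# The val dataset must also be passed to train_detector; tools/train.py
# already handles this when cfg.workflow has two phases (it appends a
# copy of cfg.data.val to the datasets list).
```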

AstroYuta commented 3 years ago

Well, I have managed to solve the problem. Thanks a lot, @BIGWangYuDong !!

WassimBouatay commented 2 years ago

This doesn't work on multiple GPUs (distributed training) for me. How did you do it?
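One pattern that may help here (an assumption, not something verified in this thread): under a torch.distributed launch every process runs the whole script, so either only rank 0 should create and drive the optuna study, or all workers should share trials through optuna's RDB storage. A minimal rank guard based on the RANK environment variable that the distributed launcher exports:

```python
import os


def is_main_process():
    """True on rank 0 (torch.distributed launchers export RANK per process)."""
    return int(os.environ.get('RANK', 0)) == 0


if is_main_process():
    # Only rank 0 creates and drives the optuna study; the other ranks
    # just participate in the distributed training launched per trial.
    pass
```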

christianhi123 commented 1 year ago

@AstroYuta have you worked with Optuna and the newer version of MMPretrain? I guess Optuna can be integrated by a hook at the "after_test_epoch" location...

Trotts commented 7 months ago

@AstroYuta sorry for reviving an old discussion, but I was wondering if you had a minimal working example for getting Optuna to work with MMDetection? I see above that you managed to solve your problem, though I'm struggling to follow fully how you achieved this based on the code snippets. Any help is greatly appreciated.