thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

[Feature Request] Integrated hyperparameter tuning system #439

Closed jkterry1 closed 7 months ago

jkterry1 commented 3 years ago

Stable Baselines 3 has natively integrated hyperparameter tuning via https://github.com/DLR-RM/rl-baselines3-zoo. In reinforcement learning research, hyperparameter tuning is all but required, and right now the lack of this integration is the only reason I'm not actively using Tianshou for my research. Would you be interested in adding support for this? Presumably the easiest way would be to fork rl-baselines3-zoo and replace its internals with Tianshou, since the Optuna hyperparameter tuning logic there is very well done and general enough to work with any library.

Trinkle23897 commented 3 years ago

Thanks for your suggestion. I'll take a look when I'm free, but in general this can be integrated at the level of the training scripts rather than the core library.

Trinkle23897 commented 3 years ago

Here's my proposal:

  1. add a test hook after each test_episode that reports the test reward to Optuna's pruning system (a rough sketch of this is given after the script below);
  2. create an Optuna script outside the existing training script, for example:
import optuna
from atari_ppo import get_args, test_ppo

def objective(trial):
    args = get_args()
    args.epoch = 5
    args.step_per_collect = trial.suggest_int("step_per_collect", 512, 4096, log=True)
    args.repeat_per_collect = trial.suggest_int("repeat_per_collect", 1, 6)
    args.batch_size = trial.suggest_int("batch_size", 128, 1024, log=True)
    args.training_num = 64
    return -test_ppo(args)  # Optuna minimizes by default, so return the negative test reward

if __name__ == "__main__":
    study = optuna.create_study()
    study.optimize(objective, n_trials=100)
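
To make item 1 concrete, here is a rough sketch of how such a hook could feed Optuna's pruner, assuming a hypothetical reward_hook argument on test_ppo that the trainer would call after every test phase (no such argument exists in Tianshou today):

import optuna
from atari_ppo import get_args, test_ppo  # assumes test_ppo accepts the hypothetical reward_hook

def objective(trial):
    args = get_args()
    args.epoch = 5

    # Hypothetical callback: the trainer would invoke this after each test phase,
    # letting Optuna's pruner stop unpromising trials early.
    def reward_hook(epoch, test_reward):
        # report the negated reward so intermediate values match the minimized objective
        trial.report(-test_reward, epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()

    return -test_ppo(args, reward_hook=reward_hook)

if __name__ == "__main__":
    study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
    study.optimize(objective, n_trials=100)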

jkterry1 commented 3 years ago

Here's a simplified example of using SB3 with Optuna: https://github.com/araffin/rl-handson-rlvs21/blob/main/optuna/sb3_simple.py

hocaso commented 1 year ago

I'm also interested in adding more compatibility with Optuna, but I'm not sure how to implement the hook part. Could somebody give me some tips? I would opt to hook the test_result value in the test_step function of the BaseTrainer class in tianshou/trainer/base.py.

Alternatively, I could do it by splitting the training into per-epoch iterations, but the iterator in the run function is tripping me up. I'm not sure how to proceed other than writing custom trainer logic.

Also, it seems that somebody has already trained with Optuna (judging by the example in examples/box2d/lunarlander_dqn.py). Is that code available somewhere?

Trinkle23897 commented 1 year ago

I think you can create an extra init argument for BaseTrainer to route them together; that's totally fine. The Optuna approach above doesn't touch the trainer at all, it only changes the entrypoint (which is of course inefficient).

Matteo-Bassani commented 1 year ago

Has anyone found a working solution for integrating Optuna with Tianshou?

hocaso commented 1 year ago

@Trinkle23897 I've come across a relatively straightforward approach to achieve this without the need to make changes to the library's source code. However, this method might not be the most optimal solution, as it provides information only at the end of each training epoch, rather than at every training iteration. The basic idea behind this approach is to transform the training process into an iterator and incorporate this iterator into Optuna's optimization framework.

Here's a more detailed explanation for those interested: You can work with your existing algorithm files with just a few minor adjustments. Take, for example, test/discrete/test_dqn.py. Here, you can modify the 'OffpolicyTrainer' section as follows:

# turn test_dqn into a generator: iterate the trainer epoch by epoch
# and yield the per-epoch results
result = OffpolicyTrainer(<usual_args>)

for epoch, epoch_stat, info in result:
    yield epoch, epoch_stat, info

This change allows you to segment the training process into individual epochs when using for epoch, epoch_stat, info in test_dqn(args). The next step involves creating a separate file for the optimization process. As @jkterry1 said, you can use sb3_simple.py as a starting point. Remove any components related to stable_baselines3 (e.g., TrialEvalCallback) and adapt the objective function:

import optuna

import test_dqn as tdqn
...
def sample_dqn_params(trial, args):
    ... trial.suggest ...
def objective(trial):
    args = tdqn.get_args()
    args = sample_dqn_params(trial, args)
    i = 0
    for epoch, epoch_stat, info in tdqn.test_dqn(args):
        i += 1
        loss = epoch_stat['test_reward']  # or whatever quantity you want to optimize
        trial.report(loss, i)
        if trial.should_prune():
            raise optuna.TrialPruned()
        print(epoch_stat['test_reward'], epoch_stat['test_reward_std'])
    return loss
...

Keep in mind that if your algorithm only trains for a couple of epochs, this patch might not be worth it. I haven't tried it with model-based or imitation algorithms, but I believe it should work. I'm curious about the specifics of fully integrating Tianshou with Optuna, but this tweak has worked for my needs so far.
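
For completeness, a minimal sketch of the study setup that would drive the objective above; everything here is standard Optuna API, and the pruner and trial count are only placeholders:

import optuna

study = optuna.create_study(
    direction="maximize",  # the objective returns the test reward, which we want to maximize
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=2),
)
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)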

MischaPanch commented 11 months ago

I think this is mainly a documentation issue and should be covered by example scripts rather than the core code. For example, using NNI to do HPO on arbitrary scripts is very easy and non-invasive; a rough sketch is given below.
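
For illustration, a rough sketch of that non-invasive pattern with NNI, reusing the atari_ppo example from earlier in this thread (the search space itself would live in NNI's experiment configuration, not shown here):

import nni
from atari_ppo import get_args, test_ppo  # same example script as in the Optuna snippet above

def main():
    args = get_args()
    # NNI supplies this trial's hyperparameters; the keys must match the search space config.
    for key, value in nni.get_next_parameter().items():
        setattr(args, key, value)
    # Report the final test reward so the tuner can compare trials.
    nni.report_final_result(test_ppo(args))

if __name__ == "__main__":
    main()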

MischaPanch commented 7 months ago

Duplicate; we are working on extensive HPO support in a separate issue, #978.