ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io

[RLLib] Document how to change Algorithm configuration when restoring a checkpoint #40777

Open kronion opened 1 year ago

kronion commented 1 year ago

Description

I'm trying to restore an RLLib algorithm from a checkpoint and change the configuration before resuming training. My main objective is to change the number of rollout workers between runs, but I may need to adjust other configuration details as well, e.g. env config. I assume this is possible, but I can't find any specific documentation, and the obvious approaches don't seem to work.

For example, this doesn't work:

# If I don't enable eager execution manually, restoring the checkpoint fails
import tensorflow as tf
tf.compat.v1.enable_eager_execution()

from ray import tune
from ray.rllib.algorithms import ppo

...

    ppo_config = (
        ppo.PPOConfig()
            .rl_module(_enable_rl_module_api=False)
            .environment(env=Env, env_config=env_config)
            .framework(framework="tf2", eager_tracing=True)
            .rollouts(**rollout_config)
            .training(**training_config, _enable_learner_api=False)
            .resources(**resources_config)
    )
    tuner = tune.Tuner(
        ppo.PPO,
        param_space=ppo_config,
    )

    restore_path = input()
    if restore_path:
        algo = ppo.PPO.from_checkpoint(restore_path)
        tuner = tune.Tuner(
            algo,
            param_space=ppo_config,
        )

    tuner.fit()

If I restore a checkpoint from a training session with 5 rollout workers, the new session will also have 5 rollout workers, regardless of what I pass in as param_space.
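As a sanity check, restoring the Algorithm directly shows the old worker count in its config. This is just a rough sketch of what I mean, assuming algo.config exposes num_rollout_workers the way the current AlgorithmConfig API does:

from ray.rllib.algorithms import ppo

# restore_path points at a checkpoint that was written with 5 rollout workers
algo = ppo.PPO.from_checkpoint(restore_path)

# Still reports 5, even though ppo_config asks for a different worker count
print(algo.config.num_rollout_workers)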

I also considered the Tuner.restore() API, like this:

tuner = tune.Tuner.restore(restore_path, ppo.PPO, resume_errored=True, param_space=ppo_config)

But the docs specifically say that changing the param_space is unsupported: https://docs.ray.io/en/master/tune/api/doc/ray.tune.Tuner.restore.html#ray-tune-tuner-restore

The closest thing I could find was here in the Tune FAQ: https://docs.ray.io/en/latest/tune/faq.html#how-can-i-continue-training-a-completed-tune-experiment-for-longer-and-with-new-configurations-iterative-experimentation

But it's not clear how to apply this to an RLLib Algorithm. It isn't obvious how to extract an AlgorithmConfig from a checkpoint, modify it, and then build a new Algorithm instance.

Assuming there's a pattern for how to modify the config, it would be great to add it to the documentation. If this isn't actually possible, I think it would be an important feature to add.
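For reference, the kind of pattern I was hoping to find documented looks roughly like the sketch below. This is only a guess at the intended usage, not something I've verified end to end, and it assumes that copying the policy weights into a freshly built Algorithm is enough to continue training:

from ray.rllib.algorithms import ppo

# Sketch only: restore the old Algorithm just to get its config and weights.
old_algo = ppo.PPO.from_checkpoint(restore_path)

# Unfreeze the config, change the rollout worker count, and build a new Algorithm.
new_config = old_algo.config.copy(copy_frozen=False)
new_config = new_config.rollouts(num_rollout_workers=2)
new_algo = new_config.build()

# Carry the learned weights over, then continue training with the new setup.
new_algo.set_weights(old_algo.get_weights())
old_algo.stop()
new_algo.train()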

Link

No response

angelinalg commented 1 year ago

Deferring to eng to determine final priority. It seems like a P1 to me.

Finebouche commented 1 year ago

This would be really useful.

Xavier0 commented 3 months ago

This would be useful when I'm trying to resume from a checkpoint of a run that already completed its previously specified number of iterations and want to increase the number of iterations.