[Rllib] Tune locks up when attempting to create an rllib algorithm in a trainable

Phirefly9 commented 1 year ago

What happened + What you expected to happen

My team has a Tune trainable that creates an RLlib algorithm dynamically during step(). However, Tune locks up while the algorithm is being created, which has forced us to stop using Tune and write the training loop ourselves.

I've attached a script that reproduces the issue. You will see the output (TestCartPole pid=243845) Building ALGORITHM, but you will never see Training ALGORITHM, even though Tune thinks the trial is still running.

Versions / Dependencies

Ray 2.7, Python 3.10, PyTorch 2.0.0

Reproduction script

from ray.rllib.algorithms.ppo import PPOConfig
from ray import air
from ray import tune
from ray.tune import Trainable
from typing import Dict, Optional
from ray.rllib.utils.typing import ResultDict

class TestCartPole(Trainable):
    def __init__(self, *args, **kwargs):
        self.config: Dict
        super().__init__(*args, **kwargs)

    def setup(self, config: Dict):
        self.config = config

    def save_checkpoint(self, checkpoint_dir: str):
        return None

    def load_checkpoint(self, checkpoint: Optional[Dict]):
        ...

    def step(self) -> ResultDict:
        """Create and run a single tournament
        """
        config = PPOConfig()
        config = config.training(gamma=0.9, lr=0.01, kl_coeff=0.3)
        config = config.resources(num_gpus=0)
        config = config.rollouts(num_rollout_workers=2)
        print(f"CONFIG: {config}")
        print(config.to_dict())
        # Build an Algorithm object from the config and run 1 training iteration.
        print("Building ALGORITHM")
        algo = config.build(env="CartPole-v1")
        print("Training ALGORITHM")
        result = algo.train()
        print("Stopping ALGORITHM")
        algo.stop()
        print("Returning result")
        return result

result = tune.Tuner(
    TestCartPole,
    run_config=air.RunConfig(stop={"episode_reward_mean": 200, "time_total_s": 200}),
    param_space={},
).fit()
print(result)

Issue Severity

Medium: It is a significant difficulty but I can work around it.
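
The workaround mentioned above (driving the training loop by hand instead of going through Tune) might look roughly like the sketch below. It reuses the PPO settings and stop criteria from the reproduction script. The names `should_stop`, `run_manual_loop`, and `max_iterations` are hypothetical, the algorithm is built once rather than on every step, and the stop check assumes the same "any metric reaches its threshold" behavior as Tune's stop dict.

```python
from typing import Any, Dict

# Stop criteria copied from the reproduction script.
STOP: Dict[str, float] = {"episode_reward_mean": 200, "time_total_s": 200}


def should_stop(result: Dict[str, Any], stop: Dict[str, float]) -> bool:
    """Return True once any metric in `stop` has reached its threshold.

    Hypothetical helper; metrics missing from `result` are ignored.
    """
    return any(
        key in result and result[key] >= threshold
        for key, threshold in stop.items()
    )


def run_manual_loop(max_iterations: int = 100) -> Dict[str, Any]:
    """Build PPO once and call train() directly, without Tune."""
    # Imported lazily so should_stop() can be used without Ray installed.
    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .training(gamma=0.9, lr=0.01, kl_coeff=0.3)
        .resources(num_gpus=0)
        .rollouts(num_rollout_workers=2)
    )
    algo = config.build(env="CartPole-v1")
    result: Dict[str, Any] = {}
    try:
        for _ in range(max_iterations):
            result = algo.train()
            if should_stop(result, STOP):
                break
    finally:
        # Release the rollout workers even if training raises.
        algo.stop()
    return result


if __name__ == "__main__":
    print(run_manual_loop())
```

Keeping the stop check in a small pure function makes it easy to test without starting Ray, at the cost of losing Tune's scheduling, logging, and checkpoint handling.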

ArturNiederfahrenhorst commented 1 year ago

I can reproduce this, but we don't support creating algorithms within algorithms. That's a very funky pattern. What's the use case here?
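
For contrast, here is a minimal sketch of the more conventional arrangement, in which Tune runs the registered "PPO" trainable directly instead of a custom Trainable building an algorithm inside step(). It assumes the Ray 2.7-era APIs already used in the reproduction script and the same hyperparameters. The function names `build_param_space` and `run_with_tune` are hypothetical, and this is not a league-play setup.

```python
from typing import Any, Dict

# Stop criteria copied from the reproduction script.
STOP: Dict[str, float] = {"episode_reward_mean": 200, "time_total_s": 200}


def build_param_space() -> Dict[str, Any]:
    """Return the PPO config from the reproduction script as a plain dict."""
    # Imported lazily so this module can be imported without Ray installed.
    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .training(gamma=0.9, lr=0.01, kl_coeff=0.3)
        .resources(num_gpus=0)
        .rollouts(num_rollout_workers=2)
    )
    return config.to_dict()


def run_with_tune():
    """Let Tune create and drive the PPO algorithm itself."""
    from ray import air, tune

    tuner = tune.Tuner(
        "PPO",  # registered RLlib algorithm name; Tune builds it
        param_space=build_param_space(),
        run_config=air.RunConfig(stop=STOP),
    )
    return tuner.fit()


if __name__ == "__main__":
    print(run_with_tune())
```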

Phirefly9 commented 1 year ago

We implemented league play using this pattern in an older version of Ray/RLlib, but we will adjust if it is not supported.

Thanks
