
[RLlib] Lack of validation for "num_workers" parameter in DDPGTrainer #24536

Open fantow opened 2 years ago

fantow commented 2 years ago

What happened + What you expected to happen

When a user wants to run the DDPG algorithm, the current DDPGTrainer only supports the single-machine version; for the multi-machine version, ApexDDPGTrainer must be used instead. However, DDPGTrainer still allows users to set config["num_workers"] > 1, and there is no parameter validation anywhere in the process. This problem bothered me for a long time before I tracked it down.

When constructing a DDPGTrainer, IMHO we could check the validity of this parameter and give a more obvious hint. If this issue is worth fixing, may I submit a PR for it?
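
For illustration, here is a minimal sketch of the check I have in mind, written as a hypothetical subclass so it runs standalone; in a real PR the check would live in DDPGTrainer's own validate_config (method name per the RLlib Trainer API as of Ray 1.12):

import ray.rllib.agents.ddpg as ddpg

class ValidatedDDPGTrainer(ddpg.DDPGTrainer):
    def validate_config(self, config):
        # Keep all of the existing DDPG config checks.
        super().validate_config(config)
        # Proposed check: plain DDPG is single-machine only, so fail
        # fast with a clear hint instead of silently misbehaving.
        if config.get("num_workers", 0) > 1:
            raise ValueError(
                "`num_workers` > 1 is not supported by DDPGTrainer. "
                "Use ApexDDPGTrainer for the distributed version of DDPG."
            )

Since validate_config is invoked during trainer setup in RLlib, this would make the reproduction script below fail immediately at construction time with an actionable message, instead of much later.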

Versions / Dependencies

Ray 1.12.0 (or higher), Python 3.7
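
(A quick way to confirm the environment, in case it matters for reproduction:)

import sys
import ray

print(ray.__version__)  # expected: 1.12.0 or higher
print(sys.version)      # expected: 3.7.x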

Reproduction script

# Imports needed to make this snippet self-contained (the test comes
# from RLlib's DDPG test suite):
import ray.rllib.agents.ddpg as ddpg
from ray.rllib.utils.test_utils import (
    check_compute_single_action,
    check_train_results,
    framework_iterator,
)

def test_ddpg_compilation(self):
    """Test whether a DDPGTrainer can be built with both frameworks."""
    config = ddpg.DEFAULT_CONFIG.copy()
    config["seed"] = 42
    config["num_workers"] = 2
    config["num_envs_per_worker"] = 2
    config["learning_starts"] = 0
    config["exploration_config"]["random_timesteps"] = 100

    num_iterations = 2

    # Test against all frameworks.
    for _ in framework_iterator(config):
        trainer = ddpg.DDPGTrainer(config=config, env="Pendulum-v1")
        for i in range(num_iterations):
            results = trainer.train()
            check_train_results(results)
            print(results)
        check_compute_single_action(trainer)
        # Ensure apply_gradient_fn is being called and updating global_step
        if config["framework"] == "tf":
            a = trainer.get_policy().global_step.eval(
                trainer.get_policy().get_session())
        else:
            a = trainer.get_policy().global_step

Issue Severity

High: It blocks me from completing my task.

fantow commented 2 years ago

@maxpumperla Hi, may I invite you to evaluate whether this issue is worth fixing? If it is not a good change, I will close it. Thank you.