ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Core] [RLlib] RLlib on Ray 2.0 not easily working on Colab #28457

Open christy opened 2 years ago

christy commented 2 years ago

What happened + What you expected to happen

Ray does not easily run on Google Colab

Steps to reproduce:

  1. Open Google Colab; do not change the runtime.
  2. !pip install ray gputil  # GPUtil is commonly installed, and Ray's own warning messages say to install it
  3. import ray; ray.init()

What you get: ValueError: invalid literal for int() with base 10: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

Workaround: Do not install or import GPUtil if you did not set the Colab runtime to use a GPU!
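A minimal sketch of that workaround as Colab cells (assuming a CPU-only default runtime; the uninstall step is only needed if GPUtil is already present):

# CPU-only Colab runtime: avoid having GPUtil importable at all
!pip install ray              # note: no gputil
!pip uninstall -y gputil      # only if GPUtil was installed earlier

import ray
ray.init()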

RLlib does not easily run on Colab

Steps to reproduce:

  1. Open Google Colab; do not change the runtime.
  2. !pip install ray tensorflow_probability tensorboardX gym==0.21 lz4
  3. import ray
     from ray.rllib.algorithms.dqn import DQNConfig
     dqn_config = DQNConfig()
     dqn_algo = dqn_config.build()

What you get: Never-ending Warning messages about insufficient resources. WARNING insufficient_resources_manager.py:128 -- Ignore this message if the cluster is autoscaling. You asked for 1.0 cpu and 0 gpu per trial, but the cluster only has 2.0 cpu and 0 gpu.

Workaround:

  1. If you are running Ray Tune, make sure the total number of actors requested is 1 less than the available CPUs (see the sketch after this list). Set: evaluation_num_workers=0, evaluation_parallel_to_training=False, num_rollout_workers=1 -> TOTAL: 1 actor requested.
  2. If you are running RLlib .train(), the total number of actors requested can be **equal to** the available CPUs. Set: evaluation_num_workers=0, evaluation_parallel_to_training=False, num_rollout_workers=2 -> TOTAL: 2 actors requested.
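A minimal sketch of the Colab-friendly settings described above, using the same Ray 2.0 DQNConfig API as the reproduction script further down:

from ray.rllib.algorithms.dqn import DQNConfig

# Default Colab runtime has only 2 CPUs; request 1 actor in total so that
# the Ray Tune trial driver still has a CPU left to run on.
colab_tune_config = (
    DQNConfig()
    .environment(env="FrozenLake-v1")
    .evaluation(
        evaluation_num_workers=0,            # no dedicated evaluation actor
        evaluation_parallel_to_training=False,
    )
    .rollouts(num_rollout_workers=1)         # 1 actor requested in total
)

# For plain RLlib .train() (no Tune driver), the actor count can equal the
# 2 available CPUs, i.e. .rollouts(num_rollout_workers=2).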

Versions / Dependencies

Python: 3.7.13 (default Colab)
Ray: 2.0.0
Number of CPUs in this system: 2 (default Colab runtime)
Number of GPUs in this system: 0 (default Colab runtime)

Reproduction script

import time

import numpy as np

import ray
from ray import tune
from ray.rllib.algorithms.dqn import DQNConfig

# Create a DQNConfig object
dqn_config = DQNConfig()\
   .environment(env="FrozenLake-v1")\
   .framework(framework="torch")\
   .debugging(seed=415, log_level="ERROR")\
   .evaluation(
    evaluation_interval=15, 
    evaluation_duration=5,      
    evaluation_num_workers=0,  #counted as actors
    evaluation_parallel_to_training=False,
    evaluation_config = dict(
        # Explicitly set "explore"=False to override default True
        # Best practice value is False unless environment is stochastic
        explore=False,
    ),)\
   .rollouts(
    num_rollout_workers=2, #counted as actors -> change this to 1 for Colab to work
    num_envs_per_worker=4,)

dqn_algo = dqn_config.build()

# Run Ray Tune
dqn_config.training(
    lr=tune.grid_search([0.00005, 0.0002]),
)
stop_criteria = dict(
        time_total_s=35,
)
experiment_results = \
tune.run(
    dqn_config.algo_class,
    config=dqn_config.to_dict(),
    stop=stop_criteria,
    verbose=2,
    metric="episode_reward_mean",
    mode="max",
)

# Run RLlib .train()
dqn_config = DQNConfig()\
    .environment(env="FrozenLake-v1")\
    .framework(framework="torch")\
    .debugging(seed=415, log_level="ERROR")\
    .evaluation(
        evaluation_interval=15, 
        evaluation_duration=5,      
        evaluation_num_workers=0,  #counted as actors
        evaluation_parallel_to_training=False,
        evaluation_config = dict(
            explore=False,
        ),)\
    .rollouts(
        num_rollout_workers=3,  #counted as actors -> change this to 2 for Colab to work
        num_envs_per_worker=4,)\
    .training(
        lr=0.00005,)

# rebuild the Algorithm instance from this new config before training
dqn_algo = dqn_config.build()

start_time = time.time()
# train the Algorithm instance for 20 iterations
num_iterations = 20
dqn_rewards  = []
checkpoint_dir = "results/DQN/"

for i in range(num_iterations):
    # Call its `train()` method
    result = dqn_algo.train()

    # Extract reward from results.
    dqn_rewards.append(result["episode_reward_mean"])

    # checkpoint and evaluate roughly every 15 iterations, and on the last iteration
    if ((i % 14 == 0) or (i == num_iterations - 1)):
        print(f"Iteration={i}, Mean Reward={result['episode_reward_mean']:.2f}",end="")
        try:
            print(f"+/-{np.std(dqn_rewards):.2f}")
        except Exception:
            print()
        # save checkpoint file
        checkpoint_file = dqn_algo.save(checkpoint_dir)
        print(f"Checkpoints saved at {checkpoint_file}")
        # evaluate the policy
        eval_result = dqn_algo.evaluate()

# convert num_iterations to num_episodes
num_episodes = len(result["hist_stats"]["episode_lengths"]) * num_iterations
# convert num_iterations to num_timesteps
num_timesteps = sum(result["hist_stats"]["episode_lengths"]) * num_iterations
# calculate number of wins
num_wins = np.sum(result["hist_stats"]["episode_reward"])

# train time
secs = time.time() - start_time
print(f"DQN won {num_wins} times over {num_episodes} episodes ({num_timesteps} timesteps)")
print(f"Approx {num_wins/num_episodes:.2f} wins per episode")
print(f"Training took {secs:.2f} seconds, {secs/60.0:.2f} minutes")

Issue Severity

No response

zhe-thoughts commented 2 years ago

@stephanie-wang @sven1977 Could you help triage?

Also quick question for @christy : is this a regression from previous versions?

pcmoritz commented 2 years ago

On the GPUtil issue (the first one): I can reproduce the error with a fresh colab notebook that doesn't have a GPU. I noticed we are using both GPUtil and gpustat -- the latter seems better maintained, can we only use that? If we use both of them, that puts us at a severe disadvantage since then we inherit the bugs from both packages. cc @richardliaw
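For reference, a rough sketch of what GPU detection via gpustat could look like; treating a failed query as "no GPUs" (rather than propagating a parsing error the way GPUtil does) is my assumption here, not Ray's current behavior:

import gpustat

def detect_gpus():
    """Return the list of GPU indices, or [] if no usable NVIDIA driver is found."""
    try:
        # new_query() talks to NVML / nvidia-smi and raises if the driver is unusable
        return [gpu.index for gpu in gpustat.new_query()]
    except Exception:
        # e.g. CPU-only Colab runtime: report "no GPUs" instead of crashing
        return []

print(detect_gpus())  # -> [] on the default Colab runtime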

christy commented 2 years ago

Something might have gotten fixed since I first tried Ray 2.0 on Colab; my code did not work on Colab the first time I tried it.

Right now, ray.init() works on Colab default as long as you do not import GPUtil.

Should we remove all WARNING messages that tell the user to install GPUtil?


christy commented 2 years ago

@richardliaw - FYI - Philipp suggested above that we replace GPUtil with gpustat.