ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] Bugs in code snippets in the docs of RLlib training #37082

Open DrunkenRandomWalker opened 1 year ago

DrunkenRandomWalker commented 1 year ago

Description

I'm learning RLlib; it seems very powerful, but I found the following two errors in the docs.

import ray
from ray import air, tune

ray.init()

config = PPOConfig().training(lr=tune.grid_search([0.01, 0.001, 0.0001]))

tuner = tune.Tuner(
    "PPO",
    run_config=air.RunConfig(
        stop={"episode_reward_mean": 150},
    ),
    param_space=config,
)

tuner.fit()

I got the following error: [screenshot attached]
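For reference, the snippet can't run as written: PPOConfig is never imported, and no environment is set on the config. The following version is my best guess at what was intended (the CartPole-v1 env is my addition to make it self-contained, and param_space is passed as a dict, which is what other doc examples do):

import ray
from ray import air, tune
from ray.rllib.algorithms.ppo import PPOConfig  # missing from the doc snippet

ray.init()

config = (
    PPOConfig()
    .environment("CartPole-v1")  # my addition; the snippet sets no env
    .training(lr=tune.grid_search([0.01, 0.001, 0.0001]))
)

tuner = tune.Tuner(
    "PPO",
    run_config=air.RunConfig(stop={"episode_reward_mean": 150}),
    param_space=config.to_dict(),  # other doc examples pass a dict here
)

tuner.fit()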

# Get a reference to the policy
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig

algo = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("torch")
    .rollouts(num_rollout_workers=0)
    .build()
)
# <ray.rllib.algorithms.ppo.PPO object at 0x7fd020186384>

policy = algo.get_policy()
# <ray.rllib.policy.eager_tf_policy.PPOTFPolicy_eager object at 0x7fd020165470>

# Run a forward pass to get model output logits. Note that complex observations
# must be preprocessed as in the above code block.
logits, _ = policy.model({"obs": np.array([[0.1, 0.2, 0.3, 0.4]])})
# (<tf.Tensor: id=1274, shape=(1, 2), dtype=float32, numpy=...>, [])

# Compute action distribution given logits
policy.dist_class
# <class_object 'ray.rllib.models.tf.tf_action_dist.Categorical'>
dist = policy.dist_class(logits, policy.model)
# <ray.rllib.models.tf.tf_action_dist.Categorical object at 0x7fd02301d710>

# Query the distribution for samples, sample logps
dist.sample()
# <tf.Tensor: id=661, shape=(1,), dtype=int64, numpy=..>
dist.logp([1])
# <tf.Tensor: id=1298, shape=(1,), dtype=float32, numpy=...>

# Get the estimated values for the most recent forward pass
policy.model.value_function()
# <tf.Tensor: id=670, shape=(1,), dtype=float32, numpy=...>

policy.model.base_model.summary()

I got the following error: [screenshot attached]

Link

https://docs.ray.io/en/latest/rllib/rllib-training.html

DrunkenRandomWalker commented 1 year ago

After making the following change:

model_out = model({"obs": torch.tensor([[0.1, 0.2, 0.3, 0.4]])})

the forward pass works, but the next step in the docs ("Access the base Keras models (all default models have a base)"), i.e. model.base_model.summary(), still raises an exception: [screenshot attached]
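Putting it together, here is the whole forward-pass block as I got it to work with torch; since torch models aren't Keras models, printing the module seems to be the closest equivalent of base_model.summary() (treat that substitution as my workaround, not the documented API):

import torch

model = policy.model

# Forward pass with torch tensors instead of numpy arrays
logits, _ = model({"obs": torch.tensor([[0.1, 0.2, 0.3, 0.4]])})

# Action distribution given the logits (TorchCategorical for CartPole)
dist = policy.dist_class(logits, model)
action = dist.sample()
logp = dist.logp(torch.tensor([1]))

# Value estimate from the most recent forward pass
value = model.value_function()

# Torch models have no base_model attribute; print the nn.Module instead
print(model)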

I'd like to help with this; can someone tell me where to get started?

avnishn commented 1 year ago

Thanks for pointing these out @DrunkenRandomWalker.

We're actually in the process of overhauling our whole documentation around the new Learner and RLModule APIs, so these examples are effectively outdated.

We're going to put some of the new docs up next week, and I'll ping this thread to help you find a doc page that you can write, proofread, or contribute to however you'd like. Also, feel free to introduce yourself on the Ray Slack in the RLlib channel so that we can discuss more. Thanks!
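In the meantime, the rough shape of the new inference path looks like the sketch below; get_module and forward_inference follow the early RLModule docs and should be treated as assumptions, since the API is still in flux:

import torch

# Hypothetical sketch of the new-stack inference path; method names may
# differ in released versions.
module = algo.get_module("default_policy")
out = module.forward_inference({"obs": torch.tensor([[0.1, 0.2, 0.3, 0.4]])})
# out is a dict; the action distribution inputs live under "action_dist_inputs"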