vymao opened this issue 1 year ago
In addition to `last_info_for`, other useful methods seem to have disappeared in `EpisodeV2`, such as `last_observation_for` and `last_raw_obs_for`. If you modify `rllib/examples/custom_metrics_and_callbacks.py` to train with PPO, you end up with exceptions on the missing methods. Sample code below:
"""Example of using RLlib's debug callbacks.
Here we use callbacks to track the average CartPole pole angle magnitude as a
custom metric.
We then use `keep_per_episode_custom_metrics` to keep the per-episode values
of our custom metrics and do our own summarization of them.
"""
import argparse
import os
from typing import Dict
import gymnasium as gym
import numpy as np
import ray
from ray import air, tune
from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env import BaseEnv
from ray.rllib.evaluation import Episode, RolloutWorker
from ray.rllib.policy import Policy
parser = argparse.ArgumentParser()
parser.add_argument(
"--framework",
choices=["tf", "tf2", "torch"],
default="torch",
help="The DL framework specifier.",
)
parser.add_argument("--stop-iters", type=int, default=2000)
# Create a custom CartPole environment that maintains an estimate of velocity
class CustomCartPole(gym.Env):
def __init__(self, config):
self.env = gym.make("CartPole-v1")
self.observation_space = self.env.observation_space
self.action_space = self.env.action_space
self._pole_angle_vel = 0.0
self.last_angle = 0.0
def reset(self, *, seed=None, options=None):
self._pole_angle_vel = 0.0
obs, info = self.env.reset()
self.last_angle = obs[2]
return obs, info
def step(self, action):
obs, rew, term, trunc, info = self.env.step(action)
angle = obs[2]
self._pole_angle_vel = (
0.5 * (angle - self.last_angle) + 0.5 * self._pole_angle_vel
)
info["pole_angle_vel"] = self._pole_angle_vel
return obs, rew, term, trunc, info
class MyCallbacks(DefaultCallbacks):
def on_episode_step(
self,
*,
worker: RolloutWorker,
base_env: BaseEnv,
policies: Dict[str, Policy],
episode: Episode,
env_index: int,
**kwargs
):
# Make sure this episode is ongoing.
assert episode.length > 0, (
"ERROR: `on_episode_step()` callback should not be called right "
"after env reset!"
)
pole_angle = abs(episode.last_observation_for()[2])
raw_angle = abs(episode.last_raw_obs_for()[2])
assert pole_angle == raw_angle
episode.user_data["pole_angles"].append(pole_angle)
# Sometimes our pole is moving fast. We can look at the latest velocity
# estimate from our environment and log high velocities.
if np.abs(episode.last_info_for()["pole_angle_vel"]) > 0.25:
print("This is a fast pole!")
if __name__ == "__main__":
args = parser.parse_args()
config = (
PPOConfig()
.environment(CustomCartPole)
.framework(args.framework)
.callbacks(MyCallbacks)
.resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")))
.rollouts(enable_connectors=True)
.reporting(keep_per_episode_custom_metrics=True)
)
ray.init(local_mode=True)
tuner = tune.Tuner(
"PPO",
run_config=air.RunConfig(
stop={
"training_iteration": args.stop_iters,
},
),
param_space=config,
)
# there is only one trial involved.
result = tuner.fit().get_best_result()
# Verify episode-related custom metrics are there.
custom_metrics = result.metrics["custom_metrics"]
print(custom_metrics)
assert "pole_angle_mean" in custom_metrics
assert "pole_angle_var" in custom_metrics
Is there a workaround for this?
So far I see 2 possible solutions:

1. Port the missing methods from `Episode` (V1) to `EpisodeV2`, such as `(set_)last_observation_for` and `(set_)last_raw_obs_for`, and update `EnvRunnerV2` to call them. This could be done while looping over `env_obs` in https://github.com/ray-project/ray/blob/ca0e04994edcbbced2a2b18215b7ebf8d47c7bce/rllib/evaluation/env_runner_v2.py#L533, with something like:
```python
# Collect raw and filtered observations.
episode._set_last_raw_obs(agent_id, obs)
filtered_obs = _get_or_raise(self._worker.filters, policy_id)(obs)
episode._set_last_observation(agent_id, filtered_obs)
```
2. Use `_agent_collectors` from `EpisodeV2`, with something like `episode._agent_collectors[agent_id].buffers[SampleBatch.OBS][-1][-1]`, to get the last processed observation. Not sure how to get the raw obs, though.

@ArturNiederfahrenhorst let me know your thoughts, I'd be happy to work on a PR.
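To make option 2 concrete, here is a minimal sketch of how the internal collectors could be read from a callback today. It relies on private `EpisodeV2` internals as observed around Ray 2.5 (the `_agent_collectors` / `buffers` layout from the expression above); `CollectorPeekCallbacks` is a hypothetical name, and `"agent0"` as the default single-agent id is an assumption. Treat this as a hack, not a supported API:

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.sample_batch import SampleBatch


class CollectorPeekCallbacks(DefaultCallbacks):
    def on_episode_step(
        self, *, worker, base_env, policies, episode, env_index, **kwargs
    ):
        # EpisodeV2 keeps collected data in per-agent collectors. "agent0" is
        # assumed to be the default agent id of a single-agent env.
        collector = episode._agent_collectors["agent0"]
        # The last entry of the OBS buffer should be the most recent
        # *processed* (filtered / connector-transformed) observation.
        last_obs = collector.buffers[SampleBatch.OBS][-1][-1]
        print("Last processed pole angle:", abs(last_obs[2]))
```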
Here is a workaround, for anyone it may concern: when using PPO, you can force the use of `Episode` (V1) by disabling the new RL Module, connector, and Learner APIs. Sample config:
```python
config = (
    PPOConfig()
    .rl_module(_enable_rl_module_api=False)
    .training(_enable_learner_api=False)
    .rollouts(enable_connectors=False)
    .environment(CustomCartPole)
    .framework(args.framework)
    .callbacks(MyCallbacks)
    .resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")))
    .reporting(keep_per_episode_custom_metrics=True)
)
```
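With those flags, callbacks should receive the old `Episode` class again, so the V1 accessors become available. A quick sanity check, as a sketch: it assumes the flags above take effect in your Ray version, and `CheckV1EpisodeCallbacks` is a hypothetical name:

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.evaluation import Episode


class CheckV1EpisodeCallbacks(DefaultCallbacks):
    def on_episode_step(
        self, *, worker, base_env, policies, episode, env_index, **kwargs
    ):
        # With connectors and the RL Module / Learner APIs disabled, `episode`
        # should be the V1 Episode, not EpisodeV2.
        assert isinstance(episode, Episode)
        info = episode.last_info_for()  # available again on the V1 Episode
        print("pole_angle_vel:", info.get("pole_angle_vel"))
```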
@antoine-galataud We are moving away from EnvRunnerV2, so such efforts should go into https://sourcegraph.com/github.com/ray-project/ray/-/blob/rllib/env/env_runner.py. Thanks for offering your help - can you hold back for 1-2 weeks? After https://github.com/ray-project/ray/pull/39732 is merged, there should be a clearer picture on master about how such Episodes are built in PPO.
Thereafter, there will likely be an EpisodeV3, where these changes should go.
CC @sven1977 @simonsays1980
Hello! I would like to know if there is currently a way to read this information. I can't read the action and observation in the training loop. @ArturNiederfahrenhorst
What happened + What you expected to happen
The `Episode` class provided the method `last_info_for` to pull the info dict returned to the agent at the latest step. `EpisodeV2` doesn't have such a method, and consequently raises errors like `'EpisodeV2' object has no attribute 'last_info_for'`. This also seems to break the `rllib/examples/custom_metrics_and_callbacks.py` example (a different error, but the same underlying cause).

Is there a recommended workaround for this? Can we opt to use `Episode`, or is there a version of Ray we would need to downgrade to in order to have this?
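For reference, a minimal before/after sketch of the call in question (the error string is the one quoted above):

```python
# V1 Episode (older Ray, or with connectors disabled):
info = episode.last_info_for()  # info dict from the agent's latest step

# On EpisodeV2 (Ray 2.5 with connectors enabled) the same call raises:
#   AttributeError: 'EpisodeV2' object has no attribute 'last_info_for'
```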
Versions / Dependencies
Ray 2.5.0
Reproduction script
Using the `MyCallbacks` class from the modified `custom_metrics_and_callbacks.py` script shown at the top of this issue as a callback in training.
Issue Severity
High: It blocks me from completing my task.