Open Dr-IceCream opened 7 months ago
It seems I have initially solved this issue by referring to the solutions in #40312 and the comments in the `ray.rllib.core.rl_module` code, replacing the original code with the following:
```python
import torch

# Evaluate the model.
obs, info = env.reset()
print("obs:", obs)
actions = {}
for agent_id, agent_obs in obs.items():
    # Determine the policy ID for the current agent using the policy mapping function.
    policy_id = f"controlled_vehicle_{agent_id}"
    # Compute an action for each agent by querying its RLModule directly.
    rl_module = saved_algorithm.get_module(policy_id)
    fwd_ins = {"obs": torch.Tensor([agent_obs])}
    fwd_outputs = rl_module.forward_inference(fwd_ins)
    action_dist_class = rl_module.get_inference_action_dist_cls()
    action_dist = action_dist_class.from_logits(
        fwd_outputs["action_dist_inputs"]
    )
    action = action_dist.sample()[0].numpy()
    actions[agent_id] = action
# actions = saved_algorithm.compute_actions(obs)
print("actions: ", actions)
```
The output is:

`actions: {0: array(4, dtype=int64), 1: array(3, dtype=int64)}`

so it seems to be working.
However, when I used a similar approach to evaluate over multiple episodes, the results were significantly worse than the metrics reported during training: episode_len_mean dropped from about 29 to 7, and episode_reward_mean decreased from about 42 to 10. After recording videos, it was also evident that the agents had indeed learned their own policies, but performed relatively poorly. I suspect that the way I compute actions directly through the rl_module differs from what is actually done during training, but I am not clear on the exact action-computation steps used in training. Could I have done something wrong in this part?
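As a debugging aid, here is a minimal sketch (assuming a discrete action space and the same `saved_algorithm`, `env`, and policy-ID naming as in the snippet above) that takes the greedy, most-likely action from the logits instead of sampling; comparing this against the sampled version can help rule out sampling noise as the source of the gap:

```python
import torch

# Greedy (arg-max) action selection instead of sampling from the action
# distribution. Assumes a discrete action space and the same
# `saved_algorithm` / policy-ID naming as in the snippet above.
obs, info = env.reset()
actions = {}
for agent_id, agent_obs in obs.items():
    rl_module = saved_algorithm.get_module(f"controlled_vehicle_{agent_id}")
    fwd_outputs = rl_module.forward_inference({"obs": torch.Tensor([agent_obs])})
    logits = fwd_outputs["action_dist_inputs"]
    # Take the most likely action rather than a stochastic sample.
    actions[agent_id] = torch.argmax(logits[0]).numpy()
print("greedy actions:", actions)
```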
I also still wonder why I can't directly use `saved_algorithm.get_policy(policy_id).compute_single_action(agent_obs)` or `saved_algorithm.compute_single_action(agent_obs, policy_id)` to compute actions. Or does this mean these calls have a new syntax in the new API stack?
I am facing a similar issue with the SingleAgentEnvRunner - see screenshot attached.
I don't know why this issue is P2; it should be P0.
This issue still persists. Hey @simonsays1980 @sven1977, is there any plan to fix this?
I'm unfortunately having this issue too:
```python
from ray.rllib.algorithms.algorithm import Algorithm

ppo_agents = Algorithm.from_checkpoint(checkpoint=checkpoint_path)
actions = ppo_agents.compute_actions(observations=observations)
```

which results in:

```
AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'
```
Applying a fix similar to what @Dr-IceCream did, which follows #40312, gives very degraded results, to the point where it's unusable :(
Since we've had a similar P1 issue, should this be upgraded? @simonsays1980 @sven1977
Same error here. I'm unable to use a trained model.
I am also seeing this error with both PPO and SAC. Is there a recommended workaround or a stable commit to roll back to?
I see this error with my single-agent setup too. Is it caused by a mismatch between the new API stack and the old one, or something else? Is there really no solution for this? 🙄
I found a solution for this. I changed the action-computation call (following https://docs.ray.io/en/master/rllib/rllib-training.html) as below:
```python
import pathlib

from ray.rllib.core.rl_module import RLModule

# Create only the neural network (RLModule) from our checkpoint.
rl_module = RLModule.from_checkpoint(
    pathlib.Path(best_checkpoint) / "learner_group" / "learner" / "rl_module"
)["default_policy"]
```
To compute actions in the environment loop:
```python
import numpy as np
import torch

while not terminated and not truncated:
    env.render()
    # Compute the next action from a batch (B=1) of observations.
    torch_obs_batch = torch.from_numpy(np.array([obs]))
    action_logits = rl_module.forward_inference({"obs": torch_obs_batch})[
        "action_dist_inputs"
    ]
    # The default RLModule used here produces action logits (from which
    # we'll have to sample an action or use the max-likelihood one).
    action = torch.argmax(action_logits[0]).numpy()
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
```
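For completeness, here is a minimal sketch of the setup that has to run before this loop (the environment name `CartPole-v1` is only a placeholder; use whatever environment the checkpoint was trained on):

```python
import gymnasium as gym

# Placeholder env; substitute the environment the checkpoint was trained on.
env = gym.make("CartPole-v1", render_mode="human")
obs, info = env.reset()
terminated = truncated = False
episode_return = 0.0
```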
What happened + What you expected to happen
After training multi-agent PPO with the new API stack, following the how-to-use-the-new-api-stack guide, I tried to compute actions:
but I get the error message:
I also tried other approaches, such as:

`action = saved_algorithm.compute_single_action(agent_obs, policy_id)`

but I still get the same error message: `AttributeError: 'MultiAgentEnvRunner' object has no attribute 'get_policy'`. I have seen a similar issue in #40312; are these two the same problem? The detailed error message is as follows:
Before calling this method, I also printed the relevant info, and this part looks normal:
through this code:
Versions / Dependencies
Ray 2.10.0, Python 3.8.18, Windows 11
Reproduction script
The code used for training is as follows:
And the code for loading checkpoints:
Issue Severity
High: It blocks me from completing my task.