Open moganli opened 2 weeks ago
@moganli: I see that PR 46146 reshaped the action masking example. Does your issue also persist on a fresh nightly build?
@moganli Thanks for raising this. As @PhilippWillms correctly mentioned, we have overhauled the action masking for our new API stack here: https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/action_masking_rlm.py
Could you try out if this works for you? The old stack will be deprecated within the next year, AFAICS.
To your question regarding DreamerV3 and action masking: it does not support it out of the box. You would need to write an RLModule
wrapper class, similar to the one used in the example, that performs the action masking on top of the actor network.
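The core step such a wrapper performs can be sketched outside RLlib. This is a minimal illustration (not actual RLlib code; the names `mask_logits` and `FLOAT_MIN` are my own): invalid actions get a huge negative logit, so softmax assigns them (numerically) zero probability.

```python
import numpy as np

# A very large negative value stands in for -inf so that softmax assigns
# masked-out actions (numerically) zero probability.
FLOAT_MIN = np.finfo(np.float32).min


def mask_logits(logits, action_mask):
    """Push the logits of invalid actions (mask == 0) to a huge negative value."""
    inf_mask = np.where(action_mask > 0, 0.0, FLOAT_MIN).astype(np.float32)
    return logits + inf_mask


logits = np.array([1.0, 2.0, 3.0], dtype=np.float32)
mask = np.array([1.0, 0.0, 1.0], dtype=np.float32)  # action 1 is invalid

masked = mask_logits(logits, mask)
# Numerically stable softmax over the masked logits.
probs = np.exp(masked - masked.max())
probs /= probs.sum()
```

After masking, action 1 receives essentially zero probability while the relative ordering of the valid actions is preserved.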
What happened + What you expected to happen
Modification: in `/ray/rllib/examples/action_masking.py`, replace line 97's `ppo.PPOConfig()` with `dreamerv3.DreamerV3Config()`.

Bug:

```
ValueError: Cannot specify a gym.Env class via `config.env` while setting
`config.remote_worker_env=True` AND your gym version is >= 0.22! Try
installing an older version of gym or set `config.remote_worker_env=False`.

Process finished with exit code 1
```
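For reference, the one-line change described above amounts to something like the following (a sketch against the Ray 2.9 example script; the surrounding builder calls are elided and unchanged):

```python
# action_masking.py, around line 97 (Ray 2.9): swap the algorithm config.
from ray.rllib.algorithms import dreamerv3  # instead of `ppo`

config = (
    dreamerv3.DreamerV3Config()  # was: ppo.PPOConfig()
    # ... remaining .environment(...) / .training(...) calls unchanged ...
)
```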
Does DreamerV3 currently support action masking?
Versions / Dependencies
Python 3.11, Ray 2.9, Ubuntu 22.04
Reproduction script
```python
"""Example showing how to use "action masking" in RLlib.

"Action masking" allows the agent to select actions based on the current
observation. This is useful in many practical scenarios, where different
actions are available at different time steps. Blog post explaining action
masking: https://boring-guy.sh/posts/masking-rl/

RLlib supports action masking, i.e., disallowing these actions based on the
observation, by slightly adjusting the environment and the model as shown in
this example.

Here, the ActionMaskEnv wraps an underlying environment (here, RandomEnv),
defining only a subset of all actions as valid based on the environment's
observations. If an invalid action is selected, the environment raises an
error.

The environment constructs Dict observations, where obs["observations"] holds
the original observations and obs["action_mask"] holds the valid actions. To
avoid selecting invalid actions, the ActionMaskModel is used. This model takes
the original observations, computes the logits of the corresponding actions,
and then sets the logits of all invalid actions to zero, thus disabling them.
This only works with discrete actions.

Run this example with defaults (using Tune and action masking):

    $ python action_masking.py

Then run again without action masking, which will likely lead to errors due to
invalid actions being selected (ValueError "Invalid action sent to env!"):

    $ python action_masking.py --no-masking

Other options for running this example:

    $ python action_masking.py --help
"""
import argparse
import os

from gymnasium.spaces import Box, Discrete

import ray
from ray.rllib.algorithms import ppo
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec
from ray.rllib.examples.env.action_mask_env import ActionMaskEnv
from ray.rllib.examples.rl_module.action_masking_rlm import (
    TorchActionMaskRLM,
    TFActionMaskRLM,
)
from ray.tune.logger import pretty_print


def get_cli_args():
    """Create CLI parser and return parsed arguments"""
    parser = argparse.ArgumentParser()
    # ... (argument definitions elided in the original paste) ...


if __name__ == "__main__":
    args = get_cli_args()
    from ray.rllib.algorithms import dreamerv3
    # ... (rest of the script elided; line 97's `ppo.PPOConfig()` was
    # replaced with `dreamerv3.DreamerV3Config()` as described above) ...
```
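The Dict observation layout the docstring describes can be illustrated without RLlib. This is a hypothetical toy environment (not a real `gymnasium.Env`, and `MaskedBanditEnv` is not part of Ray); only the `{"observations": ..., "action_mask": ...}` structure and the "invalid action raises" behavior mirror the example:

```python
import numpy as np


class MaskedBanditEnv:
    """Toy illustration of the RLlib action-masking observation layout.

    Observations are dicts with "observations" (the real observation) and
    "action_mask" (1.0 = valid, 0.0 = invalid). Stepping with an invalid
    action raises ValueError, just like ActionMaskEnv in the example.
    """

    def __init__(self, n_actions=4, seed=0):
        self.n_actions = n_actions
        self.rng = np.random.default_rng(seed)
        self.mask = np.ones(n_actions, dtype=np.float32)

    def reset(self):
        # Randomly mark some actions invalid, keeping at least one valid.
        self.mask = (self.rng.random(self.n_actions) > 0.5).astype(np.float32)
        if self.mask.sum() == 0:
            self.mask[0] = 1.0
        return {
            "observations": self.rng.random(3).astype(np.float32),
            "action_mask": self.mask,
        }

    def step(self, action):
        if self.mask[action] == 0:
            raise ValueError("Invalid action sent to env!")
        # Gymnasium-style 5-tuple: obs, reward, terminated, truncated, info.
        return self.reset(), 0.0, False, False, {}
```

A masking-aware model reads `obs["action_mask"]` and suppresses the logits of invalid actions, so the policy never samples an action that would trigger the ValueError.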
Issue Severity
High: It blocks me from completing my task.