ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[rllib] Action masking with a Tuple action space #9404

Closed maxpumperla closed 1 year ago

maxpumperla commented 4 years ago

What is the problem?

On ray 0.9.0.dev0 and 0.8.5 I have an example that uses a simple Tuple action space, similar to what you have in the auto-regression example here:

https://github.com/ray-project/ray/blob/5f278c6411eda9d0d5e69e5e14e3496a17daf298/rllib/examples/env/correlated_actions_env.py#L13

but instead of auto-regression, we use action masking (following your template on that). When I run the example I get the following error message:

  File "/Users/maxpumperla/code/pathmind/factory-puzzle/venv/lib/python3.7/site-packages/ray/rllib/agents/dqn/distributional_q_tf_model.py", line 105, in build_action_value
    units=self.action_space.n * num_atoms,
AttributeError: 'Tuple' object has no attribute 'n'

which is correct, but also stops me from proceeding. I hope this is yet another "you're holding it wrong" scenario, but I don't see what the intended way of doing this is. I realize this is not a fully reproducible example, but I believe the issue is obvious enough. If you need more, let me know.

Fwiw, sure, I could patch DistributionalQTFModel as a band-aid, but I'd rather see what else we can do here first. We already have quite a few such patches in other places, and I'd rather not maintain my own fork of rllib.

Any help or suggestions on how best to approach this are appreciated. Thanks!
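For context, the action-masking template referenced above works by pushing the logits of invalid actions toward negative infinity before sampling, so softmax assigns them near-zero probability. A minimal, framework-free sketch of that idea (the helper names here are illustrative, not from RLlib):

```python
import math

FLOAT_MIN = -3.4e38  # effectively -inf for float32 logits

def mask_logits(logits, mask):
    """Push logits of invalid actions (mask == 0) toward -inf so
    softmax assigns them ~zero probability."""
    return [l if m else FLOAT_MIN for l, m in zip(logits, mask)]

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 0.5]
mask = [1, 0, 1]  # action 1 is currently invalid
probs = softmax(mask_logits(logits, mask))
# probs[1] is ~0; probability mass is redistributed to valid actions.
```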

rkooo567 commented 4 years ago

I think the issue is DQN doesn't support Tuple spaces. We should probably raise a better error message.
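The kind of early validation suggested here could look like the following sketch. This is a hypothetical helper, not RLlib's actual check; it simply fails fast with an informative message instead of an `AttributeError` deep inside the Q-head construction:

```python
class UnsupportedSpaceException(Exception):
    """Raised when an algorithm gets an action space it cannot handle."""

def validate_dqn_action_space(action_space):
    # DQN's Q-head sizes its output as action_space.n * num_atoms,
    # so it only supports a single flat Discrete space.
    if not hasattr(action_space, "n"):
        raise UnsupportedSpaceException(
            "DQN only supports Discrete action spaces, got "
            f"{type(action_space).__name__}. Consider flattening, e.g. "
            "Tuple([Discrete(2), Discrete(2)]) -> Discrete(4)."
        )
```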

sven1977 commented 4 years ago

Yes, DQN only works on (single) Discrete action spaces. We should add a space check to DQN (I thought we did check that, though). Any reason why you wouldn't want to use a Discrete(4) instead?
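The Discrete(4) suggestion amounts to a bijection between pairs of binary sub-actions and flat indices. A quick illustrative sketch (helper names are mine, not part of RLlib):

```python
# Flatten Tuple([Discrete(2), Discrete(2)]) into Discrete(4) and back.
def encode(a: int, b: int) -> int:
    """Map a pair of binary actions to a flat index in [0, 4)."""
    return a * 2 + b

def decode(idx: int) -> tuple:
    """Inverse: recover the two binary sub-actions from the flat index."""
    return divmod(idx, 2)

# Round-trip over the full joint action space.
assert all(decode(encode(a, b)) == (a, b) for a in (0, 1) for b in (0, 1))
```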

maxpumperla commented 4 years ago

@sven1977 Back from vacation. We need two discrete actions of size two, hence the tuple. How would you effectively emit several actions in a single Discrete space? In any case, I think we want to use tuples for the conceptual clarity, since this is exactly the use case for them.

maxpumperla commented 4 years ago

@rkooo567 @sven1977 no, we're not using DQN, but PPO here... (which suggests some possible improvements in naming conventions; DistributionalQTFModel sure implies otherwise).

To be clear, we have a running PPO model with tuples, but once I enable action masking it fails with the above error, which isn't very informative. So this is an error that occurs when working with both tuples and action masking together. It fails with DQN too :D

I wonder if action masking could be made a high-level supported feature in the library that works for all algorithms. It'd certainly be a great addition, especially if we manage to get more robust error handling going.
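If masking ever became a first-class feature, one natural way to generalize it to Tuple spaces would be to split the model's flat logits into one chunk per Discrete sub-space and mask each chunk independently. A rough sketch under that assumption (all names here are hypothetical):

```python
FLOAT_MIN = -3.4e38  # effectively -inf for float32 logits

def mask_tuple_logits(logits, sizes, masks):
    """Split flat logits into per-component chunks (one chunk per
    Discrete sub-space of the Tuple) and mask each chunk independently."""
    out, i = [], 0
    for size, mask in zip(sizes, masks):
        chunk = logits[i:i + size]
        out.append([l if m else FLOAT_MIN for l, m in zip(chunk, mask)])
        i += size
    return out

# Two Discrete(2) sub-spaces: 4 logits total, one mask per component.
# Here the first sub-action of the second component is invalid.
masked = mask_tuple_logits([0.1, 0.9, 0.4, 0.6], [2, 2], [[1, 1], [0, 1]])
```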

Rohan138 commented 1 year ago

Closing this issue. For anyone's future reference: while the base RLlib DQN models do not support Tuple spaces correctly, there is a simple fix here: https://github.com/PathmindAI/nativerl/blob/dev/nativerl/python/pathmind_training/models.py Note that the original distributional_q_tf_model.py is being deprecated and moved to the rllib-contrib directory; new users should use the new RLlib stack with RLModule/RLTrainer for DQN.