ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] [Feature] Support for having parametric action spaces/action masking for continuous action space models #22259

Open bkaplowitz opened 2 years ago

bkaplowitz commented 2 years ago


Description

Would it be possible to have parametric action spaces/action masking for continuous action space models? Perhaps there is a way of doing this already with discrete actions, but I don't immediately see how.

Use case

There is a state variable that acts as an upper bound (cap) on the chosen action, with a lower bound at 0. Neither constraint is currently respected by the sampler, and clipping the sampled actions wastes a significant share of samples and creates mass points at the boundaries, which distorts the induced action distribution.
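
To illustrate the clipping problem, a small numpy sketch (the distribution parameters are arbitrary):

```python
import numpy as np

# A policy's Gaussian sample, clipped into [0, cap]: all the probability
# mass that falls outside the bounds piles up exactly at the boundaries,
# so the action distribution the algorithm assumes no longer matches what
# the env actually receives.
rng = np.random.default_rng(0)
cap = 1.0
samples = rng.normal(loc=0.9, scale=0.5, size=100_000)
clipped = np.clip(samples, 0.0, cap)

print("mass exactly at cap:", np.mean(clipped == cap))  # roughly 0.42 here
print("mass exactly at 0:  ", np.mean(clipped == 0.0))  # roughly 0.04 here
```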

Related issues

No response


avnishn commented 2 years ago

I'm not sure I understand the feature that you're asking for.

Could you use a mix of python/pseudocode/markdown to give an example of what you're asking for?

bkaplowitz commented 2 years ago

Sure.

So I have an environment whose observation space is a spaces.Box(observations_low, observations_high). One variable, which I compute from an entry of the observation received this period, effectively acts as self.action_upper_bound (set to an initial value at t=0, or computed in the step function for t>0).

I would like the sampler to take self.action_upper_bound for this period, treat it as the upper bound of the action space, and compute the value-maximizing action only from spaces.Box(0, self.action_upper_bound).
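
In code, something like this (a sketch with made-up dynamics and names; only the action_upper_bound bookkeeping matters):

```python
import gym
import numpy as np
from gym import spaces


class CappedActionEnv(gym.Env):
    """Illustrative env: the valid action range is [0, cap], where the
    cap is recomputed from the observation every period."""

    def __init__(self):
        self.observation_space = spaces.Box(low=0.0, high=10.0, shape=(1,), dtype=np.float32)
        # Static declared bounds, but the *effective* upper bound changes per step.
        self.action_space = spaces.Box(low=0.0, high=10.0, shape=(1,), dtype=np.float32)
        self.action_upper_bound = None
        self.state = None

    def reset(self):
        self.state = np.array([5.0], dtype=np.float32)
        self.action_upper_bound = float(self.state[0])  # initial value at t = 0
        return self.state

    def step(self, action):
        # What I'd like: the sampler draws `action` from
        # spaces.Box(0, self.action_upper_bound) for this period, rather
        # than sampling from the static action_space and clipping.
        spend = float(np.clip(action, 0.0, self.action_upper_bound)[0])
        self.state = self.state - spend
        self.action_upper_bound = float(self.state[0])  # output of step for t > 0
        reward = float(np.log(spend + 1e-8))
        done = bool(self.state[0] <= 1e-6)
        return self.state, reward, done, {}
```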

Is this possible?

gjoliver commented 2 years ago

As a band-aid solution, can you just write a wrapper for your env and cap your action there?
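
Something like this rough sketch (assumes the cap can be recovered from the obs, here hypothetically its first entry):

```python
import gym
import numpy as np


class CapActionWrapper(gym.ActionWrapper):
    """Band-aid: clamp whatever the policy emits into [0, cap] before
    the env sees it."""

    def __init__(self, env):
        super().__init__(env)
        self._cap = np.inf

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._cap = self._cap_from_obs(obs)
        return obs

    def step(self, action):
        # ActionWrapper.step routes `action` through self.action() below.
        obs, reward, done, info = super().step(action)
        self._cap = self._cap_from_obs(obs)
        return obs, reward, done, info

    def action(self, action):
        return np.clip(action, 0.0, self._cap)

    def _cap_from_obs(self, obs):
        # Hypothetical: assume the first obs entry carries the current cap.
        return float(obs[0])
```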

SimonHashtag commented 2 years ago

I am also looking for such a feature.

I found one that works for discrete spaces (https://github.com/ray-project/ray/blob/master/rllib/examples/models/action_mask_model.py), but not for continuous spaces. Just like @bkaplowitz, I have to cap an action depending on the current state/obs (e.g. wealth: the agent is only allowed to spend its wealth and not more, and wealth changes each period).

Discretizing the space or using a wrapper are workable workarounds (thx for the tip @gjoliver), but an option to restrict the valid action space inside the model via action masking would most likely be more efficient (and also prettier).

I have already implemented this from scratch (DDPG) and it works, but now I want to migrate my model to RLlib to try out some other algorithms like PPO...

(I am happy to provide code from my env or from the actor network, if that helps; a rough sketch of the actor-side idea is below.)
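
The core of it is to squash the raw network output into (0, 1) and rescale by the state-dependent cap, rather than clipping. A simplified PyTorch sketch (layer sizes and the position of the cap in the obs are illustrative):

```python
import torch
import torch.nn as nn


class CappedActor(nn.Module):
    """Actor whose output always lies in (0, cap), where the cap is read
    from the current observation."""

    def __init__(self, obs_dim, act_dim, cap_index=0, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )
        self.cap_index = cap_index  # which obs entry holds the cap (e.g. wealth)

    def forward(self, obs):
        cap = obs[:, self.cap_index:self.cap_index + 1]
        # sigmoid keeps the output in (0, 1); multiplying by the cap keeps
        # the action in (0, cap) and stays differentiable, so no clipping.
        return torch.sigmoid(self.net(obs)) * cap
```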