thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

Clip with Multi-dimensional action? #70

Closed: mrbeann closed this issue 4 years ago

mrbeann commented 4 years ago

Any ideas about how to clip actions when the action space is multi-dimensional?

Trinkle23897 commented 4 years ago

For VPG:

  1. add an argument action_range
  2. apply the clip function after this line of code https://github.com/thu-ml/tianshou/blob/5f2c5347df82f2a624a58f941f31eeee3dec9c1a/tianshou/policy/modelfree/pg.py#L78

For PPO: modify these two lines of code: https://github.com/thu-ml/tianshou/blob/5f2c5347df82f2a624a58f941f31eeee3dec9c1a/tianshou/policy/modelfree/ppo.py#L118-L119

I haven't tested action clipping with multi-dimensional bounds. If you play with the code and get it working, you can submit a pull request so that I can review it.
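
As a rough illustration of the per-dimension clip (a standalone sketch with made-up example bounds, not code from tianshou), np.clip accepts array bounds and clips each dimension independently:

import numpy as np

# hypothetical per-dimension bounds
low = np.array([0.0, 1.0, 10.0])
high = np.array([1.0, 10.0, 100.0])

act = np.array([0.5, 12.0, -3.0])  # sampled action, partly out of bounds
clipped = np.clip(act, low, high)  # elementwise clip against array bounds
# clipped -> array([ 0.5, 10. , 10. ])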

mrbeann commented 4 years ago

Yeah, I'll do it!

Trinkle23897 commented 4 years ago

Do not close it until it has been resolved :)

duburcqa commented 4 years ago

torch.clamp does not support multi-dimensional bounds. Instead, one should use torch.min(torch.max(act, self._range[0]), self._range[1]).
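
To illustrate the elementwise clamp (a standalone example with placeholder tensors, not the library's actual code):

import torch

act = torch.tensor([0.5, 12.0, -3.0])    # sampled action
low = torch.tensor([0.0, 1.0, 10.0])     # per-dimension lower bounds
high = torch.tensor([1.0, 10.0, 100.0])  # per-dimension upper bounds

# max against low, then min against high: clamps each dimension independently
clipped = torch.min(torch.max(act, low), high)
# tensor([ 0.5000, 10.0000, 10.0000])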

Trinkle23897 commented 4 years ago

I came up with an elegant solution that doesn't modify any lines of the current codebase:

If your env's action_low = [0, 1, 10] and action_high = [1, 10, 100], add an env wrapper like this:

import gym
import numpy as np


class ProjAct(gym.Wrapper):
    """Rescale actions from [-1, 1] to the env's true action bounds."""

    def __init__(self, env):
        super().__init__(env)
        self._low = np.array(env.action_space.low)
        self._high = np.array(env.action_space.high)

    def step(self, act):
        # assume each component of act lies in [-1, 1];
        # map [-1, 1] linearly onto [self._low, self._high]
        proj_act = act * (self._high - self._low) / 2.0 + (self._low + self._high) / 2.0
        return self.env.step(proj_act)

env = ProjAct(MyEnv(...))

With this wrapper, the network only needs to output actions in the [-1, 1] range.
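
As a quick sanity check of the mapping (reusing the example bounds above): -1 lands on the lower bound, 0 on the midpoint, and 1 on the upper bound:

import numpy as np

low = np.array([0.0, 1.0, 10.0])
high = np.array([1.0, 10.0, 100.0])
act = np.array([-1.0, 0.0, 1.0])  # one raw action value per dimension

proj = act * (high - low) / 2.0 + (low + high) / 2.0
# proj -> array([  0. ,   5.5, 100. ]): -1 -> low, 0 -> midpoint, 1 -> high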

uinversion commented 1 year ago

If anyone working on multi-agent policies is facing this issue: you also need to specify action_scaling and action_bound_method when creating the MultiAgentPolicyManager:

MultiAgentPolicyManager(agents, env, action_scaling=True, action_bound_method='clip')
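
Spelled out a bit more (the import path follows tianshou's docs; agents and env are placeholders):

from tianshou.policy import MultiAgentPolicyManager

# agents: list of per-agent policies; env: the wrapped multi-agent env
policy = MultiAgentPolicyManager(
    agents, env, action_scaling=True, action_bound_method='clip'
)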

Also, if you are using Gymnasium, you need to change gym imports to gymnasium in BasePolicy:

import gymnasium as gym
from gymnasium.spaces import Box, Discrete, MultiBinary, MultiDiscrete