For VPG: see the action_range argument.
For PPO: https://github.com/thu-ml/tianshou/blob/5f2c5347df82f2a624a58f941f31eeee3dec9c1a/tianshou/policy/modelfree/ppo.py#L118-L119 Modify these two lines of code.
I haven't tested action clipping with multidimensional limits. If you play with the code and make it work, you can submit a pull request so that I can review it.
Yeah, I'll do it!
Do not close it until it has been resolved :)
torch.clamp does not support multi-dimensional bounds. Instead, one should use torch.min(torch.max(act, self._range[0]), self._range[1]).
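
A minimal sketch of that element-wise clipping, assuming per-dimension bound tensors (the names low, high, and act below are illustrative and are not the actual lines from ppo.py):

import torch

# hypothetical per-dimension bounds, e.g. action_low = [0, 1, 10], action_high = [1, 10, 100]
low = torch.tensor([0.0, 1.0, 10.0])
high = torch.tensor([1.0, 10.0, 100.0])

act = torch.tensor([[-0.5, 5.0, 200.0],
                    [0.5, 0.0, 50.0]])

# torch.clamp with scalar min/max cannot express a different limit per dimension,
# so clip element-wise with torch.min / torch.max (bounds broadcast over the batch):
clipped = torch.min(torch.max(act, low), high)
# rows become [0., 5., 100.] and [0.5, 1., 50.]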
I came up with an elegant solution that doesn't modify any lines of the current codebase: if your env's action_low = [0, 1, 10] and action_high = [1, 10, 100], add an env wrapper like this:
import gym
import numpy as np

class ProjAct(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)
        self._low = np.array(env.action_space.low)
        self._high = np.array(env.action_space.high)

    def step(self, act):
        # assume act is in [-1, 1]
        # affinely map [-1, 1] to [self._low, self._high]
        proj_act = act * (self._high - self._low) / 2.0 + (self._low + self._high) / 2.0
        return self.env.step(proj_act)

env = ProjAct(MyEnv(...))
This way, the network only needs to output actions in the [-1, 1] range.
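
As a quick sanity check of that affine map with the example bounds (illustrative only):

import numpy as np

low = np.array([0.0, 1.0, 10.0])
high = np.array([1.0, 10.0, 100.0])

def proj(act):
    # the same mapping used in ProjAct.step
    return act * (high - low) / 2.0 + (low + high) / 2.0

print(proj(np.array([-1.0, -1.0, -1.0])))  # -> [ 0.  1. 10.]   (action_low)
print(proj(np.array([ 1.0,  1.0,  1.0])))  # -> [  1.  10. 100.] (action_high)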
If anyone working on multi-agent policies is facing this issue, it is because you also need to specify action_scaling and action_bound_method when creating the MultiAgentPolicyManager:
MultiAgentPolicyManager(agents, env, action_scaling=True, action_bound_method='clip')
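
For context, a minimal sketch of that call (agents and env are placeholders here: a list of per-agent policies and the multi-agent environment; the two keyword arguments are forwarded to BasePolicy, which performs the scaling/clipping):

from tianshou.policy import MultiAgentPolicyManager

# agents: list of per-agent policies; env: the (PettingZoo-wrapped) environment
policy = MultiAgentPolicyManager(
    agents, env,
    action_scaling=True,
    action_bound_method='clip',
)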
Also, if you are using Gymnasium, you need to change the gym imports to gymnasium in BasePolicy:
import gymnasium as gym
from gymnasium.spaces import Box, Discrete, MultiBinary, MultiDiscrete
It seems VPG and PPO don't allow the action range to be multidimensional? (VPG seems not to support an action range at all.) E.g., action_low = [0, 1, 10], action_high = [1, 10, 100].
Any ideas about how to fix it?