man2machine opened this issue 10 months ago
Hey @man2machine, thanks for raising this issue. This is really interesting. I actually did not know you could set a `start` arg in a Discrete space :)
As a simple solution, could you make the extra 1-shift inside your env's step() code?
Like so:
```python
def step(self, action_dict):
    # Shift every agent's action by +1 before using it
    # (requires "from collections import OrderedDict" at module level).
    action_dict = OrderedDict({k: a + 1 for k, a in action_dict.items()})
    ...  # continue with this shifted dict
```
I'm trying to PR a better solution in the meantime. I tried a gym.ActionWrapper around your env, but RLlib's env checker and also the multi-agent env do not allow this, b/c a gym.ActionWrapper is NOT an RLlib BaseEnv or an RLlib MultiAgentEnv, so more issues will surface.
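For context, the wrapper attempt would look roughly like the sketch below (the exact code was not shared; the class name and details are assumed). The result is a gym.Wrapper, not an RLlib MultiAgentEnv, which is why RLlib's checks reject it:

```python
import gymnasium as gym


class UnshiftActions(gym.ActionWrapper):
    """Hypothetical sketch: expose a 0-based space, shift actions back for the env."""

    def __init__(self, env):
        super().__init__(env)
        self._start = env.action_space.start
        # Agent sees a plain 0-based Discrete(n) ...
        self.action_space = gym.spaces.Discrete(env.action_space.n)

    def action(self, action):
        # ... and the wrapped env receives actions in [start, start + n).
        return action + self._start
```

Keeping the shift in one place like this is tidy, but RLlib still refuses the wrapped env for the reason above.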
@sven1977 Apologies for getting back to you so late. I ended up doing something similar to what you suggested, handling the `start` offset inside the environment's step code. Either way, this is something RLlib should support, since it is part of the gym spaces library and other people may run into the same issue.
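For readers hitting the same problem, a minimal sketch of that kind of workaround, assuming a single shared Discrete action space stored on the env and shifting by the space's `start` rather than a hard-coded 1 (the actual code was not shared):

```python
def step(self, action_dict):
    # Shift RLlib's 0-based actions into the space's [start, start + n) range.
    start = self.action_space.start
    action_dict = {agent_id: a + start for agent_id, a in action_dict.items()}
    ...  # continue as before with the shifted actions
```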
What happened + What you expected to happen
When a discrete action space is used with a non-zero start value, the actions generated by the RLlib algorithm do not respect this, and as a result the actions it produces fall outside the space. I was able to reproduce the error consistently with the script below. I found this while using RLlib in a multi-agent setting; the problem could exist for single-agent as well (I did not test that).
Versions / Dependencies
OS: Linux
Python: 3.11
Ray: 2.6.3
Reproduction script
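The original reproduction script is not preserved in this excerpt. The sketch below is an illustrative stand-in only, not the reporter's code: a minimal two-agent MultiAgentEnv whose shared action space is Discrete(3, start=1), trained with PPO so that out-of-range actions surface; all class names, agent IDs, and config details are assumptions.

```python
# Illustrative stand-in for the missing reproduction script (not the original).
# Valid actions are {1, 2, 3}; the assert in step() trips if RLlib sends 0.
import gymnasium as gym
import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class ShiftedDiscreteEnv(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        self._agent_ids = {"agent_0", "agent_1"}
        self.observation_space = gym.spaces.Discrete(5)
        self.action_space = gym.spaces.Discrete(3, start=1)

    def reset(self, *, seed=None, options=None):
        return {aid: 0 for aid in self._agent_ids}, {}

    def step(self, action_dict):
        for aid, action in action_dict.items():
            # Fails if the policy returns 0-based actions that ignore `start`.
            assert self.action_space.contains(action), (
                f"{aid}: action {action} is outside {self.action_space}"
            )
        obs = {aid: 0 for aid in self._agent_ids}
        rewards = {aid: 0.0 for aid in self._agent_ids}
        terminateds = {aid: True for aid in self._agent_ids}
        terminateds["__all__"] = True
        truncateds = {aid: False for aid in self._agent_ids}
        truncateds["__all__"] = False
        return obs, rewards, terminateds, truncateds, {}


if __name__ == "__main__":
    ray.init()
    algo = PPOConfig().environment(ShiftedDiscreteEnv).build()
    algo.train()  # expected to hit the assert if `start` is not respected
```

If the bug is present, the assertion fires on the first sampled batch, since the policy's actions start at 0 rather than at the space's `start` of 1.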
Issue Severity
High: It blocks me from completing my task.