Closed · lukasfro closed this 3 years ago
Hi, that is a good idea.
I am working on this so that you can either set the true state via env.env.state or set the observation via env.state(observation); internally, the latter calls env.env.set_state(observation).
I pushed the changes, maybe you can check if it works in your application. Thanks!
@lukasfro, if it works now, could you close it? Thanks!
Unfortunately it is not fully working yet. I just started on a fix and will do a PR later today.
Fixed in PR #5
Hi Sebastian,
thanks for this great framework! Today I found a small inconsistency/bug in the state setter/getter of the GymEnvironment class when using the (Vectorized)PendulumEnv.
When we interact with the (Vectorized)PendulumEnv, we do not directly observe the 'internal' state [theta, dtheta] but rather [cos theta, sin theta, dtheta] via _get_obs(). Consequently, the state attribute of the corresponding observations is also in terms of [cos, sin, dtheta].
Now, I'd assume that when we set the state property of the GymEnvironment (https://github.com/sebascuri/rllib/blob/master/rllib/environment/gym_environment.py#L129), we would also do so in terms of the observed state, i.e., value = [cos, sin, dtheta], since this is what the agent observes. However, since the (Vectorized)PendulumEnv has a state attribute, we simply overwrite the environment's internal state instead of converting from [cos, sin, dtheta] to [theta, dtheta].
I believe it would be more convenient for the user to only ever be exposed to the [cos, sin, dtheta] representation instead of having to worry about the conversion. This would presumably require slightly more involved logic in GymEnvironment as well as a new method on the (Vectorized)PendulumEnv, e.g., obs_to_state.
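The conversion itself is simple; a hypothetical obs_to_state helper (the name and signature are just a suggestion, not existing API) could look like:

```python
import numpy as np


def obs_to_state(obs):
    """Map a pendulum observation [cos(theta), sin(theta), dtheta]
    back to the internal state [theta, dtheta].

    Works for a single observation or a batch (leading dimensions),
    which should cover the vectorized env as well.
    """
    obs = np.asarray(obs)
    theta = np.arctan2(obs[..., 1], obs[..., 0])
    return np.stack([theta, obs[..., 2]], axis=-1)
```

Using arctan2 recovers theta in (-pi, pi], which matches the angle-wrapping the pendulum env already does.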
Happy to discuss and/or contribute.
Cheers