state.setter in GymEnvironment is misleading for the (Vectorized)PendulumEnv

lukasfro commented 3 years ago

Hi Sebastian,

thanks for this great framework! Today I found a small inconsistency/bug regarding the state.setter/getter for the GymEnvironment class when using the (Vectorized)PendulumEnv.

When we interact with the (Vectorized)PendulumEnv, we do not directly observe the 'internal' state [theta, dtheta] but rather [cos theta, sin theta, dtheta] via _get_obs(). Consequently, the state attribute in the corresponding observations is also in terms of [cos, sin, dtheta].
Now, I'd assume if we set the state property of the GymEnvironment (https://github.com/sebascuri/rllib/blob/master/rllib/environment/gym_environment.py#L129) that we would also be doing this in terms of the observed state, i.e., value = [cos, sin, theta], since this is what the agent is observing. However, since the (Vectorized)PendulumEnv has a state attribute, we just overwrite the Environments internal state, instead of converting from [cos, sin, dtheta] to [theta, dtheta].

I believe it would be more convenient to the user to always only be exposed to the [cos, sin, dtheta] state instead of having to worry about the conversions. This would presumably require a slightly more involved logic for the GymEnvironment as well as a new method for the (Vectorized)PendulumEnv, e.g., obs_to_state.

Happy to discuss and/or contribute.

Cheers

sebascuri commented 3 years ago

Hi, that is a good idea.

I am working so that you can either set the true state with env.env.state, or the observation with env.state(observation), Internally it calls env.env.set_state(observation).

sebascuri commented 3 years ago

I pushed the changes, maybe you can check if it works in your application. Thanks!

sebascuri commented 3 years ago

@lukasfro, if it works now, could you close it? Thanks!

lukasfro commented 3 years ago

Unfortunately it is not fully working yet. I just started on a fix and will do a PR later today.

lukasfro commented 3 years ago

Fixed in PR #5

sebascuri / rllib

state.setter in GymEnvironment is misleading for the (Vectorized)PendulumEnv #3