Sampling from the feasible space

stratisMarkou commented 5 years ago

Do the classic control environments support sampling uniformly at random from the whole space of allowed states? For example, in Pendulum, env.reset() resets the state to a random angle between -pi and +pi and a random velocity between -1 and +1, but I would like the velocity to range from -8 to +8 (with the angle still between -pi and +pi) so that the initial states cover everything the environment allows.

I've tried a naive manual setting of env.state:

import gym
import numpy as np

env = gym.make("Pendulum-v0")
state = env.reset()
print('state after reset:', env.state)
action = np.array([0.])
_, _, _, _ = env.step(action)
print('state after reset and step:', env.state)
# Naive attempt: assign the state directly on the object returned by gym.make
env.state = np.array([0.5, -1.])
print('state after assignment:', env.state)
_, _, _, _ = env.step(action)
print('state after assignment and step:', env.state)

but after manually resetting the state, it stops evolving:

state after reset: [ 2.36025963 -0.76572957]
state after reset and step: [ 2.34838165 -0.23755973]
state after assignment: [ 0.5 -1. ]
state after assignment and step: [ 0.5 -1. ]

Any ideas why this is happening or workarounds? Help would be much appreciated :)

AkshayS96 commented 5 years ago

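This is because gym.make("Pendulum-v0") returns a TimeLimit wrapper, not the actual PendulumEnv instance. Assigning to env.state creates a new attribute on the wrapper and leaves the wrapped environment's state untouched, so step() keeps using the old state while env.state keeps showing the value you assigned. Set the state on the wrapped environment instead, using env.env.state (or env.unwrapped.state), as in the code below: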

import gym
import numpy as np

env = gym.make("Pendulum-v0")
env.reset()
print('state after reset:', env.state)
action = np.array([0.])
_, _, _, _ = env.step(action)
print('state after reset and step:', env.state)
# Assign on the wrapped PendulumEnv, not on the TimeLimit wrapper
env.env.state = np.array([0.5, -1.])
print('state after assignment:', env.state)
_, _, _, _ = env.step(action)
print('state after assignment and step:', env.state)

Output is

state after reset: [0.70864103 0.82823708]
state after reset and step: [0.77445798 1.31633901]
state after assignment: [ 0.5 -1. ]
state after assignment and step: [ 0.46797846 -0.64043085]
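
Building on this, here is a minimal sketch of the original goal: resetting to a state drawn uniformly from the full allowed range. The helper names sample_feasible_state and reset_to_random_state are hypothetical (they are not part of gym), and the velocity bound of 8 is taken from the question rather than read from the environment. It assumes the environment stores its state as [theta, theta_dot] on env.unwrapped.state, as Pendulum does.

```python
import numpy as np


def sample_feasible_state(rng, max_speed=8.0):
    """Draw [theta, theta_dot] uniformly from [-pi, pi] x [-max_speed, max_speed].

    `max_speed` defaults to 8, the velocity bound quoted in the question
    (an assumption here, not read from the environment).
    """
    theta = rng.uniform(-np.pi, np.pi)
    theta_dot = rng.uniform(-max_speed, max_speed)
    return np.array([theta, theta_dot])


def reset_to_random_state(env, rng, max_speed=8.0):
    """Reset `env`, then overwrite the wrapped environment's state.

    reset() is called first so that the wrappers and the environment do
    their usual bookkeeping; the state is then set on env.unwrapped so the
    assignment reaches the underlying environment and not a wrapper.
    """
    env.reset()
    env.unwrapped.state = sample_feasible_state(rng, max_speed)
    return env.unwrapped.state
```

With env = gym.make("Pendulum-v0") and rng = np.random.RandomState(0), calling reset_to_random_state(env, rng) before each episode should start the pendulum anywhere in the allowed range. Note that the observation returned by env.reset() itself no longer matches the overwritten state, so take the first observation from the next env.step() call.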

stratisMarkou commented 5 years ago

Great, works as desired, thanks!

openai / gym

Sampling from the feasible space #1502