stanfordnmbl / osim-rl

Reinforcement learning environments with musculoskeletal models
http://osim-rl.stanford.edu/
MIT License

Previous trajectory is executed when restoring a saved environment #131

Closed by janEbert 6 years ago

janEbert commented 6 years ago

After an environment's state is restored, the next step also re-executes the previous trajectory, so it is not possible to resume from the saved state.

Setup:

from osim.env import ProstheticsEnv
env = ProstheticsEnv(visualize=True, integrator_accuracy=1e-1)  # coarse accuracy so we quickly see what happens
env.reset()

We then save the state at an arbitrary point in time (here at t = 0):

state_checkpoint = env.osim_model.get_state()  # store state
for i in range(50):
    env.step(env.action_space.high)  # execute step with static action

After restoring the state and executing one more step, the previous steps (50 in this case) are executed again as well:

env.osim_model.set_state(state_checkpoint)  # restore state
env.step(env.action_space.high)

I also tried shallow-copying the state (copy.deepcopy complains about unpicklable SWIG objects), but this did not change anything. Setting the state's Y value (the state's internal representation, as far as I understand) using env.osim_model.state.setY(y_checkpoint), with or without setting the state first, also did not change the outcome. This might be related to the Python interface of SimTK::State being slightly buggy, but it could be unrelated as well.

I am on Windows 10 using the latest OpenSim version (Python 3.6.1) and followed the recommended installation instructions.

Related links: #79, #125, 7ecae69c3cc8021455e3f9a3e207a2689e743929.

kidzik commented 6 years ago

It turns out that the step count was not updated in set_state. In the example above, the new run from the restored state started at step 0 and integrated all the way to step 51 (instead of step 1).
The step count should probably be removed completely (we can read the time from the simulation), but the current fix (updating the step count in set_state) should be OK for now.
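
The mechanism can be illustrated with a small toy sketch. It does not use osim-rl or OpenSim: ToyModel, its attributes, the sync_step_count flag, and intervals_after_restore are hypothetical names. It assumes that each step integrates up to the absolute time istep * stepsize, which matches the description above but is not taken from the library's code.

```python
class ToyModel:
    """Hypothetical stand-in for a simulator wrapper (not osim-rl's actual code)."""

    def __init__(self, stepsize=0.01):
        self.stepsize = stepsize
        self.istep = 0    # step counter kept by the wrapper
        self.time = 0.0   # time stored in the simulation state

    def get_state(self):
        # Only the simulation time is needed for this illustration.
        return {"time": self.time}

    def set_state(self, state, sync_step_count):
        self.time = state["time"]
        if sync_step_count:
            # Fix: derive the step counter from the restored simulation time.
            self.istep = round(self.time / self.stepsize)

    def step(self):
        """Integrate to the absolute time istep * stepsize.

        Returns the number of step-sized intervals that were covered.
        """
        self.istep += 1
        target = self.istep * self.stepsize
        intervals = round((target - self.time) / self.stepsize)
        self.time = target
        return intervals

    def step_from_time(self):
        """Alternative without a counter: advance one step from the state's own time."""
        self.time += self.stepsize
        return 1


def intervals_after_restore(sync_step_count, n_steps=50):
    """Checkpoint at t = 0, run n_steps, restore, then take one more step."""
    model = ToyModel()
    checkpoint = model.get_state()
    for _ in range(n_steps):
        model.step()
    model.set_state(checkpoint, sync_step_count)
    return model.step()
```

In this sketch, intervals_after_restore(False) returns 51 (the stale counter makes the step cover the whole previous trajectory), while intervals_after_restore(True) returns 1. The step_from_time variant shows the other option mentioned above: with no counter at all, there is nothing that can fall out of sync with the restored state.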