stanfordnmbl / osim-rl

Reinforcement learning environments with musculoskeletal models
http://osim-rl.stanford.edu/
MIT License

Observations vary when fetched with two env methods #202

Open yinhaoz opened 4 years ago

yinhaoz commented 4 years ago

Hi there,

I obtained the observation in two ways and got slightly different results. First, I called obs_list, reward, done, info = env.step(action, project=True, obs_as_dict=False). Immediately after that line, I fetched the observation again with obs_list2 = env.get_observation(). However, obs_list is not equal to obs_list2: the absolute error is around 1e-4. I am not sure what happens between these two lines that changes the observation values.
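To narrow down a mismatch like this, it can help to list exactly which entries of the two flattened observations disagree. The snippet below is a minimal sketch in plain Python. The helper name diff_observations and the sample values are hypothetical stand-ins, not part of osim-rl; in practice the two lists would be obs_list and obs_list2 from the calls above.

```python
def diff_observations(obs_a, obs_b, atol=1e-8):
    """Return (index, a, b, abs_diff) for every entry differing by more than atol.

    Hypothetical helper for illustration; not part of osim-rl.
    """
    if len(obs_a) != len(obs_b):
        raise ValueError("observations have different lengths: "
                         f"{len(obs_a)} vs {len(obs_b)}")
    return [
        (i, a, b, abs(a - b))
        for i, (a, b) in enumerate(zip(obs_a, obs_b))
        if abs(a - b) > atol
    ]


# Stand-in data. With osim-rl these would be obs_list (from env.step)
# and obs_list2 (from env.get_observation).
obs_list = [0.5, 0.25, 1.0, -0.75]
obs_list2 = [0.5, 0.2501, 1.0, -0.75]

mismatches = diff_observations(obs_list, obs_list2)
for index, a, b, err in mismatches:
    print(f"index {index}: step={a} getter={b} abs_err={err:.2e}")
```

Knowing the indices of the differing entries makes it possible to map them back to named fields of the observation.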

smsong commented 4 years ago

@bmmi Sorry for the delayed response. The errors are in v_tgt_field: the observation reports v_tgt_field based on the previous position, so that field lags behind the current state. You should therefore not see errors in any other field. Please let us know if this description does not match what you observe. We are aware of the lag in v_tgt_field. We thought a good solution should not be sensitive to how v_tgt_field is updated, but we will consider fixing it (or will at least make .get_observation() return the correct values) in Round 2.
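One way to check that the mismatch really is confined to v_tgt_field is to split the differing indices into those inside and outside the slice that holds the field. The sketch below assumes that v_tgt_field occupies the leading entries of the flattened observation and that its length is known. Both are assumptions to verify against the installed osim-rl version. The helper name split_mismatches, FIELD_LEN, and the sample values are hypothetical.

```python
def split_mismatches(obs_a, obs_b, field_len, atol=1e-8):
    """Split differing indices into (inside_field, outside_field).

    Assumes the field of interest fills indices [0, field_len) of the
    flattened observation. Hypothetical helper; not part of osim-rl.
    """
    inside, outside = [], []
    for i, (a, b) in enumerate(zip(obs_a, obs_b)):
        if abs(a - b) > atol:
            (inside if i < field_len else outside).append(i)
    return inside, outside


# Toy data: the first FIELD_LEN entries play the role of v_tgt_field.
# FIELD_LEN is a made-up value for this example, not the real field size.
FIELD_LEN = 3
obs_from_step = [0.10, 0.20, 0.30, 1.5, -0.4]
obs_from_getter = [0.10, 0.2001, 0.3002, 1.5, -0.4]

inside, outside = split_mismatches(obs_from_step, obs_from_getter, FIELD_LEN)
print("differences inside the field slice:", inside)
print("differences outside the field slice:", outside)
```

If the second list is non-empty on real data, the discrepancy is not explained by the v_tgt_field lag alone.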

smsong commented 4 years ago

@kiwi-byte This is not the same issue as #199, which was about differences between Docker and server submissions. #199 is fixed.

smsong commented 4 years ago

@bmmi The v_tgt_field lag is fixed in a new commit: 1e0053e2f0da5dea3d2c2721053dcfb0d07cba75. It will be applied for Round 2 (it has not been applied to the server yet).