stanfordnmbl / osim-rl

Reinforcement learning environments with musculoskeletal models
http://osim-rl.stanford.edu/
MIT License
894 stars 249 forks source link

get_observation returns absolute and relative positions #155

Closed AdamStelmaszczyk closed 6 years ago

AdamStelmaszczyk commented 6 years ago

I think the intention was to return positions relative to pelvis, this is a good intention. It was correctly implemented for center of mass (cm_pos): https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/osim.py#L472

However, not for the body parts, https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/osim.py#L457:

cur_upd = cur
cur_upd[:2] = [cur[i] - pelvis[i] for i in range(2)]
cur_upd[6:7] = [cur[i] - pelvis[i] for i in range(6,7)]
res += cur

The last line was probably meant to be res += cur_upd. Otherwise, cur_upd is created and not used at all.

kidzik commented 6 years ago

Thanks for catching this! We will remove the vectorized observation in the next release and leave only the dictionary observation to avoid confusions. (However, we will provide some example functions, so it's good to know these issues)

AdamStelmaszczyk commented 6 years ago

I would like to receive the vectorized observation as this is what I can feed directly to APIs.

Without such option, everybody has to have a code vectorizing dicts.

The fix is easy, it's just changing cur to cur_upd in that last line.

AdamStelmaszczyk commented 6 years ago

Now I realized that this is not the only problem in get_observation.

Another problem is having no z dimension, reported here https://github.com/stanfordnmbl/osim-rl/issues/129#issuecomment-401634297.

It will be nice to have a solid implementation of this vectorization, because everybody needs that.

kidzik commented 6 years ago

Simple vectorization of all the values is here https://github.com/stanfordnmbl/osim-rl/blob/38cddaa135771ef341188055d20a4e776a7002ba/tests/test.flatten.py

zhengfeiwang commented 6 years ago

@AdamStelmaszczyk In fact, the cur_upd is the reference of cur, which makes them kept same value. You can try with a = [1, 2, 3], b = a, b[0] = a[0] - a[2] and then print a and b, you will find they are same, instead.

AdamStelmaszczyk commented 6 years ago

@wangzhengfei0730 Oh, you're right, thanks.

Somehow I interpreted cur_upd = cur as: cur_upd is now a deep copy of cur, but yeah it's a list reference copy.

So, the code looks correct, no issue.