Agent movement in Z is not considered in reward function

stanfordnmbl / osim-rl

Reinforcement learning environments with musculoskeletal models

http://osim-rl.stanford.edu/

MIT License


Agent movement in Z is not considered in reward function #137

Closed: mattiasljungstrom closed this issue 6 years ago

mattiasljungstrom commented 6 years ago

Because the reward function doesn't consider movement in Z, agents can stumble sideways as they walk forward. Would it be more appropriate to target a velocity of (x, z) = (3, 0)? Or to use some other penalty for drifting along the Z axis?

I realize this is a comment, not a bug, but I wanted to highlight the issue.
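To make the suggestion concrete, here is a minimal sketch comparing a forward-only reward with one that also targets zero sideways velocity. It assumes a per-step reward of the form 9 - (v_x - 3)^2 computed from the pelvis velocity; the function names, the constant 9, and the (vx, vy, vz) tuple layout are illustrative assumptions, not the environment's actual code. The evaluation page linked in the reply below has the real definition.

```python
# Hypothetical sketch: these functions are NOT part of osim-rl.
# Assumption: pelvis_vel is a (vx, vy, vz) tuple in m/s, with x forward,
# y up, and z sideways.

def forward_only_reward(pelvis_vel, target_vx=3.0):
    """Reward that only looks at forward (x) velocity; z is ignored."""
    vx, _, _ = pelvis_vel
    return 9.0 - (vx - target_vx) ** 2


def planar_reward(pelvis_vel, target=(3.0, 0.0)):
    """Reward that also penalizes deviation from the target z velocity."""
    vx, _, vz = pelvis_vel
    return 9.0 - (vx - target[0]) ** 2 - (vz - target[1]) ** 2


# An agent moving forward at 3 m/s while drifting sideways at 1 m/s:
drifting = (3.0, 0.0, 1.0)
print(forward_only_reward(drifting))  # sideways drift costs nothing
print(planar_reward(drifting))        # sideways drift is penalized
```

With the forward-only form, the drifting agent scores the same as one walking straight, which is the behavior described above.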

kidzik commented 6 years ago

That's what we intended for the first round, as defined here: http://osim-rl.stanford.edu/docs/nips2018/evaluation/. It does cause some confusion, though, so we will reconsider it (and maybe include a change in the next release, since it's still early enough for that). Thanks for bringing it up.

kidzik commented 6 years ago

We will keep it as is for the first round, but it won't be an issue in the second round. Nevertheless, thanks for bringing it up!