mattiasljungstrom closed 6 years ago
That's what we intended for the first round, as defined here: http://osim-rl.stanford.edu/docs/nips2018/evaluation/. It does cause some confusion, though, so we will reconsider it (and maybe include a change in the next release, since it's still early enough for that). Thanks for bringing it up.
We will keep it as is for the first round, but it won't be an issue in the second round. Nevertheless, thanks for bringing it up!
Because the reward function doesn't consider movement in Z, agents can stumble sideways as they walk forward. Would it perhaps be more appropriate to target a velocity of (x, z) == (3, 0)? Or to use some other kind of penalty for wandering off along the Z axis?
I realize this is a comment, not a bug, but I wanted to highlight the issue.
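To make the suggestion concrete, here is a minimal sketch of the difference. It assumes the per-step penalty is a squared velocity error; the function names and the squared-error form are hypothetical illustrations, not the actual osim-rl reward code.

```python
# Hypothetical sketch: compare a forward-only velocity penalty with one
# that also penalizes sideways (Z) velocity. Not the real osim-rl reward.

TARGET_VX = 3.0  # target forward velocity, from the (3, 0) suggestion
TARGET_VZ = 0.0  # target sideways velocity


def forward_only_penalty(vx, vz, target_vx=TARGET_VX):
    """Penalty that ignores vz entirely (assumed shape of the current behavior)."""
    return (vx - target_vx) ** 2


def planar_penalty(vx, vz, target_vx=TARGET_VX, target_vz=TARGET_VZ):
    """Penalty on the squared distance between (vx, vz) and the target velocity."""
    return (vx - target_vx) ** 2 + (vz - target_vz) ** 2


if __name__ == "__main__":
    # An agent moving forward at the target speed while drifting sideways.
    vx, vz = 3.0, 1.5
    print("forward-only penalty:", forward_only_penalty(vx, vz))
    print("planar penalty:", planar_penalty(vx, vz))
```

With the forward-only version, the sideways drift costs nothing; with the planar version, the same drift is penalized, which would discourage stumbling off along Z.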