stanfordnmbl / osim-rl

Reinforcement learning environments with musculoskeletal models
http://osim-rl.stanford.edu/
MIT License

Question about target velocity #177

Closed keavil closed 6 years ago

keavil commented 6 years ago

After reading the code that generates the target velocity here https://github.com/stanfordnmbl/osim-rl/blob/3ceadccc2f9104c9012281a482cfff5203f703bd/osim/env/osim.py#L499 , I have a question:

In the description in #164, you mentioned '(changing the heading of the velocity vector at the rate 20 degrees per second)'. But in the code, the value of poisson_lambda is 300, which means the velocity changes about every 300 steps (3 seconds). Moreover, the code generates 10 different times at which to change the velocity and heading, but 7 of them fall after step 1000. Is this a bug, and should poisson_lambda be 100? Or is it the desired behavior?
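To make the arithmetic concrete, here is a minimal sketch of how such change times could be drawn. It is not the osim-rl code: the helper name `change_steps`, the fixed seed, and the 1000-step horizon are assumptions for illustration; only `poisson_lambda = 300` and the count of 10 come from the description above.

```python
import numpy as np

def change_steps(poisson_lambda=300, n_changes=10, seed=0):
    """Hypothetical helper: cumulative step indices at which the target changes."""
    rng = np.random.default_rng(seed)
    # Each gap between consecutive changes is a Poisson draw with mean poisson_lambda.
    intervals = rng.poisson(poisson_lambda, n_changes)
    # Cumulative sums turn the gaps into absolute step indices.
    return np.cumsum(intervals)

steps = change_steps()
# Count how many change times land inside an assumed 1000-step horizon.
within = int((steps <= 1000).sum())
print(steps, within)
```

With a mean gap of 300 steps, the k-th change lands near step 300k, so only about the first three fall within 1000 steps, which matches the observation that roughly 7 of the 10 come later. With a mean gap of 100, nearly all 10 would fall inside the same horizon.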

Another related question: the bonus for not falling is currently 10. This value is so high that even just standing still can earn a reward of more than 8000. Is this desired?

Thanks for your reply!

kidzik commented 6 years ago

Yes, it's 3 seconds. We realized that a change every second might be too quick.

Regarding the high reward for not falling, the basic interpretation of the task is: "Don't fall. If you manage to do that, follow the velocity vector" :) All the solutions that don't fall will have exactly the same number of points for not falling, so the winning solution must also optimize for the velocity objective.
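The argument can be illustrated with a small sketch. The per-step bonus of 10 comes from the question above; the 1000-step episode length, the form of the penalty, and the penalty values are hypothetical and chosen only to show the ranking, not taken from the osim-rl reward function.

```python
ALIVE_BONUS = 10      # per-step bonus for not falling, as quoted in the question
EPISODE_STEPS = 1000  # assumed episode length

def episode_return(velocity_penalties):
    """Hypothetical return: per-step bonus minus a per-step velocity penalty."""
    return sum(ALIVE_BONUS - p for p in velocity_penalties)

# Two agents that both survive the whole episode (penalty values are made up).
standing = episode_return([1.5] * EPISODE_STEPS)  # ignores the target velocity
tracking = episode_return([0.1] * EPISODE_STEPS)  # follows the target closely

# The survival bonus is identical for both, so it cancels in the comparison.
bonus_total = ALIVE_BONUS * EPISODE_STEPS
print(standing, tracking, tracking - standing)
```

Because both agents collect the same total survival bonus, the difference between their returns depends only on the velocity penalties, so the ranking among non-falling solutions is decided by how well they follow the target.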