keavil closed this issue 6 years ago
Yes, it's 3 seconds; we realized that a change every second might be too quick.
Regarding the high reward for not falling, the basic interpretation of the task is: "Don't fall. If you manage to do that, follow the velocity vector" :) All the solutions that don't fall will have exactly the same number of points for not falling, so the winning solution must also optimize for the velocity objective.
After reading the code that generates the target velocity here https://github.com/stanfordnmbl/osim-rl/blob/3ceadccc2f9104c9012281a482cfff5203f703bd/osim/env/osim.py#L499 , I have a question:
In the description in #164, you mentioned "(changing the heading of the velocity vector at the rate 20 degrees per second)". But in the code, the value of `poisson_lambda` is 300, which means the velocity changes about every 300 steps (3 seconds). Moreover, the code generates 10 different times at which to change the velocity and heading, but 7 of them fall after step 1000. Is this a bug, and should `poisson_lambda` be 100? Or is it the desired behavior?

Another related question: currently the bonus for not falling is 10. This value is so high that even just standing there could earn a reward of more than 8000. Is this desired?
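To make the "7 of them fall after step 1000" count concrete, here is a rough sketch of the arithmetic. It assumes the change times are built by accumulating 10 intervals, each with mean `poisson_lambda`, so the k-th change is expected around step `k * poisson_lambda`. The helper names and the 1000-step horizon are my own choices for illustration, not taken from osim-rl.

```python
def expected_change_steps(poisson_lambda, n_changes=10):
    """Expected step of each change, assuming the k-th change time is the
    sum of k intervals with mean poisson_lambda (so its mean is k * lambda)."""
    return [k * poisson_lambda for k in range(1, n_changes + 1)]


def count_within(change_steps, horizon=1000):
    """Number of change times at or before the horizon (assumed 1000 steps)."""
    return sum(1 for step in change_steps if step <= horizon)


for lam in (300, 100):
    steps = expected_change_steps(lam)
    inside = count_within(steps)
    print(f"lambda={lam}: {inside} changes within 1000 steps, "
          f"{len(steps) - inside} after")
```

With 300, only the changes expected at steps 300, 600 and 900 land inside the horizon; with 100, all ten would on average. Actual sampled times will scatter around these means.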
Thanks for your reply!