xbpeng / DeepMimic

Motion imitation with deep reinforcement learning.
https://xbpeng.github.io/projects/DeepMimic/index.html
MIT License
2.32k stars, 489 forks

Will different states within the same frame get the same action? #106

Open Steven89Liu opened 4 years ago

Steven89Liu commented 4 years ago

As I understand it, _decide_action returns the action, which is actually the desired pos. If I call this function twice within less than 1/30 s, will I get the same action? Thanks.

xbpeng commented 4 years ago

It depends on whether you are running the deterministic or the stochastic policy. The stochastic policy will return a different action every time, but the deterministic policy will always return the same action for the same input. You can switch between the two with the _enable_training flag in RLAgent: _enable_training = true uses the stochastic policy, and _enable_training = false uses the deterministic policy.
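
The distinction can be illustrated with a minimal sketch. This is not code from the DeepMimic repository: `ToyAgent`, its linear policy mean, `noise_std`, and `seed` are hypothetical and exist only for illustration. Only the names `_decide_action` and `_enable_training` come from the discussion above. The sketch assumes a Gaussian policy whose mean is a function of the state.

```python
import random


class ToyAgent:
    """Hypothetical stand-in for an agent; NOT the DeepMimic RLAgent class."""

    def __init__(self, weights, noise_std=0.1, seed=0):
        self._weights = weights          # one row of weights per action dimension
        self._noise_std = noise_std      # assumed exploration noise scale
        self._rng = random.Random(seed)
        self._enable_training = True

    def _decide_action(self, state):
        # Assumed linear policy mean: one dot product per action dimension.
        mean = [sum(w * s for w, s in zip(row, state)) for row in self._weights]
        if self._enable_training:
            # Stochastic policy: sample around the mean.
            return [m + self._rng.gauss(0.0, self._noise_std) for m in mean]
        # Deterministic policy: return the mean itself.
        return mean


agent = ToyAgent(weights=[[1.0, 0.5], [-0.5, 2.0]])
state = [0.1, 0.2]

agent._enable_training = False
same_when_deterministic = agent._decide_action(state) == agent._decide_action(state)

agent._enable_training = True
differs_when_stochastic = agent._decide_action(state) != agent._decide_action(state)
```

In this sketch the deterministic branch is a pure function of the state, so a different state generally produces a different action, regardless of how little time has passed between the two calls.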

Steven89Liu commented 4 years ago

Thanks.

> but the deterministic policy will always return the same action for the same input.

What I mean is a different input, but after an interval shorter than 1.0/30 s (the action is updated every 1.0/30 s). Is it still expected to return the same action?

With the deterministic policy, the desired pos should be determined by the current pos, so why does the state contain the link velocity? Thanks.

Steven89Liu commented 4 years ago

Sorry, I made a mistake. The state contains the link info, and any time we call _decide_action, it returns the expected pos of the joints about 1/30 second later, based on the current state.
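
The timing discussed in this thread can be sketched as a control loop in which the policy is queried once per 1/30 s and the resulting target is reused until the next query. This is an illustrative sketch under stated assumptions, not DeepMimic's implementation: `toy_policy`, `toy_sim_step`, `SUBSTEPS_PER_QUERY`, and `gain` are hypothetical, and the one-dimensional (pos, vel) state is a simplification.

```python
QUERY_DT = 1.0 / 30.0      # policy update interval mentioned in the thread
SUBSTEPS_PER_QUERY = 20    # assumed number of simulation substeps per query


def toy_policy(state):
    """Hypothetical deterministic policy: target pos from current pos and vel."""
    pos, vel = state
    return pos + vel * QUERY_DT


def toy_sim_step(state, target, dt, gain=50.0):
    """Hypothetical simulation step that pulls pos toward the target."""
    pos, vel = state
    vel = vel + gain * (target - pos) * dt
    pos = pos + vel * dt
    return (pos, vel)


def run(state, num_queries):
    """Query the policy once per QUERY_DT and hold the target in between."""
    targets = []
    sim_dt = QUERY_DT / SUBSTEPS_PER_QUERY
    for _ in range(num_queries):
        target = toy_policy(state)            # one query per 1/30 s
        for _ in range(SUBSTEPS_PER_QUERY):
            targets.append(target)            # same target on every substep
            state = toy_sim_step(state, target, sim_dt)
    return targets, state


targets, final_state = run(state=(0.0, 1.0), num_queries=3)
```

In this sketch the target stays constant between queries only because the policy is not called again, not because the policy ignores changes in the state. It also shows one reason a state might include velocity: two states with the same pos but different vel yield different targets from `toy_policy`.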