Open Steven89Liu opened 4 years ago
It depends on whether you are running the deterministic or the stochastic policy. The stochastic policy samples its action, so it will generally return a different action every time, even for the same input. The deterministic policy will always return the same action for the same input. You can switch between the two with the _enable_training flag in RLAgent: _enable_training = true uses the stochastic policy, and _enable_training = false uses the deterministic policy.
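To make the distinction concrete, here is a minimal sketch using a toy Gaussian policy. It is not the project's RLAgent: the class `ToyGaussianPolicy`, its linear "network", the `noise_std` parameter, and the `enable_training` / `decide_action` names are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np


class ToyGaussianPolicy:
    """Hypothetical stand-in for a policy; NOT the project's RLAgent."""

    def __init__(self, state_dim, action_dim, noise_std=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed linear map from state to mean action (stands in for a network).
        self.weights = rng.standard_normal((action_dim, state_dim))
        self.noise_std = noise_std
        self.enable_training = True
        self._noise_rng = np.random.default_rng(seed + 1)

    def decide_action(self, state):
        mean = self.weights @ state
        if self.enable_training:
            # Stochastic policy: sample around the mean action.
            noise = self._noise_rng.standard_normal(mean.shape)
            return mean + self.noise_std * noise
        # Deterministic policy: return the mean action itself.
        return mean


policy = ToyGaussianPolicy(state_dim=4, action_dim=2)
state = np.array([0.1, -0.2, 0.3, 0.0])

policy.enable_training = False
det_a = policy.decide_action(state)
det_b = policy.decide_action(state)          # same state, same action
det_c = policy.decide_action(state + 0.01)   # changed state, different action

policy.enable_training = True
sto_a = policy.decide_action(state)
sto_b = policy.decide_action(state)          # same state, different sample
```

In this sketch the deterministic action is a pure function of the state passed in, so it repeats only if the state is unchanged between calls.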
Thanks.
> but the deterministic policy will always return the same action for the same input.
What I mean is: for a different input, taken at an interval shorter than 1.0/30 s (the action is updated every 1.0/30 s), is it still expected to return the same action?
With the deterministic policy, the desired pos should be determined by the current pos, so why does the state contain the link velocity info? Thanks.
Sorry, I made a mistake: the state contains the link info. Any time we call _decide_action, it returns the expected pos of the joints about 1./30 second later, based on the current state.
As I understand it, _decide_action returns the action, which is actually the desired pos. If I call this function twice within less than 1./30 second, will I get the same action? Thanks.