Both statements are true. HFO allows power parameter inputs in the range [-100, 100], while the experiments for the paper on on-policy vs. off-policy deep reinforcement learning updates limited the power to [0, 100], as evidenced by the code in this repository. Recommend closing.
Correction: in the referenced paper, based on this repository, it appears that when the agent selected a random action (chosen with probability epsilon) and that random action was Dash, it sampled a dash power from -100 to 100. However, when the critic updated the actor, the inverting gradients algorithm restricted the dash power to the range 0 to 100. See below for the applicable lines of code, each followed by a small illustrative sketch.
Random Action: https://github.com/mhauskn/dqn-hfo/blob/c7b0a73de07078e248015d44573d8dcadd6fb8d1/src/dqn.cpp#L669-L690
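For reference, a minimal sketch of what that epsilon-greedy branch does for the dash-power parameter: sample uniformly over the full HFO range [-100, 100]. This is a hypothetical stand-in for the linked lines, not the repository's exact code; the names here are illustrative.

```cpp
#include <iostream>
#include <random>

int main() {
  std::mt19937 rng(std::random_device{}());
  // Random Dash actions draw their power from the full HFO range.
  std::uniform_real_distribution<float> dash_power(-100.0f, 100.0f);
  std::cout << "random dash power: " << dash_power(rng) << "\n";
  return 0;
}
```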
Inverting Gradients: https://github.com/mhauskn/dqn-hfo/blob/c7b0a73de07078e248015d44573d8dcadd6fb8d1/src/dqn.cpp#L945-L954
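And a minimal sketch of the inverting gradients rule applied to the dash-power parameter with the paper's bounds [0, 100], assuming the scaling rule from Hausknecht and Stone's parameterized-action DDPG work: upward gradients are scaled by the remaining headroom below the upper bound, downward gradients by the room above the lower bound, so a parameter pushed past a bound has its gradient inverted. The function name `InvertGradient` is hypothetical, not the repository's API.

```cpp
#include <iostream>

// Hypothetical helper illustrating the inverting gradients rule for a
// single bounded parameter. Not the exact code from dqn.cpp.
double InvertGradient(double grad, double value, double min, double max) {
  if (grad > 0) {
    // Gradient pushes the parameter up: scale by headroom to the max.
    return grad * (max - value) / (max - min);
  } else {
    // Gradient pushes the parameter down: scale by room above the min.
    return grad * (value - min) / (max - min);
  }
}

int main() {
  const double kPowerMin = 0.0, kPowerMax = 100.0;  // paper's dash-power bounds
  // Near the upper bound, an upward gradient shrinks toward zero.
  std::cout << InvertGradient(1.0, 99.0, kPowerMin, kPowerMax) << "\n";   // 0.01
  // Above the bound, the upward gradient is inverted (becomes negative).
  std::cout << InvertGradient(1.0, 110.0, kPowerMin, kPowerMax) << "\n";  // -0.1
  return 0;
}
```

This is why the actor's dash power ends up confined to [0, 100] during learning even though HFO itself accepts [-100, 100].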
In the HFO manual, the power's range is given as [-100, 100].
But in your other paper, "On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning", the range appears to be different.
So, what is the power's range?
Many thanks for your answer.