mhauskn / dqn-hfo

What is the "power" range of the low-level Dash action? #38

Closed liyuyuc closed 5 years ago

liyuyuc commented 5 years ago

The HFO manual says: [Screenshot from 2019-04-18 09-49-50] So the power's range is [-100, 100].

But in your other paper, "On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning": [Screenshot from 2019-04-18 09-52-16]

So what is the power's range?

Thank you very much for your answer.

wbwatkinson commented 5 years ago

Both statements are true. HFO accepts power parameter inputs in the range [-100, 100], and the experiments for the deep reinforcement learning policy updates paper limited the power to [0, 100], as evinced in the code of this repository. I recommend closing this issue.

Correction: based on this repository, it appears that in the referenced paper, when the agent selected a random action (with probability epsilon) and that random action was Dash, it would select a dash power in the range -100 to 100. However, when the critic was updating the actor, the inverting gradients algorithm restricted the dash power to the range 0 to 100. See below for the applicable lines of code.

Random Action: https://github.com/mhauskn/dqn-hfo/blob/c7b0a73de07078e248015d44573d8dcadd6fb8d1/src/dqn.cpp#L669-L690

Inverting Gradients: https://github.com/mhauskn/dqn-hfo/blob/c7b0a73de07078e248015d44573d8dcadd6fb8d1/src/dqn.cpp#L945-L954
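To make the distinction concrete, here is a minimal C++ sketch of the two behaviours described above. It is not the code from `dqn.cpp`: the constant and function names are hypothetical, the bounds are simply the ranges quoted in this thread, and the sign convention (a positive gradient means "increase the parameter") is an assumption that may differ from the one used in the repository.

```cpp
#include <random>

// Hypothetical constants: the ranges discussed in this thread.
constexpr float kRandomDashPowerMin = -100.0f;  // range used for random exploration
constexpr float kRandomDashPowerMax = 100.0f;
constexpr float kActorDashPowerMin = 0.0f;      // range enforced on the actor's output
constexpr float kActorDashPowerMax = 100.0f;

// Exploration: draw a dash power uniformly from [-100, 100).
float SampleRandomDashPower(std::mt19937& rng) {
  std::uniform_real_distribution<float> dist(kRandomDashPowerMin,
                                             kRandomDashPowerMax);
  return dist(rng);
}

// Inverting gradients: scale the gradient by the remaining headroom in the
// direction it points. Assumes grad > 0 means "increase param".
// If param lies outside [p_min, p_max], the scale factor becomes negative,
// so the gradient is inverted and pushes param back into range.
float InvertGradient(float grad, float param, float p_min, float p_max) {
  const float range = p_max - p_min;
  if (grad > 0.0f) {
    return grad * (p_max - param) / range;  // headroom above param
  }
  if (grad < 0.0f) {
    return grad * (param - p_min) / range;  // headroom below param
  }
  return grad;
}
```

Under these assumptions, calling `InvertGradient(grad, power, kActorDashPowerMin, kActorDashPowerMax)` shrinks the update as the actor's dash power approaches 0 or 100 and reverses it once the power leaves that interval, while `SampleRandomDashPower` can still produce negative powers during exploration.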