xiaowei-hu / pysc2-agents

This is a simple implementation of DeepMind's PySC2 RL agents.
https://zhuanlan.zhihu.com/p/29246185?group_id=890682069733232640
271 stars, 77 forks

Select action from policy network #2

Closed a3626a closed 7 years ago

a3626a commented 7 years ago

In a3c_agent.step, the action is chosen greedily with act_id = valid_actions[np.argmax(non_spatial_action[valid_actions])]

However, I think actions should be sampled randomly according to their probabilities, because non_spatial_action and spatial_action are policy outputs, i.e. action probabilities. (See this post: https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2)

By the way, it is still not clear to me when invalid actions should be masked: before the softmax or after it?
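To make the two options in this question concrete, here is a minimal NumPy sketch. It is not code from this repository: the helper names greedy_valid_action, sample_valid_action and masked_softmax are hypothetical, and it assumes the policy output is a 1-D probability vector and valid_actions is a non-empty array of integer action ids. It also illustrates that masking after the softmax and renormalizing gives the same distribution as setting invalid logits to -inf before the softmax.

```python
import numpy as np


def greedy_valid_action(probs, valid_actions):
    """Pick the valid action with the highest probability (argmax)."""
    probs = np.asarray(probs, dtype=np.float64)
    valid_actions = np.asarray(valid_actions)
    return int(valid_actions[np.argmax(probs[valid_actions])])


def sample_valid_action(probs, valid_actions, rng):
    """Sample a valid action in proportion to its probability.

    Masking happens after the softmax: invalid entries are dropped and
    the remaining probabilities are renormalized to sum to 1.
    """
    probs = np.asarray(probs, dtype=np.float64)
    valid_actions = np.asarray(valid_actions)
    masked = probs[valid_actions]
    total = masked.sum()
    if total <= 0.0:
        # Assumption: fall back to a uniform choice if all valid mass is zero.
        masked = np.full(len(valid_actions), 1.0 / len(valid_actions))
    else:
        masked = masked / total
    return int(rng.choice(valid_actions, p=masked))


def masked_softmax(logits, valid_actions):
    """Mask before the softmax: invalid logits are set to -inf."""
    logits = np.asarray(logits, dtype=np.float64)
    masked = np.full_like(logits, -np.inf)
    masked[valid_actions] = logits[valid_actions]
    masked = masked - masked.max()  # numerical stability
    exp = np.exp(masked)            # exp(-inf) == 0 for invalid actions
    return exp / exp.sum()


# Hypothetical example: action 1 is the most likely but is not valid.
probs = np.array([0.1, 0.6, 0.2, 0.1])
valid_actions = np.array([0, 2, 3])
rng = np.random.default_rng(0)

greedy_id = greedy_valid_action(probs, valid_actions)
sampled_ids = [sample_valid_action(probs, valid_actions, rng) for _ in range(200)]

# Both masking orders yield the same distribution over valid actions.
after = probs[valid_actions] / probs[valid_actions].sum()
before = masked_softmax(np.log(probs), valid_actions)[valid_actions]
```

Sampling keeps exploring lower-probability actions during training, while the greedy rule always repeats the current best guess, so which one works better is an empirical question for a given setup.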

xiaowei-hu commented 7 years ago

Hi @a3626a, 1) Both action selection methods are OK; see https://arxiv.org/pdf/1709.02878.pdf. I haven't tried random sampling, and it may not work in this project. 2) Use non_spatial_action[valid_actions] to mask invalid actions. I think this is a Python question, and you can test it in IPython.
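As a quick illustration of point 2, the indexing can be checked in an IPython session along these lines. The array values below are made up for the example; only the expression itself comes from a3c_agent.step.

```python
import numpy as np

# Hypothetical policy output over 4 actions; action 1 has the highest score.
non_spatial_action = np.array([0.05, 0.40, 0.30, 0.25])
# Hypothetical set of currently valid action ids (action 1 is not available).
valid_actions = np.array([0, 2, 3])

# Integer-array indexing keeps only the entries for valid actions.
restricted = non_spatial_action[valid_actions]

# argmax returns a position inside `restricted`, not an action id...
best_position = int(np.argmax(restricted))

# ...so it is mapped back to an action id through valid_actions.
act_id = int(valid_actions[best_position])
```

Here the invalid action 1 is never considered, and act_id ends up as the best action among the valid ones.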