Hi, @a3626a
1) Both action sampling methods are OK, see https://arxiv.org/pdf/1709.02878.pdf. I haven't tried the random sampling method, and it may not work in this project.
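For reference, here is a minimal sketch contrasting the two strategies, assuming `non_spatial_action` is the softmax policy output over all function ids and `valid_actions` holds the currently valid ids (the helper itself is illustrative, not code from this repo):

```python
import numpy as np

def pick_action(non_spatial_action, valid_actions, greedy=True):
    """Illustrative sketch: greedy vs. stochastic action selection."""
    probs = non_spatial_action[valid_actions]  # keep only valid function ids
    if greedy:
        # Deterministic: take the valid action with the highest policy value.
        return valid_actions[np.argmax(probs)]
    # Stochastic: renormalize over valid actions and sample by probability.
    probs = probs / probs.sum()
    return np.random.choice(valid_actions, p=probs)
```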
2) Use `non_spatial_action[valid_actions]` to mask invalid actions. I think this is just a Python/NumPy indexing question, and you can test it in IPython.
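As a quick IPython check of what that fancy indexing does (the numbers below are made up purely for illustration):

```python
In [1]: import numpy as np

In [2]: non_spatial_action = np.array([0.1, 0.4, 0.05, 0.3, 0.15])  # policy over 5 function ids

In [3]: valid_actions = [0, 3, 4]  # only these ids are valid right now

In [4]: non_spatial_action[valid_actions]  # picks out only the valid entries
Out[4]: array([0.1 , 0.3 , 0.15])

In [5]: valid_actions[np.argmax(non_spatial_action[valid_actions])]  # map back to a function id
Out[5]: 3
```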
In `a3c_agent.step`, the action is chosen by `act_id = valid_actions[np.argmax(non_spatial_action[valid_actions])]`.
However, I think actions should be sampled randomly according to their probabilities, because the `non_spatial_action` and `spatial_action` values are the policy outputs (see this post: https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2).

By the way, it's still not clear when to mask invalid actions (before the softmax, or after it?).
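One common way to handle this (a sketch of the general technique, not necessarily what this repo does): mask the invalid logits *before* the softmax by setting them to a large negative value, so the resulting distribution already assigns essentially zero probability to invalid actions, then sample from it. Masking after the softmax also works as long as you renormalize, as in the earlier sketch. The function below is illustrative:

```python
import numpy as np

def sample_valid_action(logits, valid_actions):
    """Mask invalid actions before the softmax, then sample by probability."""
    masked = np.full_like(logits, -1e9)          # invalid actions get ~zero probability
    masked[valid_actions] = logits[valid_actions]
    probs = np.exp(masked - masked.max())        # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(logits), p=probs)
```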