question about sampling

inoryy / reaver

Reaver: Modular Deep Reinforcement Learning Framework. Focused on StarCraft II. Supports Gym, Atari, and MuJoCo.

MIT License

554 stars, 89 forks

Closed (Robotuks closed this 6 years ago)

Robotuks commented 6 years ago

Hey,

I wanted to ask about the calculation in the sample function.

return tf.argmax(tf.log(u) / probs, axis=1)

It divides by probs. Does that mean that lower probabilities have a better chance of getting picked? Is that for better exploration?

Robotuks commented 6 years ago

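log(u) makes it negative (assuming u is uniform in (0, 1)), so dividing by a larger probability gives a value closer to zero, and argmax then favors the higher-probability actions. So everything is fine. But probs can be equal to 0, no?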
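To make the reasoning above concrete: if u is uniform on (0, 1), then -log(u) / p is exponentially distributed with rate p, and the smallest of several independent exponentials lands on index i with probability proportional to p_i. Taking argmax of log(u) / probs is the same as taking that minimum, so the line should sample from the categorical distribution given by probs. The sketch below checks this empirically in plain Python rather than TensorFlow. The names `sample_index` and `empirical_frequencies` are made up for illustration, and every entry of probs is assumed to be strictly positive.

```python
import math
import random
from collections import Counter


def sample_index(probs, rng):
    """Draw one index i with probability probs[i] via argmax(log(u) / probs).

    Assumes every entry of probs is strictly positive.
    """
    # 1 - rng.random() lies in (0, 1], so log() is always defined and <= 0.
    scores = [math.log(1.0 - rng.random()) / p for p in probs]
    # Dividing a negative number by a larger p moves it closer to 0,
    # so likelier indices tend to win the argmax.
    return max(range(len(probs)), key=lambda i: scores[i])


def empirical_frequencies(probs, n_draws=20000, seed=0):
    """Estimate how often each index is drawn (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(sample_index(probs, rng) for _ in range(n_draws))
    return [counts[i] / n_draws for i in range(len(probs))]


if __name__ == "__main__":
    # The printed frequencies should land close to the input probabilities.
    print(empirical_frequencies([0.1, 0.2, 0.7]))
```

So lower probabilities are not favored; the observed frequencies should track probs itself.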

inoryy commented 6 years ago

@Robotuks I think I ensure somewhere that probs are never actually 0 (it's kind of a hack, but hey, it works)
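The thread does not say where or how the zero case is handled, so the following is only a guess at what such a hack might look like: clamping each probability to a small floor before dividing. The floor `EPS` and the function name `sample_index_safe` are assumptions for illustration, not taken from reaver. Note that in plain Python a zero entry would raise `ZeroDivisionError`, whereas array libraries following IEEE 754 would typically produce -inf or NaN instead.

```python
import math
import random

EPS = 1e-12  # hypothetical floor; the value reaver actually uses is not stated


def sample_index_safe(probs, rng, eps=EPS):
    """argmax(log(u) / probs) with probs clamped away from zero."""
    # Replace any entry below eps so the division is always defined.
    clamped = [max(p, eps) for p in probs]
    # 1 - rng.random() lies in (0, 1], so log() is always defined and <= 0.
    scores = [math.log(1.0 - rng.random()) / p for p in clamped]
    return max(range(len(clamped)), key=lambda i: scores[i])


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_index_safe([0.0, 0.3, 0.7], rng) for _ in range(5000)]
    # Index 0 has probability 0 and should essentially never appear.
    print(sorted(set(draws)))
```

With this kind of clamp, a zero-probability entry is not strictly excluded: it can still be picked with probability on the order of the floor, which is negligible for a value this small.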