openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Double tf.log()'s when calculating CategoricalPd.sample() ? #172

Closed (keven425 closed this issue 6 years ago)

keven425 commented 6 years ago

https://github.com/openai/baselines/blob/4993286230ac92ead39a66005b7042b56b8598b0/baselines/common/distributions.py#L155
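
For reference, the sample() method at that line is roughly the following (paraphrased, so it may not match the linked commit character for character):

```python
def sample(self):
    # u ~ Uniform(0, 1), same shape as the logits
    u = tf.random_uniform(tf.shape(self.logits))
    return tf.argmax(self.logits - tf.log(-tf.log(u)), axis=-1)
```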

It takes tf.log of u twice there. Why is that the case? I thought it should only take the log once, like this:

return tf.argmax(self.logits - tf.log(u), axis=-1)

@joschu

unixpickle commented 6 years ago

-log(-log(uniform)) follows the Gumbel distribution. Adding that Gumbel noise to the logits and taking the argmax is the Gumbel-max trick: it produces a sample from the categorical distribution defined by the logits using nothing but uniform random numbers.
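
A quick NumPy sanity check (just an illustrative sketch, not the TensorFlow code in baselines; the logits below are arbitrary) shows that argmax(logits - log(-log(u))) reproduces the softmax probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([1.0, 0.5, -1.0])           # arbitrary example logits

n = 200_000
u = rng.uniform(size=(n, logits.size))        # u ~ Uniform(0, 1)
gumbel = -np.log(-np.log(u))                  # Gumbel(0, 1) noise
samples = np.argmax(logits + gumbel, axis=-1) # Gumbel-max trick

# Empirical sampling frequencies should approach softmax(logits).
empirical = np.bincount(samples, minlength=logits.size) / n
softmax = np.exp(logits) / np.exp(logits).sum()
print(empirical)  # roughly [0.57, 0.35, 0.08]
print(softmax)    # ~[0.574, 0.348, 0.078]
```

With a single log you would instead get logits - log(u), i.e. exponential rather than Gumbel noise, and the argmax would no longer be distributed according to softmax(logits).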