inoryy / tensorflow2-deep-reinforcement-learning

Code accompanying the blog post "Deep Reinforcement Learning with TensorFlow 2.1"
http://inoryy.com/post/tensorflow2-deep-reinforcement-learning/
MIT License

Entropy loss question logits #3


ahandresf commented 5 years ago

Hello,

Sorry if this is a silly question, but I am not quite sure what is happening in this part of the code. 1) As I understand it, the entropy term is added to the loss function as a regularizer so that the agent sometimes explores new actions. 2) Looking into the Keras documentation, I see the signature keras.losses.categorical_crossentropy(y_true, y_pred), so I am not clear what happens when you pass the same thing in both arguments. The logits are defined as self.logits = kl.Dense(num_actions, name='policy_logits'), so the inferred logits tensor has one entry per action, and I imagine that is something like a probability per action from which you somehow need to estimate the entropy over that group of actions.

# entropy loss can be calculated via CE over itself
entropy_loss = kls.categorical_crossentropy(logits, logits, from_logits=True)

For the game it works perfectly, but I want to modify this code and use it in an environment I developed myself. Hope you can shed some light on this.

Thanks.

inoryy commented 5 years ago

Hello,

Logits are simply unnormalized log probabilities. There's nothing special about them mathematically, but in practice many TensorFlow ops are slightly faster and numerically more stable if you use logits instead of converting them to probabilities (e.g. via softmax).
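
For intuition, here is a minimal sketch (with made-up logit values) of how logits relate to probabilities:

import tensorflow as tf

# Hypothetical logits for 4 actions (unnormalized log-probabilities).
logits = tf.constant([[2.0, 1.0, 0.1, -1.0]])

# Converting to probabilities is just a softmax; taking the log of the
# probabilities gives back the logits shifted by a constant (the log-normalizer).
probs = tf.nn.softmax(logits)
print(probs.numpy())
print(tf.math.log(probs).numpy())  # == logits - log(sum(exp(logits)))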

If we use the CE loss signature you've found, then for a single sample CE is calculated as -sum(y_true * log(y_pred)). Since we also pass the from_logits=True parameter, Keras knows that y_pred is already given as unnormalized log probabilities, so it does not need to apply the log itself, only normalize.
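
As a quick sanity check (with hypothetical target and logit values), passing raw logits with from_logits=True gives the same result as passing softmaxed probabilities with from_logits=False:

import tensorflow as tf
import tensorflow.keras.losses as kls

# Hypothetical one-hot target and raw logits for 3 classes.
y_true = tf.constant([[0.0, 1.0, 0.0]])
logits = tf.constant([[1.0, 2.0, 0.5]])

# With from_logits=True Keras normalizes the logits internally (log-softmax)...
ce_from_logits = kls.categorical_crossentropy(y_true, logits, from_logits=True)

# ...which matches computing CE on explicit probabilities.
ce_from_probs = kls.categorical_crossentropy(y_true, tf.nn.softmax(logits), from_logits=False)

print(ce_from_logits.numpy(), ce_from_probs.numpy())  # both ~0.46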

Finally, there is the first logits argument, which corresponds to y_true in our example. To be honest, that looks like a bug :) It should have been converted to probabilities, e.g. via tf.nn.softmax(logits). So the correct call would be:

entropy_loss = kls.categorical_crossentropy(tf.nn.softmax(logits), logits, from_logits=True)
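
To double-check the corrected entropy term (again with hypothetical logit values), the CE of the softmaxed policy against its own logits should equal the usual entropy definition H(p) = -sum(p * log(p)):

import tensorflow as tf
import tensorflow.keras.losses as kls

# Hypothetical policy logits for one state and 4 actions.
logits = tf.constant([[2.0, 1.0, 0.1, -1.0]])
probs = tf.nn.softmax(logits)

# Corrected entropy term: cross-entropy of the policy with itself.
entropy_via_ce = kls.categorical_crossentropy(probs, logits, from_logits=True)

# Direct definition of entropy; the two values should match.
entropy_direct = -tf.reduce_sum(probs * tf.math.log(probs), axis=-1)

print(entropy_via_ce.numpy(), entropy_direct.numpy())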