ahandresf opened 5 years ago
Hello,
Logits are simply unnormalized log probabilities. There's nothing special about them mathematically, but in practice many TensorFlow ops will work slightly faster and will be more stable if you use logits instead of converting them to probabilities (e.g. via softmax).
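A quick way to see the stability issue (a minimal sketch; the extreme logit value is just for illustration):

```python
import tensorflow as tf

logits = tf.constant([[1000.0, 0.0, 0.0]])
y_true = tf.constant([[0.0, 1.0, 0.0]])

# Probabilities first: softmax underflows, the true class gets exactly 0
probs = tf.nn.softmax(logits)  # [[1., 0., 0.]]
manual_ce = -tf.reduce_sum(y_true * tf.math.log(probs), axis=-1)
print(manual_ce.numpy())  # [nan] -- log(0) poisons the result

# Straight from logits: log-softmax is computed via logsumexp, stays finite
stable_ce = tf.keras.losses.categorical_crossentropy(y_true, logits, from_logits=True)
print(stable_ce.numpy())  # [1000.] -- the exact cross-entropy
```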
If we use the CE loss signature you've found, then for a single sample we can calculate CE as -sum(y_true * log(y_pred)). Since we also pass the from_logits=True parameter, Keras knows that y_pred already holds logits, so it will not take log(y_pred) directly; instead it applies a numerically stable log-softmax internally.
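To make that concrete, here is a small check (a sketch with arbitrary example values) showing that with from_logits=True the loss reduces to -sum(y_true * log_softmax(logits)):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
y_true = tf.constant([[1.0, 0.0, 0.0]])

# With from_logits=True, Keras normalizes y_pred itself:
# the loss is -sum(y_true * log_softmax(logits))
keras_ce = tf.keras.losses.categorical_crossentropy(y_true, logits, from_logits=True)
manual_ce = -tf.reduce_sum(y_true * tf.nn.log_softmax(logits), axis=-1)

print(keras_ce.numpy(), manual_ce.numpy())  # both ~0.417
```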
Finally, there is the first logits argument, which fills the y_true slot in our example. To be honest, that looks like it's a bug :) It should have been converted to a probability distribution first, e.g. via tf.nn.softmax(logits). So the correct call would be:
entropy_loss = kls.categorical_crossentropy(tf.nn.softmax(logits), logits, from_logits=True)
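As a sanity check, this corrected call reproduces the entropy formula H(p) = -sum(p * log(p)) with p = softmax(logits) (again a minimal sketch with arbitrary logits):

```python
import tensorflow as tf
import tensorflow.keras.losses as kls

logits = tf.constant([[2.0, 1.0, 0.1]])

# Entropy via the corrected call: y_true = softmax(logits), y_pred = logits
entropy = kls.categorical_crossentropy(tf.nn.softmax(logits), logits, from_logits=True)

# The same quantity written out: H(p) = -sum(p * log(p))
p = tf.nn.softmax(logits)
manual = -tf.reduce_sum(p * tf.math.log(p), axis=-1)

print(entropy.numpy(), manual.numpy())  # both ~0.847
```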
Hello,
Sorry if this is a silly question, but I am not quite sure what is happening in this part of the code. 1) According to my understanding, the entropy term is added to the loss function as a regularizer, so that the agent sometimes explores new actions. 2) Looking at the Keras documentation, I find that
keras.losses.categorical_crossentropy(y_true, y_pred)
so I am not clear on what happens when you pass the same thing as both arguments. Now, the logits are defined as: self.logits = kl.Dense(num_actions, name='policy_logits')
So I infer that logits is equal in size to the number of actions, and I imagine it is something like a probability per action, from which you somehow need to estimate the entropy of that set of actions. For the game it works perfectly, but I want to modify this code and use it in an environment I developed myself. I hope you can shed some light on this.
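For reference, this is roughly how I picture that policy head working (a minimal sketch; num_actions and the feature size are made up by me):

```python
import tensorflow as tf
import tensorflow.keras.layers as kl

num_actions = 4  # assumption: a small discrete action space

# One raw score (logit) per action, as in the repo's model
policy_logits = kl.Dense(num_actions, name='policy_logits')

x = tf.random.normal((1, 32))   # stand-in for the hidden features
logits = policy_logits(x)       # shape (1, num_actions)
probs = tf.nn.softmax(logits)   # one probability per action, sums to ~1

print(logits.shape, probs.numpy().sum())
```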
Thanks.