beyarkay opened this issue 1 year ago
@beyarkay, I tried to reproduce the mentioned code on TensorFlow v2.12. Kindly find the gist of it here, and could you please confirm that the output was similar at your end as well? Thank you!
Yes, I can confirm that the programs are the same. Both of them:
Do you know if the coercion from logits to probabilities is intended behaviour or not? It seems like a mistake to me, but I'm not sure about the history of the function (and git blame wasn't particularly enlightening).
@tilakrayal any chance of an update on this issue before the weekend?
Thanks for reporting the issue.
AFAIK, we don't have an explicit check/validation that the y_pred probabilities sum to 1, but the user is expected to provide probability-distribution values. In general use cases, for example when the softmax activation function is used in your model, it produces a probability distribution.
Consider the example below:
import tensorflow as tf
import numpy as np
y_true = [[0, 1, 0], [0, 0, 1]]
y_logits = [[0.15, 0.05, 0], [0.01, 0.08, 0.01]]
y_pred = tf.keras.layers.Softmax()(np.array(y_logits)).numpy()
print(y_pred)
# [[0.36159232 0.32718226 0.3112254 ]
#  [0.32546702 0.34906602 0.32546702]]
The issue I have is more that I can pass data to tf.keras.losses.categorical_crossentropy which is obviously incorrect, but no errors are raised.
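For example, this sketch (with made-up logits) runs without complaint and quietly returns a different answer from the one that was intended:
import tensorflow as tf

y_true = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Made-up raw logits: the rows clearly don't sum to 1, so they aren't probabilities.
y_logits = [[2.0, 1.0, 0.1], [0.5, 3.0, 0.2]]

# Forgetting from_logits=True: the values are silently renormalized and a
# finite loss comes back, with no error or warning.
silently_wrong = tf.keras.losses.categorical_crossentropy(y_true, y_logits)

# What was actually intended:
intended = tf.keras.losses.categorical_crossentropy(y_true, y_logits, from_logits=True)

print(silently_wrong.numpy(), intended.numpy())  # two different answers, no complaint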
If you ran a store that accepts USD and I tried to pay you in INR, you would check the currency and tell me to go exchange my money. You definitely wouldn't take 100 INR, tell yourself that it's the same as 100 USD because I forgot to tell you that I'm giving you INR, and then let me buy whatever I want that's worth 100 USD.
But right now, keras happily takes my input, sees that the values don't sum to 1 (so they definitely aren't valid probabilities), and nonetheless coerces them into probabilities because the user didn't check the box that says "these aren't probabilities". And then the user doesn't even get a warning that the coercion was made.
People make mistakes. I can understand not wanting to add checks to the library that the user really should be doing themselves, since that would just cause the complexity of the codebase to explode. But when the code takes the input and forces it to sum to 1, it does the worst of both worlds: it doesn't check that the user's input was correct, but it also succeeds when the user's input was incorrect.
Would it be possible to remove the line at https://github.com/keras-team/keras/blob/master/keras/backend.py#L5589 and raise an error if the probabilities do not sum to 1? I'm happy to write the PR myself; I just want to know that it won't be shot down if I do so.
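For the sake of discussion, the check I have in mind could look something like this sketch (check_sums_to_one is a hypothetical helper, not existing keras code, and tf.debugging.assert_near works both eagerly and inside a graph):
import tensorflow as tf

def check_sums_to_one(output, axis=-1, atol=1e-3):
    # Hypothetical replacement for the silent renormalization: fail loudly instead.
    row_sums = tf.reduce_sum(output, axis=axis)
    tf.debugging.assert_near(
        row_sums, tf.ones_like(row_sums), atol=atol,
        message="y_pred is not a probability distribution; did you forget from_logits=True?")
    return output

check_sums_to_one(tf.constant([[0.3, 0.7]]))   # passes
# check_sums_to_one(tf.constant([[2.0, 1.0]])) # raises InvalidArgumentError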
Hi @beyarkay, while waiting for the team's response, let me provide some input.
I think this requires much more effort than adding a simple assertion. Here are some of the errors you can get:
1) OperatorNotAllowedInGraphError: using a symbolic tf.Tensor as a Python bool is not allowed
2) Using a symbolic tf.Tensor as a Python bool is not allowed
3) RaggedTensor object has no len()
So checking for and catching mistakes in graph mode is a challenging problem, and RaggedTensor function calls work differently (as far as I know).
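To illustrate, here is a rough sketch (toy functions, not production code): a plain Python truth test on a symbolic tensor fails during tracing, while the tf.debugging assert ops are built into the graph and do work:
import tensorflow as tf

@tf.function  # traced in graph mode, so y_pred is a symbolic tensor
def naive_check(y_pred):
    # Evaluating a symbolic tensor as a Python bool fails during tracing
    # with the kind of error listed above.
    if bool(tf.reduce_sum(y_pred) != 1.0):
        raise ValueError("y_pred does not sum to 1")
    return y_pred

try:
    naive_check(tf.constant([0.2, 0.3, 0.4]))
except Exception as e:
    print(type(e).__name__)  # e.g. OperatorNotAllowedInGraphError

@tf.function
def graph_safe_check(y_pred):
    # tf.debugging.assert_near becomes part of the graph, so it does work here.
    tf.debugging.assert_near(tf.reduce_sum(y_pred, axis=-1), 1.0,
                             message="y_pred rows must sum to 1")
    return y_pred

graph_safe_check(tf.constant([0.2, 0.3, 0.5]))  # passes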
Hi @Frightera, thanks for raising the subtleties that I missed. I agree that the proof-of-concept assert I had would not be good for a production-ready system. There absolutely are errors that will need to be handled, but I wanted to know from the keras team whether they would merge a PR that solved this before I spent the time implementing a battle-ready fix for this issue.
The reason I'm hesitant is that the original code had a comment explicitly describing this behaviour:
# Adjust the predictions so that the probability of
# each class for every sample adds up to 1
# This is needed to ensure that the cross entropy is
# computed correctly.
output = output / tf.reduce_sum(output, axis, True)
This makes me think that it might be a feature, and not a bug. While I'm not sure what the feature would be, I'm eager to hear from the keras team about whether or not this coercion is intended behaviour.
Once I get confirmation that they would (theoretically) merge a fix, I'll start work on a keras-native way to check that the user isn't passing logits in without specifying from_logits=True.
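As a starting point, and ignoring the graph-mode complications @Frightera raised, the heuristic I'm imagining might be something like this sketch (warn_if_probably_logits is a hypothetical helper, not part of keras):
import warnings
import numpy as np

def warn_if_probably_logits(y_pred, atol=1e-3):
    # Heuristic: values outside [0, 1], or rows far from summing to 1, suggest
    # the caller passed logits without setting from_logits=True.
    y = np.asarray(y_pred, dtype=np.float64)
    if (y < 0).any() or (y > 1).any() or not np.allclose(y.sum(axis=-1), 1.0, atol=atol):
        warnings.warn("y_pred does not look like a probability distribution; "
                      "did you mean to pass from_logits=True?")
    return y_pred

warn_if_probably_logits([[2.0, 1.0, 0.1]])  # warns
warn_if_probably_logits([[0.3, 0.7]])       # stays silent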
Hi @beyarkay, thank you so much for reporting this. The Keras team has looked into it and will not go ahead with this at this point in time, as this is not a bug, as mentioned.
Thanks for the response, could you clarify why this is not considered to be unexpected behaviour?
@beyarkay, based on the comment here https://github.com/keras-team/tf-keras/issues/47 and here https://github.com/keras-team/tf-keras/issues/47, this is not considered a bug.
It seems like @Frightera's comment just means that the implementation will be difficult, and has nothing to do with whether this is a bug or not.
And then for this comment:
AFAIK, We don't have explicit check/validation for y_pred probabilities sum to 1, but it is expected to provide the probability distribution values.
I don't understand why this means the coercion isn't a bug. Either the library should validate that the inputs to a function are correct, or it should do no validation but make it clear to the caller that they need to check the inputs themselves.
Keras seems to be doing the worst of both worlds, where good inputs are pointlessly put through an operation to ensure they sum to 1 (even though they should already sum to 1) and bad inputs are silently coerced to sum to 1.
Would the maintainers be open to removing this line
output = output / tf.reduce_sum(output, axis, True)
so that bad inputs will raise an error and good inputs will be slightly faster to process?
Would the maintainers be open to removing this line
output = output / tf.reduce_sum(output, axis, True)
so that bad inputs will raise an error and good inputs will be slightly faster to process?
This might break some edge cases where one might want to calculate the cross-entropy using just part of the post-softmax probability vector (instead of the usual calculation on the entire softmax output) i.e. the probabilities in that vector do not sum up to 1.
Normalizing the vector is still required to calculate the proper cross-entropy, and assuming the inputs are logits simply because they don't sum up to 1 will also result in a wrong result.
I'm not sure about the implementation of something like this, but maybe checking for the presence of a softmax activation during compilation followed by a warning if it appears incompatible with the loss might be another possible solution?
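For concreteness, here is a rough sketch of that partial-vector case (the numbers are made up); the renormalization in the backend is exactly what makes it give the intended cross-entropy:
import tensorflow as tf

# Made-up 5-class softmax output, but only classes 2-4 are scored.
full_probs = tf.constant([[0.05, 0.10, 0.50, 0.30, 0.05]])
subset = full_probs[:, 2:]                    # rows sum to 0.85, not 1
y_true = tf.constant([[1.0, 0.0, 0.0]])

# Renormalizing the subset is what gives the "proper" cross-entropy here:
renorm = subset / tf.reduce_sum(subset, axis=-1, keepdims=True)
manual = -tf.reduce_sum(y_true * tf.math.log(renorm), axis=-1)

# The current keras behaviour does the same renormalization internally,
# so this agrees with the manual calculation (up to epsilon clipping).
via_keras = tf.keras.losses.categorical_crossentropy(y_true, subset)
print(manual.numpy(), via_keras.numpy())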
This might break some edge cases where one might want to calculate the cross-entropy using just part of the post-softmax probability vector (instead of the usual calculation on the entire softmax output) i.e. the probabilities in that vector do not sum up to 1.
I'm not 100% sure about the math, but as far as I can tell this isn't a mathematically valid thing to do? (Please help me understand/point me in the right direction if I am wrong here).
Normalizing the vector is still required to calculate the proper cross-entropy
Apologies again, but I don't understand this. What are the circumstances where the vector does not sum to one but calculating its cross-entropy is a valid thing to do and so we would want to normalize the vector?
System information.
TensorFlow 2.10.0
Python 3.10.5
Describe the problem.
The code for cross entropy will silently coerce the output vector so that it sums to one. It is very easy to forget about the from_logits parameter (which is False by default) and mistakenly pass logits to cross entropy which don't sum to one but which get silently scaled to sum to one regardless. The code snippet in question: https://github.com/keras-team/keras/blob/master/keras/backend.py#L5589
This sort of error is easy to catch (by just checking that the output isn't outrageously far from summing to 1) and would solve a lot of common mistakes made by keras beginners.
Describe the current behavior.
If I mistakenly pass logits to cross entropy loss and forget about the from_logits parameter, I will get no error but will get incorrect results.
Describe the expected behavior.
If I pass something to cross entropy loss that's obviously not probabilities, it's probably a user error and the command should fail.
Contributing.
I'm happy to write a PR that raises an error if output sums to a value that isn't within a tolerance of 1.
Standalone code to reproduce the issue.
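A minimal sketch of the kind of snippet that shows the behaviour (made-up logits; CategoricalCrossentropy defaults to from_logits=False):
import tensorflow as tf

y_true = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y_logits = [[1.5, 0.3, -0.6], [0.2, 2.1, 0.4]]   # clearly not probabilities

# Default from_logits=False: the values are silently rescaled to sum to 1 and
# a finite loss is returned, with no error or warning.
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_logits).numpy())

# The same inputs with from_logits=True give the value that was intended.
cce_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(cce_logits(y_true, y_logits).numpy())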
Thanks!