keras-team / tf-keras

The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
Apache License 2.0

Categorical Cross Entropy normalization issue? #409

Open PierrickPochelu opened 2 years ago

PierrickPochelu commented 2 years ago

In categorical_crossentropy, I suspect this normalization line is not useful and leads to two unexpected behaviors: https://github.com/keras-team/keras/blob/b80dd12da9c0bc3f569eca3455e77762cf2ee8ef/keras/backend.py#L5540

The block above handles logits and returns early in that case. Therefore, L5540 is only reached when the input is NOT logits (i.e., probabilities that sum to 1). AFAIK the values are already normalized at that point, so L5540 is not useful.
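A minimal numpy sketch of the claim (the `softmax` helper is my own stand-in, not the Keras code): for softmax outputs, the division by `reduce_sum` at L5540 is a numerical no-op, because each row already sums to 1.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax (stand-in for the model's output activation).
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
probs = softmax(rng.normal(size=(4, 10)))

# The normalization performed at L5540: output / reduce_sum(output).
normalized = probs / probs.sum(axis=-1, keepdims=True)

# For softmax outputs the two tensors are numerically identical.
print(np.allclose(probs, normalized))  # True
```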

Even if we consider the normalization at L5540 useful and keep it unchanged, it produces an error in these two edge cases:

I may contribute and push a PR

lucasdavid commented 2 years ago

tf.math.divide_no_nan would fix the first edge case. However, assuming correct usage of the API, I'm not sure it would ever happen in the first place: it's impossible for the softmax function to produce an output [0., 0., 0.] because $e^x > 0, \forall x\in \mathbb{R}$.
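To illustrate the suggested fix, here is a numpy approximation of what `tf.math.divide_no_nan` does (the `divide_no_nan` helper below is my own sketch, not TF's implementation): where the denominator is 0, it returns 0 instead of NaN, which neutralizes the all-zero edge case.

```python
import numpy as np

def divide_no_nan(x, y):
    # Mimics tf.math.divide_no_nan: yields 0 where the denominator is 0.
    safe_y = np.where(y == 0, 1.0, y)   # avoid dividing by zero
    return np.where(y == 0, 0.0, x / safe_y)

probs = np.array([0.0, 0.0, 0.0])
denom = probs.sum()                     # 0.0: plain division would give NaN
print(divide_no_nan(probs, denom))      # [0. 0. 0.], no NaN
```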

sushreebarsa commented 2 years ago

@PierrickPochelu Could you refer to the comment above and let us know if it helps? Thank you!

PierrickPochelu commented 2 years ago

Yes, it does.

sushreebarsa commented 2 years ago

@PierrickPochelu Thank you for the confirmation!

sushreebarsa commented 1 year ago

@PierrickPochelu Could you please confirm if this issue is resolved? Thank you!

PierrickPochelu commented 1 year ago

reduce_sum should be called only when it is mandatory.

By removing this line, I reduce the computing time from 16 s to 12 s without changing the output: https://colab.research.google.com/drive/1GFXQxfw_4gwMU6vIo8fNDPGCwnbOSp2o?usp=sharing

For the benchmark I use ImageNet-like dimensions: 1 million samples (chunks of 1e5, processed 10 times due to memory limits) and 1,000 classes.
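The benchmark setup can be sketched as follows. This is only a scaled-down numpy stand-in for the Colab (which uses TF on full ImageNet-like sizes); the `xent` helper, the `eps` clip, and the sizes are illustrative assumptions, not the Keras code.

```python
import time
import numpy as np

# Scaled down from the ImageNet-like setup (chunks of 1e5 samples x 1000 classes).
N, K = 2_000, 1_000
rng = np.random.default_rng(0)

logits = rng.normal(size=(N, K)).astype("float32")
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)          # softmax: rows sum to 1
labels = np.eye(K, dtype="float32")[rng.integers(0, K, size=N)]

def xent(p, y, normalize, eps=1e-7):
    # Categorical cross-entropy, optionally with the L5540-style renormalization.
    if normalize:
        p = p / p.sum(axis=-1, keepdims=True)       # the step under discussion
    return -np.sum(y * np.log(np.clip(p, eps, 1.0)), axis=-1)

t0 = time.perf_counter()
loss_norm = xent(probs, labels, normalize=True)
t1 = time.perf_counter()
loss_skip = xent(probs, labels, normalize=False)
t2 = time.perf_counter()

print("same result:", np.allclose(loss_norm, loss_skip))
print(f"with norm: {t1 - t0:.4f}s  without: {t2 - t1:.4f}s")
```

For softmax inputs the two variants agree, so the extra `sum` and division are pure overhead.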

PierrickPochelu commented 1 year ago

PR: https://github.com/keras-team/keras/pull/17140

divyashreepathihalli commented 1 year ago

> reduce_sum should be called only when it is mandatory.
>
> By removing this line, I reduce the computing time from 16 s to 12 s without changing the output: https://colab.research.google.com/drive/1GFXQxfw_4gwMU6vIo8fNDPGCwnbOSp2o?usp=sharing
>
> For the benchmark I use ImageNet-like dimensions: 1 million samples (chunks of 1e5, processed 10 times due to memory limits) and 1,000 classes.

Can you please give public access to the Colab?

PierrickPochelu commented 1 year ago

Done

divyashreepathihalli commented 1 year ago

@PierrickPochelu - The PR has been reviewed - https://github.com/keras-team/keras/pull/17140