netrack / keras-metrics

Metrics for Keras. DEPRECATED since Keras 2.3.0
MIT License

Average recall returns zero value for a single class #36

ybubnov closed this issue 5 years ago

ybubnov commented 5 years ago

Given the following model:

import numpy
import keras

import keras_metrics as km

y_pred = [[0], [1], [1]]

# The Lambda layer ignores the (zero) input and effectively outputs y_pred.
model = keras.models.Sequential()
model.add(keras.layers.Lambda(lambda x: x + y_pred))
model.compile(optimizer="sgd", loss="binary_crossentropy",
              metrics=[km.binary_average_recall(classes=1)])

x = numpy.array([[0], [0], [0]])
y = numpy.array([0, 1, 1])

model.fit(x, y, epochs=1)
AR = model.evaluate(x, y)[1:]
print(AR)

The expected average recall is 1.0, but the actual output is 0.0.
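
For reference, a quick sanity check outside of Keras (illustrative only, assuming scikit-learn is available) confirms that the recall of the positive class here should be 1.0:

from sklearn.metrics import recall_score

y_true = [0, 1, 1]
y_pred = [0, 1, 1]  # what the Lambda layer outputs for zero inputs

# class 1 has 2 true positives out of 2 actual positives: recall = 1.0
print(recall_score(y_true, y_pred))  # 1.0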

ybubnov commented 5 years ago

Hi @HawkinsZhao, I'd appreciate it if you could assist me with this issue.

YiqinZhao commented 5 years ago

OK, I will take a look.

YiqinZhao commented 5 years ago

Hi @ybubnov. I did a quick test with your code. I really can't figure out where it goes wrong, but I found something interesting.

If you use a simple callback like this:

class KMCallback(Callback):
    # Evaluate on the full dataset at the end of every epoch.
    def on_epoch_end(self, epoch, logs=None):
        x = numpy.array([[0], [0], [0]])
        y = numpy.array([0, 1, 1])
        AR = self.model.evaluate(x, y, verbose=0)[1:]
        print(AR)

You will get the right average recall value for each epoch. However, the training log shows both an incorrect loss and incorrect metrics. This happens not only with average recall but also with other metrics, even the built-in Keras ones. Besides, you can also get the correct values with batch_size=1. Do you think this could be a bug in Keras itself?
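
One possible explanation (a hedged guess, not a confirmed diagnosis): Keras reports training-log metrics as running averages over batches, so a metric like recall, which is not a plain per-sample mean, can disagree with the same metric computed over the whole dataset in one pass. A minimal numpy sketch with made-up data:

import numpy

y_true = numpy.array([0, 1, 1, 1])
y_pred = numpy.array([0, 0, 1, 1])

def recall(t, p):
    tp = numpy.sum((t == 1) & (p == 1))
    fn = numpy.sum((t == 1) & (p == 0))
    return tp / (tp + fn) if (tp + fn) else 0.0

# Recall over the whole dataset: 2 / 3
print(recall(y_true, y_pred))

# Mean of per-batch recalls (two batches of 2): (0/1 + 2/2) / 2 = 0.5
batch_recalls = [recall(y_true[i:i+2], y_pred[i:i+2]) for i in (0, 2)]
print(numpy.mean(batch_recalls))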

Check out my code and logs:

import numpy
import keras

import keras_metrics as km
from keras.callbacks import Callback

class KMCallback(Callback):
    # Evaluate on the full dataset at the end of every epoch.
    # evaluate() returns [loss, accuracy, true_positive, average_recall];
    # the [1:] slice drops the loss.
    def on_epoch_end(self, epoch, logs=None):
        x = numpy.array([[0], [0], [0]])
        y = numpy.array([0, 1, 1])
        AR = self.model.evaluate(x, y, verbose=0)[1:]
        print('Average Recall: ', AR)

y_pred = [[0], [1], [1]]

# The Lambda layer ignores the (zero) input and effectively outputs y_pred.
model = keras.models.Sequential()
model.add(keras.layers.Lambda(lambda x: x + y_pred))
model.compile(optimizer="sgd", loss="binary_crossentropy",
              metrics=[
                  'accuracy',
                  km.true_positive(),
                  km.binary_average_recall(classes=1)
              ])

x = numpy.array([[0], [0], [0]])
y = numpy.array([0, 1, 1])

model.fit(x, y, epochs=10, batch_size=3, callbacks=[KMCallback()])
Using TensorFlow backend.
Tensor("metrics/average_recall/Cast:0", shape=(?,), dtype=float64) Tensor("metrics/average_recall/Cast_1:0", shape=(3,), dtype=float64)
2019-03-21 21:41:47.659756: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2019-03-21 21:41:47.660136: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
Epoch 1/10
3/3 [==============================] - 0s 54ms/step - loss: 10.6868 - acc: 0.3333 - true_positive: 1.0000 - average_recall: 0.0000e+00
Average Recall:  [1.0, 2, 0.9999999000000099]
Epoch 2/10
3/3 [==============================] - 0s 456us/step - loss: 10.6868 - acc: 0.3333 - true_positive: 1.0000 - average_recall: 0.0000e+00
Average Recall:  [1.0, 2, 0.9999999000000099]
Epoch 3/10
3/3 [==============================] - 0s 372us/step - loss: 10.6868 - acc: 0.3333 - true_positive: 1.0000 - average_recall: 0.0000e+00
Average Recall:  [1.0, 2, 0.9999999000000099]
Epoch 4/10
3/3 [==============================] - 0s 422us/step - loss: 1.1281e-07 - acc: 1.0000 - true_positive: 2.0000 - average_recall: 1.0000
Average Recall:  [1.0, 2, 0.9999999000000099]
Epoch 5/10
3/3 [==============================] - 0s 546us/step - loss: 10.6868 - acc: 0.3333 - true_positive: 1.0000 - average_recall: 0.0000e+00
Average Recall:  [1.0, 2, 0.9999999000000099]
Epoch 6/10
3/3 [==============================] - 0s 439us/step - loss: 10.6868 - acc: 0.3333 - true_positive: 1.0000 - average_recall: 0.0000e+00
Average Recall:  [1.0, 2, 0.9999999000000099]
Epoch 7/10
3/3 [==============================] - 0s 395us/step - loss: 10.6868 - acc: 0.3333 - true_positive: 1.0000 - average_recall: 0.0000e+00
Average Recall:  [1.0, 2, 0.9999999000000099]
Epoch 8/10
3/3 [==============================] - 0s 405us/step - loss: 10.6868 - acc: 0.3333 - true_positive: 1.0000 - average_recall: 0.0000e+00
Average Recall:  [1.0, 2, 0.9999999000000099]
Epoch 9/10
3/3 [==============================] - 0s 409us/step - loss: 10.6868 - acc: 0.3333 - true_positive: 1.0000 - average_recall: 0.0000e+00
Average Recall:  [1.0, 2, 0.9999999000000099]
Epoch 10/10
3/3 [==============================] - 0s 396us/step - loss: 1.1281e-07 - acc: 1.0000 - true_positive: 2.0000 - average_recall: 1.0000
Average Recall:  [1.0, 2, 0.9999999000000099]
ybubnov commented 5 years ago

Yes, these numbers can't be trusted, but anyway, the resulting loss and metrics have correct values. From what I can understand, the current implementation of average recall can't handle a model with a single class (classes=1). The question is: should it handle that?

My answer is that with a single class, average recall should be equal to the "regular" recall, which is currently not the case.
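
In scikit-learn terms (an illustrative aside, assuming sklearn), restricting macro averaging to the single positive label does reduce to plain recall:

from sklearn.metrics import recall_score

y_true = [0, 1, 1]
y_pred = [0, 1, 1]

# macro average over the single label 1 equals the "regular" recall of class 1
print(recall_score(y_true, y_pred, labels=[1], average='macro'))  # 1.0
print(recall_score(y_true, y_pred, average='binary'))             # 1.0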

YiqinZhao commented 5 years ago

Thanks for the clarification.

From what I can understand, the current implementation of average recall can't handle a model with a single class (classes=1). The question is: should it handle that?

I don't think classes=1 makes sense. A classification problem involves at least two classes; there is no such situation as classes=1.

The current average recall implementation first calculates the recall for each class, then averages them. It behaves like sklearn's recall_score with average='macro'.
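
For example (illustrative numbers, assuming scikit-learn):

from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# per-class recall: class 0 -> 1/2 = 0.5, class 1 -> 2/3 ~ 0.667
# macro average: (0.5 + 0.667) / 2 ~ 0.583
print(recall_score(y_true, y_pred, average='macro'))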

Anyway, if you are thinking about making the keras-metrics API simpler, maybe we can add an argument called average to the "regular" recall function. If it is True, we route the calculation to my code; if it is False, we use the "regular" calculation for a target label.
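
Roughly something like this (a hypothetical sketch of the proposed API; recall, average_recall, and single_label_recall are illustrative names, not the library's actual internals):

def recall(label=0, classes=2, average=False):
    if average:
        # macro-average: mean of per-class recalls
        return average_recall(classes=classes)
    # "regular" recall for a single target label
    return single_label_recall(label=label)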

ybubnov commented 5 years ago

My suggestion is to add validation at metric creation time that raises an exception when an incorrect number of classes is specified.
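
Something along these lines (a minimal sketch of the idea only, not necessarily what the eventual fix implemented):

def binary_average_recall(classes):
    # Reject degenerate configurations up front instead of
    # silently computing a meaningless average.
    if classes < 2:
        raise ValueError(
            "binary_average_recall requires at least 2 classes, got %d" % classes)
    # ... construct the metric as before ...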

ybubnov commented 5 years ago

Fixed in #38