keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Mnist siamese example returning wrong accuracy values #4980

Closed Tokukawa closed 3 years ago

Tokukawa commented 7 years ago

I am running the MNIST siamese example straight from the Keras examples in version 1.2.0. I am getting accuracy values of

* Accuracy on training set: 0.42%
* Accuracy on test set: 2.64%

instead of the result stated in the example:

Gets to 99.5% test accuracy after 20 epochs.

At first I suspected a wrong image dimension ordering and tried different configurations in keras.json, but the problem persists. After inspecting the predictions against the true values, I found that the prediction vector is shifted by exactly one, so I checked for possible code typos with a switch between 1 and 0 in the loss function and in the pair construction, but everything seems OK. I am running the code on a MacBook Pro with Python 2.7 and TensorFlow 0.12.1. Has anyone experienced the same issue?

Tokukawa commented 7 years ago

The code must be changed in two points:

return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

must be changed into

return K.mean((1 - y_true) * K.square(y_pred) + y_true * K.square(K.maximum(margin - y_pred, 0)))

and

labels += [1, 0]

must be changed into:

labels += [0, 1]

The loss function is not symmetric under a switch between 0 and 1.
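
Putting the two fixes together, a minimal sketch of the corrected loss function (assuming the Keras backend imported as K, and the paper's label convention: y_true = 0 for similar pairs, y_true = 1 for dissimilar pairs):

from keras import backend as K

def contrastive_loss(y_true, y_pred):
    # margin m: dissimilar pairs farther apart than m contribute no loss
    margin = 1
    # similar pairs (y_true = 0) are penalized by their squared distance;
    # dissimilar pairs (y_true = 1) by the squared hinge on (margin - distance)
    return K.mean((1 - y_true) * K.square(y_pred) +
                  y_true * K.square(K.maximum(margin - y_pred, 0)))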

junwei-pan commented 7 years ago

Actually, we can get a test accuracy of 99.5% by reverting this commit: https://github.com/fchollet/keras/pull/2736/commits/7f25a773bafb6fb9ef45ce61f7117996eb48efae.

Tokukawa commented 7 years ago

Yes, indeed it is a switch between 0 and 1.

oak-tree commented 7 years ago

Hey @Tokukawa,

I'm getting the same result:

108400/108400 [==============================] - 1s - loss: 0.0083 - val_loss: 0.0222
Accuracy on training set: 0.41%
Accuracy on test set: 2.64%

Hey @kemaswill, can you explain the reverted fix? If I understand correctly, it claims there is an issue with taking the mean of negative numbers. Maybe it's better to take the absolute value before taking the mean?

@Tokukawa I'll check the switch between 0 and 1.

oak-tree commented 7 years ago

After digging, it looks like @Tokukawa is right: there is a switching problem. According to Dimensionality Reduction by Learning an Invariant Mapping, Y = 0 if the pair is deemed similar and Y = 1 if the pair is deemed dissimilar:

L(W, Y, X1, X2) = (1 - Y) * (1/2) * (D_W)^2 + Y * (1/2) * (max(0, m - D_W))^2

This is exactly what create_pairs does. Moreover, the loss function is indeed reversed: the term max(0, m - D_W) should produce the loss for dissimilar pairs, so the loss function is also reversed in terms of y_true.
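
For reference, a sketch of the pair construction under the corrected label convention (0 = similar, 1 = dissimilar), modeled on the example's create_pairs; the digit_indices argument and the consecutive-sample pairing follow the example's structure, so treat the details as illustrative:

import random
import numpy as np

def create_pairs(x, digit_indices, num_classes=10):
    pairs, labels = [], []
    n = min(len(digit_indices[d]) for d in range(num_classes)) - 1
    for d in range(num_classes):
        for i in range(n):
            # similar pair: two samples of the same digit
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs.append([x[z1], x[z2]])
            # dissimilar pair: a sample of a randomly chosen other digit
            dn = (d + random.randrange(1, num_classes)) % num_classes
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs.append([x[z1], x[z2]])
            labels += [0, 1]  # 0 = similar, 1 = dissimilar
    return np.array(pairs), np.array(labels)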

stalagmite7 commented 7 years ago

I actually found something odd about the implementation of the contrastive loss function in the siamese example. I corrected the implementation in my own function according to the paper and this issue, so that my contrastive loss function now reads like this:

def contrastive_loss(y_true, y_pred):
    margin = 1
    # original version from the example:
    # return K.mean(y_true * K.square(y_pred) +
    #               (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
    return K.mean((1 - y_true) * 0.5 * K.square(y_pred) +
                  0.5 * y_true * K.square(K.maximum(margin - y_pred, 0)))

When I run training on my dataset (images, 10 classes, all loaded into memory before splitting into training/test set, like in the siamese example), this is what my logs look like:

Epoch 1/20
4000/4000 [==============================] - 12s - loss: 0.1521 - acc: 0.4987 - val_loss: 0.2126 - val_acc: 0.5000
Epoch 2/20
4000/4000 [==============================] - 10s - loss: 0.1383 - acc: 0.5088 - val_loss: 0.1825 - val_acc: 0.5080
Epoch 3/20
4000/4000 [==============================] - 9s - loss: 0.1358 - acc: 0.5252 - val_loss: 0.1333 - val_acc: 0.5760
Epoch 4/20
4000/4000 [==============================] - 9s - loss: 0.1202 - acc: 0.6042 - val_loss: 0.1108 - val_acc: 0.6470
Epoch 5/20
4000/4000 [==============================] - 9s - loss: 0.1097 - acc: 0.6480 - val_loss: 0.1059 - val_acc: 0.6720
Epoch 6/20
4000/4000 [==============================] - 9s - loss: 0.1003 - acc: 0.6878 - val_loss: 0.0961 - val_acc: 0.6630
Epoch 7/20
4000/4000 [==============================] - 10s - loss: 0.0968 - acc: 0.6930 - val_loss: 0.1095 - val_acc: 0.6300
Epoch 8/20
4000/4000 [==============================] - 10s - loss: 0.0870 - acc: 0.7342 - val_loss: 0.1082 - val_acc: 0.6270
Epoch 9/20
4000/4000 [==============================] - 10s - loss: 0.0822 - acc: 0.7355 - val_loss: 0.0990 - val_acc: 0.6980
Epoch 10/20
4000/4000 [==============================] - 9s - loss: 0.0771 - acc: 0.7445 - val_loss: 0.0940 - val_acc: 0.6780
Epoch 11/20
4000/4000 [==============================] - 9s - loss: 0.0698 - acc: 0.7598 - val_loss: 0.0988 - val_acc: 0.6990
Epoch 12/20
4000/4000 [==============================] - 9s - loss: 0.0732 - acc: 0.7405 - val_loss: 0.0918 - val_acc: 0.7130
Epoch 13/20
4000/4000 [==============================] - 9s - loss: 0.0657 - acc: 0.7555 - val_loss: 0.1091 - val_acc: 0.6550
Epoch 14/20
4000/4000 [==============================] - 9s - loss: 0.0636 - acc: 0.7545 - val_loss: 0.0917 - val_acc: 0.6920
Epoch 15/20
4000/4000 [==============================] - 10s - loss: 0.0603 - acc: 0.7553 - val_loss: 0.0867 - val_acc: 0.7170
Epoch 16/20
4000/4000 [==============================] - 10s - loss: 0.0585 - acc: 0.7708 - val_loss: 0.1120 - val_acc: 0.6220
Epoch 17/20
4000/4000 [==============================] - 9s - loss: 0.0571 - acc: 0.7627 - val_loss: 0.0868 - val_acc: 0.7280
Epoch 18/20
4000/4000 [==============================] - 9s - loss: 0.0554 - acc: 0.7620 - val_loss: 0.0914 - val_acc: 0.6920
Epoch 19/20
4000/4000 [==============================] - 9s - loss: 0.0536 - acc: 0.7715 - val_loss: 0.0854 - val_acc: 0.7230
Epoch 20/20
4000/4000 [==============================] - 8s - loss: 0.0516 - acc: 0.7737 - val_loss: 0.0947 - val_acc: 0.6960

As you can see, the loss decreases nicely, and the accuracy increases as well, but the final values are pretty awful in terms of accuracy metrics.

To check against the original setup for contrastive loss (now the commented-out code in my function definition is the one that runs), my logs look like this:

Epoch 1/20
4000/4000 [==============================] - 12s - loss: 0.2582 - acc: 0.5107 - val_loss: 0.2503 - val_acc: 0.5000
Epoch 2/20
4000/4000 [==============================] - 10s - loss: 0.2534 - acc: 0.5100 - val_loss: 0.2500 - val_acc: 0.5000
Epoch 3/20
4000/4000 [==============================] - 10s - loss: 0.2531 - acc: 0.5048 - val_loss: 0.2509 - val_acc: 0.5000
Epoch 4/20
4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.4962 - val_loss: 0.2500 - val_acc: 0.5000
Epoch 5/20
4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.4993 - val_loss: 0.2505 - val_acc: 0.5000
Epoch 6/20
4000/4000 [==============================] - 10s - loss: 0.2519 - acc: 0.4907 - val_loss: 0.2528 - val_acc: 0.5000
Epoch 7/20
4000/4000 [==============================] - 10s - loss: 0.2519 - acc: 0.5062 - val_loss: 0.2507 - val_acc: 0.5000
Epoch 8/20
4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.5110 - val_loss: 0.2503 - val_acc: 0.5000
Epoch 9/20
4000/4000 [==============================] - 10s - loss: 0.2516 - acc: 0.5012 - val_loss: 0.2498 - val_acc: 0.5080
Epoch 10/20
4000/4000 [==============================] - 10s - loss: 0.2452 - acc: 0.4632 - val_loss: 0.2343 - val_acc: 0.4270
Epoch 11/20
4000/4000 [==============================] - 10s - loss: 0.2420 - acc: 0.4465 - val_loss: 0.2375 - val_acc: 0.4370
Epoch 12/20
4000/4000 [==============================] - 10s - loss: 0.2297 - acc: 0.4138 - val_loss: 0.2311 - val_acc: 0.4100
Epoch 13/20
4000/4000 [==============================] - 10s - loss: 0.2203 - acc: 0.3795 - val_loss: 0.2248 - val_acc: 0.3850
Epoch 14/20
4000/4000 [==============================] - 10s - loss: 0.2100 - acc: 0.3472 - val_loss: 0.2172 - val_acc: 0.3320
Epoch 15/20
4000/4000 [==============================] - 10s - loss: 0.2015 - acc: 0.3197 - val_loss: 0.2110 - val_acc: 0.3420
Epoch 16/20
4000/4000 [==============================] - 10s - loss: 0.1880 - acc: 0.2850 - val_loss: 0.2219 - val_acc: 0.3260
Epoch 17/20
4000/4000 [==============================] - 10s - loss: 0.1805 - acc: 0.2715 - val_loss: 0.2003 - val_acc: 0.2960
Epoch 18/20
4000/4000 [==============================] - 10s - loss: 0.1695 - acc: 0.2440 - val_loss: 0.1979 - val_acc: 0.3010
Epoch 19/20
4000/4000 [==============================] - 10s - loss: 0.1610 - acc: 0.2320 - val_loss: 0.2021 - val_acc: 0.2760
Epoch 20/20
4000/4000 [==============================] - 10s - loss: 0.1554 - acc: 0.2175 - val_loss: 0.1855 - val_acc: 0.2740

Neither the loss nor the accuracy looks as good as in the previous setup, but the final performance here is much better. Does anyone have an explanation for this?

Tokukawa commented 7 years ago

Is your dataset the MNIST dataset, as in the example?

stalagmite7 commented 7 years ago

No, my MNIST trial of the siamese net (exactly as given in the example, without replacing the contrastive loss function) worked as it was supposed to: I got about 99% accuracy. This is my own dataset of faces that I am training the siamese net on, for face verification/re-identification. Sorry, I should have described my experiments in more detail!

Tokukawa commented 7 years ago

So why are you expecting the same accuracy?

stalagmite7 commented 7 years ago

I am not expecting the same accuracy as I had for MNIST. I am curious, however, about why the two implementations of the contrastive loss show different trends: with the corrected implementation (my first set of logs) the accuracy increases significantly over the epochs and the loss decreases, while with the default Keras implementation (second set of logs) the accuracy in fact decreases. And this doesn't explain the final numbers for the overall accuracy on the training and test sets, which were identical across these experiments.

stalagmite7 commented 7 years ago

I am asking because I have a different issue that I am trying to debug, and I wanted to know if this was my error. In case you are curious: #2975, my comment at the end.

Tokukawa commented 7 years ago

What is the baseline on your dataset?

stalagmite7 commented 7 years ago

I got:

Accuracy on training set: 73.81%
Accuracy on test set: 69.76%

while using the regular contrastive loss function from the Keras siamese example on my dataset. Also, on the MNIST dataset I got the accuracy mentioned in the example, both with the network from the example and with my own network.

Tokukawa commented 7 years ago

Sorry, I was not clear. What is your random baseline, namely, what is the frequency of your most populated class?

stalagmite7 commented 7 years ago

My task here is to say whether a pair of images shows the same identity or not, so it's a two-class binary classification problem: pair or not pair. And I generate my pairs so that there are equal numbers of true pairs and false pairs in the dataset. Hope that answers your question!

Tokukawa commented 7 years ago

So your random baseline is 50%. This looks like a bug in your code (maybe a minus sign lost somewhere). Anyway, you should ask on Stack Overflow, as this board is for raising issues in Keras.

ghost commented 7 years ago

@Tokukawa, @stalagmite7 why haven't the changes you suggested been made yet? I mean, the current code is wrong, right?

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

phobrain commented 7 years ago

https://github.com/fchollet/keras/blob/master/examples/mnist_siamese_graph.py has disappeared! Was it defective? I want to point to it as the starting point for my adaptation to histograms, and I'd also like to hear of any improvements.

http://phobrain.com/pr/home/siamese.html

lschaupp commented 6 years ago

Hey, I was wondering if anybody could explain to me why we use a different threshold for testing (0.5) than the margin used during training (1). I had assumed that the margin should act as the threshold for deciding whether two samples form a pair.

ozabluda commented 6 years ago

The margin has absolutely nothing to do with the threshold, except that 0 < threshold < margin.
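
To make the distinction concrete, a minimal sketch of threshold-based evaluation (assuming the corrected label convention, 1 = dissimilar, and a hypothetical compute_accuracy helper); any threshold strictly between 0 and the margin can work, it only has to separate the two distance clusters:

import numpy as np

def compute_accuracy(labels, distances, threshold=0.5):
    # pairs with distance above the threshold are predicted dissimilar (1)
    preds = (distances.ravel() > threshold).astype(int)
    return np.mean(preds == labels)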

lschaupp commented 6 years ago

Yes, I figured that as well after reading into it a bit more. Right now I'm using an AUC score as the final score (over the whole dataset) after each epoch, which should be a better evaluation.
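
A threshold-free evaluation along those lines could be sketched with scikit-learn (assuming distances as scores and the 1 = dissimilar label convention, so larger distances rank toward the positive class):

from sklearn.metrics import roc_auc_score

# AUC over raw pair distances; no threshold needs to be chosen
auc = roc_auc_score(labels, distances.ravel())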

MovsisyanM commented 2 years ago

Writing a custom activation function worked for me:

import tensorflow as tf

def capped_relu(x):
    # standard ReLU with outputs capped at 10
    return tf.minimum(tf.keras.activations.relu(x), 10)
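
For instance, it can be passed as a layer activation (illustrative usage, not from the original comment):

dense = tf.keras.layers.Dense(128, activation=capped_relu)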

Also make sure the input shape is correct (flatten for DNNs), and try different losses (binary crossentropy was causing this issue for me as well).