Closed Tokukawa closed 3 years ago
The code must be changed in two places:
return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
must be changed into
return K.mean((1 - y_true) * K.square(y_pred) + y_true * K.square(K.maximum(margin - y_pred, 0)))
and
labels += [1, 0]
must be changed into:
labels += [0, 1]
The loss function is not symmetric under switching 0 and 1.
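For reference, here is a NumPy sketch of the corrected loss (Keras backend calls replaced by NumPy purely for illustration), using the paper's convention that y_true = 0 marks a similar pair and y_true = 1 a dissimilar one:

```python
import numpy as np

def contrastive_loss(y_true, y_pred, margin=1.0):
    """Corrected contrastive loss (Y=0: similar pair, Y=1: dissimilar pair).

    y_pred is the Euclidean distance between the two embeddings.
    Similar pairs are penalized by their distance; dissimilar pairs are
    penalized only while their distance is still inside the margin.
    """
    similar_term = (1 - y_true) * np.square(y_pred)
    dissimilar_term = y_true * np.square(np.maximum(margin - y_pred, 0))
    return np.mean(similar_term + dissimilar_term)

# A similar pair at distance 0 and a dissimilar pair beyond the margin
# both incur zero loss:
y_true = np.array([0.0, 1.0])
y_pred = np.array([0.0, 1.5])
print(contrastive_loss(y_true, y_pred))  # 0.0
```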
Actually we can get test accuracy of 99.5% by reverting the commit: https://github.com/fchollet/keras/pull/2736/commits/7f25a773bafb6fb9ef45ce61f7117996eb48efae.
Yes, it is indeed a switch between 0 and 1.
Hey @Tokukawa,
I'm getting the same result
108400/108400 [==============================] - 1s - loss: 0.0083 - val_loss: 0.0222
Accuracy on training set: 0.41%
Accuracy on test set: 2.64%
Hey @kemaswill, can you explain the reverted fix? If I understand correctly, it claims there is an issue with taking the mean of negative numbers. Maybe it's better to take the absolute value before taking the mean?
@Tokukawa I'll check the switching between 0 and 1.
After digging, it looks like @Tokukawa is right: there is a switching problem. According to Dimensionality Reduction by Learning an Invariant Mapping, Y=0 if the pair is deemed similar and Y=1 if the pair is deemed dissimilar. This is exactly what create_pairs does. Moreover, the loss function is indeed reversed: the max between 0 and m - D_W should be the loss for dissimilar pairs, and therefore the loss function is also reversed in its y_true terms.
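Under the paper's convention, a minimal sketch of how the example's pair construction could emit labels (NumPy only; the function name `create_pairs_sketch` and the tiny toy data are illustrative, not the example's exact code):

```python
import numpy as np

def create_pairs_sketch(x, digit_indices, num_classes=10):
    """Sketch of pair construction under the paper's convention:
    label 0 for a similar pair, label 1 for a dissimilar one.

    x: array of samples; digit_indices: one index array per class.
    """
    pairs, labels = [], []
    n = min(len(d) for d in digit_indices) - 1
    rng = np.random.default_rng(0)
    for d in range(num_classes):
        for i in range(n):
            # similar pair: two samples from the same class
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs.append([x[z1], x[z2]])
            # dissimilar pair: samples from two different classes
            inc = rng.integers(1, num_classes)
            dn = (d + inc) % num_classes
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs.append([x[z1], x[z2]])
            labels.append(0)  # similar  -> 0
            labels.append(1)  # dissimilar -> 1
    return np.array(pairs), np.array(labels)

# Toy usage: 2 classes, 3 samples each
digit_indices = [np.array([0, 1, 2]), np.array([5, 6, 7])]
pairs, labels = create_pairs_sketch(np.arange(10), digit_indices, num_classes=2)
print(labels)  # alternating 0 (similar) and 1 (dissimilar)
```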
I actually found something odd about the implementation of the contrastive loss function in the siamese example. I corrected the implementation of my contrastive loss function according to the paper and this issue, so that it now reads like this:
def contrastive_loss(y_true, y_pred):
    margin = 1
    # return K.mean(y_true * K.square(y_pred) +
    #               (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
    return K.mean((1 - y_true) * 0.5 * K.square(y_pred) +
                  0.5 * y_true * K.square(K.maximum(margin - y_pred, 0)))
When I run training on my dataset (images, 10 classes, all loaded into memory before splitting into training/test set, like in the siamese example), this is what my logs look like:
Epoch 1/20: 4000/4000 [==============================] - 12s - loss: 0.1521 - acc: 0.4987 - val_loss: 0.2126 - val_acc: 0.5000
Epoch 2/20: 4000/4000 [==============================] - 10s - loss: 0.1383 - acc: 0.5088 - val_loss: 0.1825 - val_acc: 0.5080
Epoch 3/20: 4000/4000 [==============================] - 9s - loss: 0.1358 - acc: 0.5252 - val_loss: 0.1333 - val_acc: 0.5760
Epoch 4/20: 4000/4000 [==============================] - 9s - loss: 0.1202 - acc: 0.6042 - val_loss: 0.1108 - val_acc: 0.6470
Epoch 5/20: 4000/4000 [==============================] - 9s - loss: 0.1097 - acc: 0.6480 - val_loss: 0.1059 - val_acc: 0.6720
Epoch 6/20: 4000/4000 [==============================] - 9s - loss: 0.1003 - acc: 0.6878 - val_loss: 0.0961 - val_acc: 0.6630
Epoch 7/20: 4000/4000 [==============================] - 10s - loss: 0.0968 - acc: 0.6930 - val_loss: 0.1095 - val_acc: 0.6300
Epoch 8/20: 4000/4000 [==============================] - 10s - loss: 0.0870 - acc: 0.7342 - val_loss: 0.1082 - val_acc: 0.6270
Epoch 9/20: 4000/4000 [==============================] - 10s - loss: 0.0822 - acc: 0.7355 - val_loss: 0.0990 - val_acc: 0.6980
Epoch 10/20: 4000/4000 [==============================] - 9s - loss: 0.0771 - acc: 0.7445 - val_loss: 0.0940 - val_acc: 0.6780
Epoch 11/20: 4000/4000 [==============================] - 9s - loss: 0.0698 - acc: 0.7598 - val_loss: 0.0988 - val_acc: 0.6990
Epoch 12/20: 4000/4000 [==============================] - 9s - loss: 0.0732 - acc: 0.7405 - val_loss: 0.0918 - val_acc: 0.7130
Epoch 13/20: 4000/4000 [==============================] - 9s - loss: 0.0657 - acc: 0.7555 - val_loss: 0.1091 - val_acc: 0.6550
Epoch 14/20: 4000/4000 [==============================] - 9s - loss: 0.0636 - acc: 0.7545 - val_loss: 0.0917 - val_acc: 0.6920
Epoch 15/20: 4000/4000 [==============================] - 10s - loss: 0.0603 - acc: 0.7553 - val_loss: 0.0867 - val_acc: 0.7170
Epoch 16/20: 4000/4000 [==============================] - 10s - loss: 0.0585 - acc: 0.7708 - val_loss: 0.1120 - val_acc: 0.6220
Epoch 17/20: 4000/4000 [==============================] - 9s - loss: 0.0571 - acc: 0.7627 - val_loss: 0.0868 - val_acc: 0.7280
Epoch 18/20: 4000/4000 [==============================] - 9s - loss: 0.0554 - acc: 0.7620 - val_loss: 0.0914 - val_acc: 0.6920
Epoch 19/20: 4000/4000 [==============================] - 9s - loss: 0.0536 - acc: 0.7715 - val_loss: 0.0854 - val_acc: 0.7230
Epoch 20/20: 4000/4000 [==============================] - 8s - loss: 0.0516 - acc: 0.7737 - val_loss: 0.0947 - val_acc: 0.6960
As you can see, the loss decreases nicely, and the accuracy increases as well, but the final values are pretty awful in terms of accuracy metrics.
To check against the original setup for contrastive loss (the commented code in my function definition now is the one that runs), my logs look like this:
Epoch 1/20: 4000/4000 [==============================] - 12s - loss: 0.2582 - acc: 0.5107 - val_loss: 0.2503 - val_acc: 0.5000
Epoch 2/20: 4000/4000 [==============================] - 10s - loss: 0.2534 - acc: 0.5100 - val_loss: 0.2500 - val_acc: 0.5000
Epoch 3/20: 4000/4000 [==============================] - 10s - loss: 0.2531 - acc: 0.5048 - val_loss: 0.2509 - val_acc: 0.5000
Epoch 4/20: 4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.4962 - val_loss: 0.2500 - val_acc: 0.5000
Epoch 5/20: 4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.4993 - val_loss: 0.2505 - val_acc: 0.5000
Epoch 6/20: 4000/4000 [==============================] - 10s - loss: 0.2519 - acc: 0.4907 - val_loss: 0.2528 - val_acc: 0.5000
Epoch 7/20: 4000/4000 [==============================] - 10s - loss: 0.2519 - acc: 0.5062 - val_loss: 0.2507 - val_acc: 0.5000
Epoch 8/20: 4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.5110 - val_loss: 0.2503 - val_acc: 0.5000
Epoch 9/20: 4000/4000 [==============================] - 10s - loss: 0.2516 - acc: 0.5012 - val_loss: 0.2498 - val_acc: 0.5080
Epoch 10/20: 4000/4000 [==============================] - 10s - loss: 0.2452 - acc: 0.4632 - val_loss: 0.2343 - val_acc: 0.4270
Epoch 11/20: 4000/4000 [==============================] - 10s - loss: 0.2420 - acc: 0.4465 - val_loss: 0.2375 - val_acc: 0.4370
Epoch 12/20: 4000/4000 [==============================] - 10s - loss: 0.2297 - acc: 0.4138 - val_loss: 0.2311 - val_acc: 0.4100
Epoch 13/20: 4000/4000 [==============================] - 10s - loss: 0.2203 - acc: 0.3795 - val_loss: 0.2248 - val_acc: 0.3850
Epoch 14/20: 4000/4000 [==============================] - 10s - loss: 0.2100 - acc: 0.3472 - val_loss: 0.2172 - val_acc: 0.3320
Epoch 15/20: 4000/4000 [==============================] - 10s - loss: 0.2015 - acc: 0.3197 - val_loss: 0.2110 - val_acc: 0.3420
Epoch 16/20: 4000/4000 [==============================] - 10s - loss: 0.1880 - acc: 0.2850 - val_loss: 0.2219 - val_acc: 0.3260
Epoch 17/20: 4000/4000 [==============================] - 10s - loss: 0.1805 - acc: 0.2715 - val_loss: 0.2003 - val_acc: 0.2960
Epoch 18/20: 4000/4000 [==============================] - 10s - loss: 0.1695 - acc: 0.2440 - val_loss: 0.1979 - val_acc: 0.3010
Epoch 19/20: 4000/4000 [==============================] - 10s - loss: 0.1610 - acc: 0.2320 - val_loss: 0.2021 - val_acc: 0.2760
Epoch 20/20: 4000/4000 [==============================] - 10s - loss: 0.1554 - acc: 0.2175 - val_loss: 0.1855 - val_acc: 0.2740
Neither the loss nor the accuracy looks as good as in the previous setup, yet the final performance here is much better. Does anyone have an explanation for this?
Is your dataset the MNIST dataset, as in the example?
No. My MNIST trial of the siamese net (exactly as given in the example, without replacing the contrastive loss function) worked as it was supposed to: I got about 99% accuracy. What I am training the siamese net on here is my own dataset of faces, for a face verification/re-identification purpose. I'm sorry, I should have described my experiments in more detail!
So why are you expecting the same accuracy?
I am not expecting the same accuracy as I had for MNIST. I am curious, however, about why the two implementations of the contrastive loss show different trends: with the corrected implementation (my first set of logs), the accuracy increases significantly over epochs and the loss decreases, while with the default Keras implementation (second set of logs) the accuracy in fact decreases. This doesn't explain the final numbers for the overall accuracy on the training and test sets, which were identical between these experiments.
I am asking because I have a different issue that I am trying to debug, and I wanted to know if this was my error. In case you are curious: #2975, my comment at the end.
What is the baseline for your dataset?
I got: Accuracy on training set: 73.81%, Accuracy on test set: 69.76%
while using the regular contrastive loss function from the Keras siamese example on my dataset. I also got the accuracy mentioned in the example on the MNIST dataset, both with the network in the example and with my own network.
Sorry, I was not clear. What is your random baseline, i.e., what is the frequency of your most populated class?
My task here is to say whether a pair of images shows the same identity or not, so it's a two-class binary classification problem: pair or not pair. And the way I generate my pairs is such that there is an equal number of true pairs and false pairs in the dataset. Hope that answers your question!
So your random baseline is 50%. This looks like a bug in your code (maybe a minus sign lost somewhere). Anyway, you should ask on Stack Overflow, as this board is for raising issues in Keras.
@Tokukawa, @stalagmite7 why haven't the changes you suggested been made yet? I mean, the current code is wrong, right?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
https://github.com/fchollet/keras/blob/master/examples/mnist_siamese_graph.py has disappeared! Was it defective? I want to point to it as the starting point for my adaptation to histograms, not to mention hear of any improvements.
Hey, I was wondering if anybody could also explain to me why we would use a different threshold (0.5) for testing than the margin used while training (1). I had assumed that the margin should behave as the threshold for deciding whether the two images of a pair are the same.
margin has absolutely nothing to do with the threshold, except that 0 < threshold < margin
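To make that distinction concrete, here is a NumPy sketch of the accuracy computation: the threshold is just a decision boundary on the predicted distance, while the margin only enters the training loss (the function name and the toy data are illustrative, not the example's exact code):

```python
import numpy as np

def compute_accuracy_sketch(y_true, distances, threshold=0.5):
    """A pair is predicted 'similar' when the embedding distance falls
    below the threshold. The threshold is a decision boundary chosen
    inside (0, margin); the margin itself only shapes the training loss.

    Convention: y_true == 0 for similar pairs, 1 for dissimilar pairs.
    """
    pred_dissimilar = (distances >= threshold).astype(int)
    return np.mean(pred_dissimilar == y_true)

y_true = np.array([0, 0, 1, 1])
distances = np.array([0.1, 0.4, 0.9, 0.3])  # last pair misclassified
print(compute_accuracy_sketch(y_true, distances))  # 0.75
```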
Yes, I figured that out as well after reading into it a bit more. Right now I'm using an AUC score as the final score (over the whole dataset) after each epoch, which should be a better evaluation.
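For a threshold-free evaluation along those lines, here is a minimal NumPy sketch of AUC via the Mann-Whitney formulation (assuming y_true == 1 for dissimilar pairs; ties are ignored for simplicity):

```python
import numpy as np

def pairwise_auc(y_true, distances):
    """Threshold-free AUC for the verification task.

    With y_true == 1 for dissimilar pairs, AUC is the probability that a
    randomly chosen dissimilar pair gets a larger distance than a
    randomly chosen similar pair (Mann-Whitney U formulation).
    """
    pos = distances[y_true == 1]  # dissimilar pairs
    neg = distances[y_true == 0]  # similar pairs
    # count of (dissimilar, similar) combinations ranked correctly
    correct = np.sum(pos[:, None] > neg[None, :])
    return correct / (len(pos) * len(neg))

y_true = np.array([0, 0, 1, 1])
distances = np.array([0.1, 0.4, 0.9, 0.3])
print(pairwise_auc(y_true, distances))  # 0.75
```

In practice something like scikit-learn's roc_auc_score on the (negated or raw) distances does the same job with tie handling built in.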
Writing a custom activation function worked for me:
def capped_relu(x):
    return tf.minimum(tf.keras.activations.relu(x), 10)
Also make sure the input shape is correct (flatten for DNNs), and try different losses (binary crossentropy was causing this issue for me as well).
I am executing the MNIST siamese example straight from the Keras examples, version 1.2.0. I am getting an accuracy value of
instead of the correct values.
At first I thought it was a wrong image dimension ordering and tried different configurations for keras.json, but the problem persists. After inspecting the predictions against the true values, I found the vector of predictions is shifted by exactly one, so I checked for possible code typos with some switch between 1 and 0 in the loss function and in the pair construction, but everything seems OK. I am running the code on a MacBook Pro with Python 2.7 and TensorFlow 0.12.1. Has anyone experienced the same issue?
[x] Check that you are up-to-date with the master branch of Keras. You can update with: pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
[x] If running on TensorFlow, check that you are up-to-date with the latest version. The installation instructions can be found here.
[x] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with: pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
[ ] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).