keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Does contrastive loss in mnist_siamese_graph.py follow the original paper? #7119

Closed LinHungShi closed 7 years ago

LinHungShi commented 7 years ago

It seems that the contrastive loss in mnist_siamese_graph.py is defined as

K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

However, the loss defined in the original paper is

K.mean((1 - y_true) * K.square(y_pred) +
                  y_true * K.square(K.maximum(margin - y_pred, 0)))

where y_true is 0 if two images are from the same class, and 1 otherwise.
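For reference, the paper's formulation (y_true = 0 for similar pairs) can be sketched in plain NumPy instead of backend ops. This is an illustration with the example's default margin of 1, not code from the example itself:

```python
import numpy as np

def contrastive_loss_paper(y_true, d, margin=1.0):
    # Hadsell et al. convention: y_true = 0 for similar pairs,
    # y_true = 1 for dissimilar pairs; d is the predicted
    # euclidean distance between the two embeddings.
    return np.mean((1 - y_true) * np.square(d) +
                   y_true * np.square(np.maximum(margin - d, 0)))

# A similar pair is penalised for being far apart, a dissimilar pair
# only while it is still inside the margin.
print(contrastive_loss_paper(np.array([0.0, 1.0]), np.array([0.2, 0.2])))  # ≈ 0.34
```

Swapping the roles of y_true and (1 - y_true) recovers the expression currently in the example, which is why the label convention matters.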

StripedBanana commented 7 years ago

I had the same thought when I originally looked at the paper and this implementation, and I still do. I think you are correct about the second formula.

Also, I think there is another really big assumption in compute_accuracy: it fixes the threshold at 0.5, which may not be the best value to choose. It somewhat works here because the way the pairs are formed gives you a balanced number of examples and counter-examples, with the same cost in the loss function.

If you really want the best FAR and FRR, the example should at least tell the user to test several threshold values and plot a ROC curve.

One last thing: I don't remember which verbose setting this example uses, but beginners should really be careful with the accuracy and val_accuracy metrics during the fitting phase, because they are computed with simple rounding operations (effectively a fixed threshold of 0.5), so they may not reflect the real accuracy at all. However, the loss remains a good metric to look at if your dataset is not too unbalanced.
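To make the threshold point concrete, here is a minimal NumPy sketch (not part of the example) that sweeps candidate thresholds over the predicted distances and reports the best pair-classification accuracy, assuming the example's label convention of 1 = similar pair:

```python
import numpy as np

def best_threshold(distances, labels, num=200):
    # Try `num` evenly spaced thresholds between the smallest and
    # largest predicted distance; a pair is called "similar" when
    # its distance falls below the threshold (labels: 1 = similar).
    candidates = np.linspace(distances.min(), distances.max(), num)
    accs = [np.mean((distances < t) == (labels == 1)) for t in candidates]
    best = int(np.argmax(accs))
    return candidates[best], accs[best]
```

Plotting the accuracies (or the FAR/FRR pair) against the candidate thresholds gives exactly the ROC-style view described above.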

typewind commented 7 years ago

@StripedBanana No matter what the threshold is, compute_accuracy is wrong. I saw a fix that gave a correct way of computing the accuracy, but it was reverted and I don't know why. As for the threshold: I used the corrected contrastive loss function provided by @LinHungShi and drew a scatter plot of the prediction values (Imgur link). Now I have no idea how to set a simple threshold to divide them... Or maybe I should follow the original paper and do some clustering? (I'm just a beginner in machine learning, sorry about that.)

StripedBanana commented 7 years ago

From your scatter plot it looks like your training phase has done very little to get better results than random guessing. Are you using the mnist dataset and the pairing function from the example? This data should give you much more distinct distributions after a few epochs. Can you post your training phase metrics and parameters? Just copy paste the console output from Keras if you use verbose.

I strongly recommend against using compute_accuracy if you want to go further than toying around with this example. It is very unintuitive, and you will get far more information from your scatter plot.

I would also recommend not adding accuracy to your metrics, as it suffers from the same issue compute_accuracy does. Watching the loss and val_loss during training will tell you if you start overfitting, if your learning rate is too high or too low, when to stop training... it should be enough of an indicator, as it directly shows whether your predictions are close to your labels (batch-wise, at least).
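The "watch val_loss and stop in time" advice is what Keras's EarlyStopping callback automates. The stopping rule itself can be sketched in plain Python (an illustration of the logic, not the actual callback):

```python
def should_stop(val_losses, patience=5, min_delta=1e-4):
    # Stop once val_loss has failed to improve on the earlier best
    # by at least min_delta for `patience` consecutive epochs.
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta
```

In real code you would pass keras.callbacks.EarlyStopping(monitor='val_loss', patience=5) to model.fit instead of rolling your own loop.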

typewind commented 7 years ago

@StripedBanana The previous plot is the result after 20 epochs. I tried 100 epochs and it looks worse... (Imgur link)

Sorry, I am new to Keras. I set verbose=1 in the fit function. Here is the console output for the last few epochs. Is this what you need?

Epoch 97/100
108400/108400 [==============================] - 5s - loss: 0.2224 - val_loss: 0.2749
Epoch 98/100
108400/108400 [==============================] - 6s - loss: 0.2223 - val_loss: 0.2755
Epoch 99/100
108400/108400 [==============================] - 5s - loss: 0.2217 - val_loss: 0.2742
Epoch 100/100
108400/108400 [==============================] - 5s - loss: 0.2220 - val_loss: 0.2742

StripedBanana commented 7 years ago

Yes, this is what I need, although only a few epochs does not tell us much. I can already say that loss/val_loss > 0.2 is very high (I think a loss of 0.25 is basically random guessing given the contrastive loss function we have), so I'm pretty sure your training phase is ineffective.
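That 0.25 baseline is easy to verify: with margin = 1, balanced pairs, and a network stuck predicting a distance of 0.5 for everything, the example-convention loss (y_true = 1 for similar pairs) comes out to exactly 0.25. A quick NumPy check:

```python
import numpy as np

def contrastive_loss(y_true, d, margin=1.0):
    # Example convention: y_true = 1 for similar pairs.
    return np.mean(y_true * np.square(d) +
                   (1 - y_true) * np.square(np.maximum(margin - d, 0)))

y = np.array([1.0, 0.0] * 50)   # balanced similar/dissimilar pairs
d = np.full(100, 0.5)           # uninformative constant prediction
print(contrastive_loss(y, d))   # 0.25
```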

Can you also tell us what data you use, how you load it, and what your network is? Did you only change the contrastive loss expression in the example?

typewind commented 7 years ago

@StripedBanana I use the default MNIST dataset by from keras.datasets import mnist. I didn't change anything from the original code except the contrastive loss function.

Now I feel uneasy about the eucl_dist_output_shape function as well. The value returned is <class 'tuple'>: (None, 1). I doubt whether it really works, and I suspect that's why this network can only do random guessing. Do you have any clue about it?

StripedBanana commented 7 years ago

@typewind eucl_dist_output_shape returning (None, 1) is ok, it's just required by TensorFlow since there is no shape inference in a model. Basically we are saying that we expect an unknown number of scalar values as the output of the euclidean distance.
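For reference, the function amounts to nothing more than shape metadata:

```python
def eucl_dist_output_shape(shapes):
    # `shapes` holds the shapes of the two input embeddings; the
    # output is one scalar distance per pair, with None standing in
    # for the unknown batch size.
    shape1, shape2 = shapes
    return (shape1[0], 1)

print(eucl_dist_output_shape([(None, 128), (None, 128)]))  # (None, 1)
```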

Please post your code so we can try to run it and reproduce.

typewind commented 7 years ago

@StripedBanana My code is here. I only modified contrastive_loss (line 37) and compute_accuracy (line 76).

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

lschaupp commented 6 years ago

I'm a bit late to the party, but I had the same thoughts about the accuracy problem. So what is the consensus on this? Which method is the way to go to get an actual accuracy?

pvskand commented 6 years ago

@LinHungShi If you look at the paper on contrastive loss, on page 3 just before equation (1), they state that y = 0 if the pair is similar and y = 1 if the pair is dissimilar. But in mnist_siamese_graph.py it must be the case that y = 0 is for dissimilar pairs and y = 1 is for similar pairs, as in the mnist_siamese.py code.
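The two conventions give identical losses once the labels are flipped, which is easy to check with NumPy (illustration only; loss_paper and loss_example are names chosen here, not from the example):

```python
import numpy as np

def loss_paper(y, d, margin=1.0):
    # y = 0 similar, y = 1 dissimilar (paper convention)
    return np.mean((1 - y) * np.square(d) +
                   y * np.square(np.maximum(margin - d, 0)))

def loss_example(y, d, margin=1.0):
    # y = 1 similar, y = 0 dissimilar (mnist_siamese_graph.py labels)
    return np.mean(y * np.square(d) +
                   (1 - y) * np.square(np.maximum(margin - d, 0)))

y = np.array([1.0, 0.0, 1.0, 0.0])   # example-style labels
d = np.array([0.1, 0.9, 0.3, 0.2])
print(np.isclose(loss_paper(1 - y, d), loss_example(y, d)))  # True
```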

tonmoyborah commented 5 years ago

@StripedBanana, @pvskand, @lschaupp you seem to know about contrastive loss and siamese networks; would you be kind enough to help me a little? I am stuck on an issue I don't quite understand. My training runs fine (I am training a Keras siamese network as given in the example, but on a custom dataset) and the model works decently.

I have noticed that model.predict never returns 0. Does it return the euclidean distance between the feature vectors of the query image pair during inference, or something else? (The documentation is very basic.) The output seems to be capped at a minimum of 0.00031622776. This exact number comes up quite frequently when I print pred[0][0], where pred = model.predict([image1, image2]); for example, it is returned when I give the same image for both image1 and image2, or when I give an image that is basically just glare. Any help is highly appreciated.
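A likely explanation for that exact floor, assuming the unmodified euclidean_distance from the example: it computes K.sqrt(K.maximum(sum_square, K.epsilon())), and Keras's default epsilon is 1e-07, so even two identical inputs cannot produce a distance below sqrt(1e-07):

```python
import math

# The clamp at K.epsilon() (default 1e-07) before the square root
# means no pair can ever score below sqrt(1e-07).
print(math.sqrt(1e-07))  # ≈ 0.00031622776601
```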

tonmoyborah commented 5 years ago

Ok, nevermind, I got it