I had the same thought when I originally looked at the paper and this implementation, and I still do. I think you are correct about the second formula.
Also, I think there is another really big assumption in compute_accuracy: it fixes the threshold at 0.5, which may not be the best value to choose. It somewhat works here because the way the pairs are formed gives you a balanced number of examples and counter-examples, weighted with the same cost in the loss function.
If you really want the best FAR and FRR, the example should at least tell the user to test several threshold values and plot a ROC curve, as sketched below.
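Something along these lines would do it (a minimal sketch, not from the example itself: sweep_thresholds is a hypothetical helper; distances are the model's predicted distances on the pairs, and labels follow the example's convention of 1 = similar, 0 = dissimilar):

```python
import numpy as np

def sweep_thresholds(distances, labels, n_steps=100):
    """Hypothetical helper: report accuracy/FAR/FRR over a range of thresholds."""
    results = []
    for t in np.linspace(distances.min(), distances.max(), n_steps):
        pred_similar = distances < t  # small distance -> predicted "same class"
        acc = np.mean(pred_similar == (labels == 1))
        far = np.mean(pred_similar[labels == 0])   # false accept rate on dissimilar pairs
        frr = np.mean(~pred_similar[labels == 1])  # false reject rate on similar pairs
        results.append((t, acc, far, frr))
    return results  # plot far vs (1 - frr) across thresholds for a ROC curve
```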
One last thing: I don't remember which verbose setting this example activates, but beginners should really be careful with the accuracy and val_accuracy metrics during the fitting phase, because they are computed with a simple rounding operation (so basically a threshold of 0.5), so they may not reflect the real accuracy at all. The loss, however, remains a good metric to watch if your dataset is not too unbalanced.
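If you still want an in-training indicator, you could pass a custom metric that at least thresholds in the right direction (a sketch under the example's convention, where y = 1 means similar and a small distance means similar; the 0.5 threshold is still arbitrary):

```python
from keras import backend as K

def distance_accuracy(y_true, y_pred):
    # Count a pair as "similar" when the predicted distance is below 0.5,
    # instead of rounding y_pred like the built-in accuracy metric does.
    pred_similar = K.cast(y_pred < 0.5, y_true.dtype)
    return K.mean(K.cast(K.equal(y_true, pred_similar), 'float32'))

# usage: model.compile(loss=contrastive_loss, optimizer='rmsprop',
#                      metrics=[distance_accuracy])
```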
@StripedBanana No matter what the threshold is, compute_accuracy is wrong. I saw a fix that gave a correct method of computing the accuracy, but it was reverted and I don't know why.
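The fix looked roughly like this (from memory, so treat it as a sketch):

```python
import numpy as np

def compute_accuracy(predictions, labels):
    # Compare thresholded predictions against the labels over ALL pairs;
    # the example's labels[predictions.ravel() < 0.5].mean() instead averages
    # the labels of the sub-threshold pairs only, which is not an accuracy.
    return np.mean((predictions.ravel() < 0.5) == labels)
```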
For the threshold, I use the corrected contrastive loss function provided by @LinHungShi and drew a scatter plot of the prediction values: [scatter plot]
Then I have no idea how to set a simple threshold to divide them... Or maybe I should follow the original paper and do some clustering? (I'm just a beginner in machine learning, sorry about that.)
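One simple way to look for a threshold (a sketch; distances are the predicted values and labels the 1 = similar / 0 = dissimilar targets from the example) is to histogram the two classes instead of scattering them:

```python
import matplotlib.pyplot as plt

# Overlay the distance distributions of similar and dissimilar pairs;
# a workable threshold sits in the valley between the two modes.
plt.hist(distances[labels == 1], bins=50, alpha=0.5, label='similar pairs')
plt.hist(distances[labels == 0], bins=50, alpha=0.5, label='dissimilar pairs')
plt.xlabel('predicted distance')
plt.ylabel('count')
plt.legend()
plt.show()
```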
From your scatter plot it looks like your training phase has done very little to get better results than random guessing. Are you using the MNIST dataset and the pairing function from the example? That data should give you much more distinct distributions after a few epochs. Can you post your training-phase metrics and parameters? Just copy-paste the console output from Keras if you use verbose.
I strongly recommend against using compute_accuracy if you want to go further than just toying around with this example. It is very unintuitive, and you will get much more information from your scatter plot.
I would also recommend not adding accuracy to your metrics, as it suffers from the same issue compute_accuracy does. Checking the loss and val_loss during training will tell you if you start overfitting, if you set your learning rate too high or too low, when to stop training, and so on. It should be enough of an indicator, as it directly shows you whether your predictions are close to your labels (batch-wise, at least).
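Concretely, that means compiling without metrics and reading the fit history, e.g. (a sketch using the example's variable names tr_pairs/tr_y and te_pairs/te_y):

```python
# Train without the misleading accuracy metric and watch loss/val_loss instead.
model.compile(loss=contrastive_loss, optimizer='rmsprop')
history = model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
                    validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y),
                    batch_size=128, epochs=20, verbose=1)
print(history.history['loss'][-1], history.history['val_loss'][-1])
```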
@StripedBanana The previous plot is the result after 20 epochs. I tried 100 epochs and it looks worse...
Sorry, I am new to Keras. I set verbose=1 in the fit function. Here is the console output of the last few epochs. Is this what you need?
Epoch 97/100
108400/108400 [==============================] - 5s - loss: 0.2224 - val_loss: 0.2749
Epoch 98/100
108400/108400 [==============================] - 6s - loss: 0.2223 - val_loss: 0.2755
Epoch 99/100
108400/108400 [==============================] - 5s - loss: 0.2217 - val_loss: 0.2742
Epoch 100/100
108400/108400 [==============================] - 5s - loss: 0.2220 - val_loss: 0.2742
Yes, this is what I need, though a handful of epochs doesn't tell much on its own. Still, I can already say that loss/val_loss > 0.2 is very high (I think a loss of 0.25 is basically random guessing, given the contrastive loss function we have). So I'm pretty sure your training phase is ineffective.
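To see where that 0.25 comes from: with margin 1 and balanced pairs, a network that outputs the same distance d for every pair gets loss 0.5·d² + 0.5·(1 − d)², which is minimized at d = 0.5 with value 0.5·0.25 + 0.5·0.25 = 0.25. So hovering around 0.22 means the network has barely moved away from a constant output.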
Can you also tell us what data you use and how you load it? What is your network? Did you only change the contrastive loss expression in the example?
@StripedBanana I use the default MNIST dataset via from keras.datasets import mnist. I didn't change anything in the original code except the contrastive loss function.
Now I feel unsure about the eucl_dist_output_shape function as well. The value it returns is <class 'tuple'>: (None, 1). I doubt whether it really works, and I suspect that's why this network can only do random guessing. Do you have any clue about it?
@typewind eucl_dist_output_shape returning (None, 1) is OK; it's just required by the backend, since there is no shape inference for a Lambda layer. Basically, we are saying that we expect an unknown number (the batch dimension) of scalar values as the output of the Euclidean distance.
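For reference, this is roughly what the example defines (modulo version differences):

```python
from keras import backend as K

def euclidean_distance(vects):
    x, y = vects
    # Sum over the feature axis and keep the batch axis -> shape (batch, 1).
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    # The K.epsilon() clamp keeps sqrt differentiable at zero distance.
    return K.sqrt(K.maximum(sum_square, K.epsilon()))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)  # (None, 1): one scalar distance per pair in the batch
```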
Please post your code so we can try to run it and reproduce.
@StripedBanana My code is here. I only modified contrastive_loss (line 37) and compute_accuracy (line 76).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I'm a bit late to the party, but I had the same thoughts about the accuracy problem. So what is the consensus on this? Which method is the way to go to get an actual accuracy?
@LinHungShi If you look at the contrastive loss paper, on page 3 just before equation (1), they mention that y = 0 if the pairs are similar and y = 1 if the pairs are dissimilar. But in mnist_siamese_graph.py it must be the case that y = 0 is for dissimilar pairs and y = 1 is for similar pairs, as in the mnist_siamese.py code.
@StripedBanana, @pvskand, @lschaupp you guys seem to know about contrastive loss and siamese networks. I am stuck on an issue I don't quite understand; would you be kind enough to help me a little? My training runs fine (I am training a Keras siamese network as given in the example, but on a custom dataset) and the model works decently. I have noticed that model.predict never returns 0. Does it return the Euclidean distance between the feature vectors of the query image pair during inference, or something else (the documentation is very basic)? It seems to be capped at a minimum of 0.00031622776. This exact number comes up quite frequently when I print(pred[0][0]), where pred = model.predict([image1, image2]). For example, this number is returned when I give the same image for both image1 and image2, or when I give an image which is basically just glare, etc. Any help is highly appreciated.
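In case someone else hits the same thing: if the model is the one from the example, that floor is consistent with the K.epsilon() clamp inside euclidean_distance shown above. The squared distance is clamped at K.epsilon() = 1e-7 before the square root, so identical inputs give sqrt(1e-7), never 0:

```python
import numpy as np
print(np.sqrt(1e-7))  # 0.000316227766..., the observed minimum
```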
Ok, nevermind, I got it
It seems that the contrastive loss in mnist_siamese_graph.py is defined as

loss = y_true * d^2 + (1 - y_true) * max(margin - d, 0)^2

However, the loss defined in the original paper is

loss = (1 - y_true) * (1/2) * d^2 + y_true * (1/2) * max(margin - d, 0)^2

where y_true is 0 if the two images are from the same class, and 1 otherwise (d is the predicted Euclidean distance).
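In Keras backend code, the two conventions look like this (a sketch; the 1/2 factors from the paper are dropped, as the example does, since they only scale the loss):

```python
from keras import backend as K

margin = 1

def contrastive_loss_example(y_true, y_pred):
    # mnist_siamese_graph.py convention: y_true = 1 for similar pairs
    return K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

def contrastive_loss_paper(y_true, y_pred):
    # paper convention: y_true = 0 for similar pairs, 1 for dissimilar
    return K.mean((1 - y_true) * K.square(y_pred) +
                  y_true * K.square(K.maximum(margin - y_pred, 0)))
```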