sujitpal / holiday-similarity

Finding similar images in the Holidays dataset
Apache License 2.0

contrastive loss function in 02-holidays-siamese-network.ipynb #2

Open ellen-liu opened 6 years ago

ellen-liu commented 6 years ago

I don't think I completely understand this part of the code:

distance = Lambda(cosine_distance, 
                  output_shape=cosine_distance_output_shape)([vector_left, vector_right])

fc1 = Dense(128, kernel_initializer="glorot_uniform")(distance)
fc1 = Dropout(0.2)(fc1)
fc1 = Activation("relu")(fc1)

pred = Dense(2, kernel_initializer="glorot_uniform")(fc1)
pred = Activation("softmax")(pred)

Where does the contrastive loss come in? I'm trying to understand Siamese networks conceptually right now, and I'm not sure whether my assumptions are correct at this point.

sujitpal commented 6 years ago

Hi @ellen-liu, sorry, my docs and naming are misleading. I used the mnist_siamese.py code in keras/examples as my template, but I framed my own problem with the Holidays photos as a 2-class classification problem rather than a regression problem.

In the Keras example, the Lambda computes the final distance between the two images as a continuous value. In my code, the Lambda does an element-wise multiplication of the two image vectors, returning a vector of the same size as each input vector. The name "cosine_distance" is incorrect and misleading: I started with that implementation, changed it midway, and forgot to rename the function. The intuition behind the product is that it magnifies dimensions where the two images agree. This product vector is then fed into a 2-layer network to produce a 2-class prediction.
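To make that concrete, here is a minimal NumPy sketch of what the (misnamed) Lambda computes; the vectors and their values are hypothetical stand-ins for the two image embeddings, not data from the notebook:

```python
import numpy as np

# Two hypothetical image embeddings (values are illustrative only)
vec_left = np.array([0.9, 0.1, 0.8, 0.0])
vec_right = np.array([0.8, 0.2, 0.9, 0.7])

# The Lambda performs an element-wise product, not a cosine distance.
# Dimensions where both vectors are large stay large; mismatches shrink toward 0.
merged = vec_left * vec_right

print(merged)
```

The `merged` vector (here roughly `[0.72, 0.02, 0.72, 0.0]`) is what gets fed into the Dense/Dropout/Activation head quoted in the question.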

sujitpal commented 6 years ago

Also, I think my example may not actually be a Siamese network, since there is no weight sharing. In retrospect, what I should have done is something like the architecture described in this TripAdvisor Engineering blog post (scroll down to Model Architecture to see the architecture diagram). They use pre-trained networks as I did, but the weights of the 3-layer FCN head are shared between the two branches. Although not explicitly stated, the caption says the objective is to maximize the difference between outputs at the merge, so I suspect that the left and right instances of the FCN feed into a Lambda layer that optimizes the contrastive loss, so that similar image pairs return values closer to 1 and dissimilar pairs values closer to 0.
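The weight sharing that makes a network "Siamese" can be sketched like this (a hypothetical NumPy illustration, with the 3-layer FCN head collapsed to a single layer for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

# A single shared weight matrix -- the defining property of a Siamese network
# is that both branches use the *same* parameters.
W = rng.normal(size=(4, 2))

def shared_branch(x):
    """Both the left and right images pass through this same function/weights."""
    return np.tanh(x @ W)

left = shared_branch(np.array([0.9, 0.1, 0.8, 0.0]))
right = shared_branch(np.array([0.9, 0.1, 0.8, 0.0]))

# Because the weights are shared, identical inputs produce identical embeddings,
# so the learned distance between them is meaningful.
assert np.allclose(left, right)
```

In Keras this corresponds to instantiating the head layers once and calling them on both inputs, rather than creating two independent copies.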

There is also this SO page, which provides a TensorFlow implementation of contrastive loss.
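For reference, the standard contrastive loss (the Hadsell et al., 2006 formulation) can be sketched in plain NumPy as follows; the function and parameter names here are my own choices, not taken from the SO answer:

```python
import numpy as np

def contrastive_loss(y_true, distance, margin=1.0):
    """Contrastive loss for pairs of embeddings.

    y_true: 1 for similar pairs, 0 for dissimilar pairs.
    distance: Euclidean distance between the two branch outputs.
    Similar pairs are penalized for being far apart; dissimilar pairs
    are penalized only while they are closer than the margin.
    """
    similar_term = y_true * distance ** 2
    dissimilar_term = (1.0 - y_true) * np.maximum(margin - distance, 0.0) ** 2
    return np.mean(similar_term + dissimilar_term)

# A similar pair that is far apart incurs a high loss,
# while a dissimilar pair beyond the margin incurs none.
print(contrastive_loss(np.array([1.0]), np.array([2.0])))  # 4.0
print(contrastive_loss(np.array([0.0]), np.array([2.0])))  # 0.0
```

Minimizing this pulls similar pairs together and pushes dissimilar pairs apart until they clear the margin.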

Thank you for bringing up this question. I will try out these ideas and put up a new notebook with the implementation soon, then close out this issue.