liyunqianggyn / Deep-Unsupervised-Image-Hashing

Implementation of accepted AAAI 2021 paper: Deep Unsupervised Image Hashing by Maximizing Bit Entropy

about the loss function for image features and hash codes #4

Closed Dreamupers closed 3 years ago

Dreamupers commented 3 years ago

The image features are extracted by a VGG16 network and activated by ReLU, so they lie in the range 0 to +inf. The cosine similarity between features is therefore in [0, 1]. However, the hash codes are +1 or -1, so their cosine similarity is in [-1, 1]. The loss function is an MSE loss, written in /ImageHashing/Flickr25k.py:101 as:

```python
target_b = F.cosine_similarity(b[:int(labels.size(0) / 2)], b[int(labels.size(0) / 2):])
target_x = F.cosine_similarity(x[:int(labels.size(0) / 2)], x[int(labels.size(0) / 2):])
loss = F.mse_loss(target_b, target_x)
```

I'm not sure whether this is a problem, because the numerical ranges of target_b and target_x are not the same.
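A minimal sketch of the concern, using toy tensors (not the repository's actual data): non-negative features give a similarity in [0, 1], while ±1 codes give one in [-1, 1]. The rescaling at the end is only one hypothetical way to align the two ranges, not the paper's method.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Non-negative "features" (as if after ReLU): cosine similarity lies in [0, 1].
x1 = torch.rand(8, 64)
x2 = torch.rand(8, 64)
target_x = F.cosine_similarity(x1, x2)

# ±1 "hash codes": cosine similarity lies in [-1, 1].
b1 = torch.sign(torch.randn(8, 64))
b2 = torch.sign(torch.randn(8, 64))
target_b = F.cosine_similarity(b1, b2)

# Hypothetical correction: rescale the feature similarity from [0, 1]
# to [-1, 1] before taking the MSE against the code similarity.
loss = F.mse_loss(target_b, target_x * 2.0 - 1.0)
```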

liyunqianggyn commented 3 years ago

Good observation! Solving the numerical-range problem could probably improve the model further. To compare fairly with other methods, we did not change the loss as used in GreedyHash. I agree that the differing numerical ranges could be a problem.

Dreamupers commented 3 years ago

Thank you for your quick reply. I have also noticed this in GreedyHash. I will try this small correction and compare the model's performance. Thank you again!

KinglittleQ commented 3 years ago

The last layer before quantization is a fully connected layer without ReLU, so the numerical range of its output is -inf to +inf, and the feature cosine similarity already spans [-1, 1].

https://github.com/liyunqianggyn/Deep-Unsupervised-Image-Hashing/blob/2442d69daf41b32be6f8b5aee3c05f14bcce8063/ImageHashing/Flickr25k.py#L56-L65
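A toy check of this point, with hypothetical layer sizes (not the repository's actual network): a linear layer without ReLU produces signed outputs, so the resulting cosine similarity can be negative and is no longer confined to [0, 1].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

fc = nn.Linear(512, 64)       # stand-in for the last layer before quantization, no ReLU
feats = torch.randn(16, 512)  # stand-in for upstream VGG features

x = fc(feats)                 # outputs can be negative as well as positive
has_negative = bool((x < 0).any())

# With signed outputs, cosine similarity spans [-1, 1], matching the codes' range.
target_x = F.cosine_similarity(x[:8], x[8:])
```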