thuml / HashNet

Code release for "HashNet: Deep Learning to Hash by Continuation" (ICCV 2017)
MIT License

could you do me a favor? #6

Closed llf1234 closed 6 years ago

llf1234 commented 6 years ago

I am a beginner in deep hashing, and there are two things I do not understand. 1. Why not use the classification results as the hash code? What are the shortcomings of doing so? 2. Almost all supervised deep hashing methods use AlexNet as the backbone network. Is it acceptable to change the backbone when writing a paper? I would appreciate your guidance. Thank you very much!

caozhangjie commented 6 years ago

For the first question: in the retrieval setting, you do not have class information, only similarity information, so you cannot get classification results.

For the second question: you can use other backbone networks, such as VGG or ResNet, in your paper. Just remember to run the other methods with the same backbone so that the comparison is fair.

yuewuqing2224 commented 6 years ago

I want to add some of my own thoughts on this.

The overall goal of image hashing is to map each image to a binary hash code so that similar images share similar codes and dissimilar images get different codes. Distance between codes is measured as Hamming distance. The binary codes are usually much shorter than the number of classes. This makes retrieval fast, since the distance between two binary codes can be computed efficiently with a bitwise XOR followed by counting the set bits, and shorter codes speed things up further. So there are two things people should care about: (1) image similarity and (2) hash code learning.
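To make the XOR point concrete, here is a minimal sketch in plain Python. The function names and the 8-bit example codes are made up for illustration and do not come from the HashNet code; real systems usually pack codes into machine words and use a hardware popcount instruction.

```python
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two equal-length codes differ."""
    # XOR leaves a 1 exactly where the two codes disagree;
    # counting those 1s gives the Hamming distance.
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Indices of database codes, nearest to the query first."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Hypothetical 8-bit hash codes, for illustration only.
query = 0b10110010
database = [0b10110011, 0b01001101, 0b10100010]

ranking = rank_by_hamming(query, database)
print(ranking)  # [0, 2, 1]: distances are 1, 8, 1, and ties keep their order
```

Shorter codes mean fewer bits to XOR and count per comparison, which is where the extra speed comes from.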

Most papers nowadays are more concerned with improving retrieval speed and accuracy on commonly used datasets, which means (2) is the more relevant part; (1) simply follows common practice in the field. If you look at those datasets, you will notice that they all come with class labels. Even if a paper does not use the labels directly and instead formulates the problem with a similarity matrix, whether two images are similar is still computed from the class labels. In ImageNet100 and CIFAR-10, for instance, two images are similar if they have the same class label. In NUS-WIDE and MS COCO, they are similar if they share at least one class label.
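As a rough sketch of how such a similarity matrix can be derived from labels, assuming each image's labels are given as a set of class indices (the function name and the toy labels below are hypothetical, not taken from any dataset loader):

```python
from typing import List, Set


def similarity_matrix(labels: List[Set[int]]) -> List[List[int]]:
    """S[i][j] = 1 if images i and j share at least one class label, else 0."""
    n = len(labels)
    # A non-empty set intersection means the two images have a label in common.
    return [[1 if labels[i] & labels[j] else 0 for j in range(n)]
            for i in range(n)]


# Single-label case (CIFAR-10 style): one class per image,
# so "share a label" reduces to "same class".
single = [{3}, {3}, {7}]
print(similarity_matrix(single))  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]

# Multi-label case (NUS-WIDE / MS COCO style): several tags per image.
multi = [{0, 2}, {2, 5}, {5}]
print(similarity_matrix(multi))   # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
```

The same rule covers both cases, which is why the similarity information in these benchmarks is really label information in another form.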

As a side note, the backbone used for image hashing is almost always initialized from a model pretrained for ImageNet classification. You rarely see people report training from scratch, because many models simply do not converge that way. So I would say that most image hashing models today work by projecting a one-hot or multi-label vector onto a much shorter binary hash code. This differs from classification because softmax outputs, or intermediate conv features like those used in face recognition, are floating-point values, which you usually compare by taking the max or computing the Euclidean distance. They cannot be mapped directly to binary codes without fine-tuning with an image hashing algorithm.
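One way to see why floating-point features do not translate directly into good binary codes is the toy sketch below. It is my own illustration with made-up numbers and a hypothetical `binarize` helper, not the method used in this repository: two feature vectors that are almost identical in Euclidean distance can still end up with different bits when thresholded naively at zero.

```python
import math
from typing import List


def binarize(features: List[float]) -> int:
    """Naive binarization: one bit per dimension, 1 if the value is positive."""
    code = 0
    for x in features:
        code = (code << 1) | (1 if x > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


print(bin(binarize([0.8, -0.1, 0.05, -2.3])))  # 0b1010

# Two hypothetical feature vectors that are nearly identical as floats...
a = [0.01, 0.5]
b = [-0.01, 0.5]
euclid = math.dist(a, b)  # about 0.02

# ...but differ in half of their bits after thresholding.
print(hamming_distance(binarize(a), binarize(b)))  # 1 (out of 2 bits)
```

Values sitting close to the threshold are the problem here, which is one reason to fine-tune with a hashing objective instead of thresholding pretrained features as they are.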

That said, if you are interested in (1), you can check out papers on context embedding networks or contextual visual similarity. If you do not care about it, feel free to use whatever class label information gives you good results. Many papers use it to boost their performance, and some rely on it exclusively.