shiroyagicorp / sitq

Learning to Hash for Maximum Inner Product Search
MIT License

Why Result of get_query_signatures and get_item_signatures is not the Same #5

Open HaoMood opened 4 years ago

HaoMood commented 4 years ago

Dear Author,

Thanks for the code. I am wondering why get_query_signatures and get_item_signatures return different results for the same input. For example,

import numpy as np
from sitq import Sitq

# Create sample dataset
items = np.random.rand(10000, 50)

sitq = Sitq(signature_size=8)
# Learn transformation matrix
sitq.fit(items)  

# Get signatures for items
item_sigs = sitq.get_item_signatures(items)
# Get signature for query
query_sigs = sitq.get_query_signatures(items)

# Result is not the same
print(item_sigs[0])   # It gives: [False False False False  True  True False False]
print(query_sigs[0])  # It gives: [False False  True  True False False False  True]

Best regards,

Hao

slaypni commented 4 years ago

Sorry for the late reply (I am no longer a member of the company).

The signatures differ because the computation is asymmetric: the signature for an item is calculated differently from the signature for a query, so the same vector gives different results in the two roles. You can find the details of the computation (the vector transformation) in the Simple-LSH paper.
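
To make the asymmetry concrete, here is a minimal sketch of the kind of transformation Simple-LSH describes. It is not sitq's actual implementation. The function names `item_transform` and `query_transform` are hypothetical, and the random hyperplanes only stand in for the projection that sitq learns in `fit`. Items are scaled so the largest norm is 1 and padded with sqrt(1 - ||x||^2); queries are normalized to unit length and padded with 0.

```python
import numpy as np

def item_transform(items):
    # Hypothetical helper: scale so the largest item norm is 1,
    # then append sqrt(1 - ||x||^2) so every row has unit norm.
    max_norm = np.linalg.norm(items, axis=1).max()
    scaled = items / max_norm
    sq_norms = np.sum(scaled ** 2, axis=1, keepdims=True)
    extra = np.sqrt(np.clip(1.0 - sq_norms, 0.0, None))
    return np.hstack([scaled, extra])

def query_transform(queries):
    # Hypothetical helper: normalize each query (assumed non-zero)
    # to unit length, then append a zero.
    norms = np.linalg.norm(queries, axis=1, keepdims=True)
    return np.hstack([queries / norms, np.zeros((queries.shape[0], 1))])

rng = np.random.default_rng(0)
items = rng.random((1000, 50))

# Assumption: random hyperplanes stand in for the learned projection.
planes = rng.standard_normal((51, 8))

transformed_items = item_transform(items)
transformed_queries = query_transform(items)

# The same vectors go in, but the two transforms produce different
# points, so their sign patterns (signatures) need not match.
item_sigs = transformed_items @ planes > 0
query_sigs = transformed_queries @ planes > 0
```

Under this construction the inner product of a transformed query with a transformed item equals the original inner product divided by (largest item norm × query norm), so ranking items for a given query is preserved. That also means the two kinds of signature are meant to be compared with each other (query signature against item signatures), not expected to be equal for the same vector.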