Closed: hellozting closed this issue 6 years ago.
That seems correct to me.
Our method gives up one codebook to store the norm of the approximation. At lower bit rates this loss is more extreme and harder to recover from.
You could, for example, run a final experiment with only 16 bits: there the budget is just two 8-bit codebooks, so after spending one on the norms our method comes down to k-means plus an extra table of constants.
The original AQ paper showed that 32 bits are a strong use case for the method, but only if you do not give up the extra codebook, which unfortunately comes at increased query time. For 32 bits this would be 4+3+2+1 = 10 lookups per vector. In some applications such query time might be acceptable, but the comparison is then not 1-to-1 with PQ/OPQ.
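To make the lookup count concrete, here is a minimal Python/NumPy sketch of query-time Euclidean scoring for an additive quantizer with 4 full codebooks (the function and array names are hypothetical, not the API of this repository): the query-dependent tables combine the codeword norm and the query-codeword dot product, the pairwise codeword dot products are precomputed offline, and scoring one database vector then costs 4+3+2+1 = 10 lookups.

```python
import numpy as np

def euclidean_adc_scores(codes, query_tables, cross_tables):
    """Approximate squared Euclidean distances (up to the constant ||q||^2),
    using only table lookups.

    codes        : (n, m) integer codeword indices, one per codebook per vector
    query_tables : (m, K) floats, query_tables[i, k] = ||c_ik||^2 - 2 * dot(q, c_ik)
                   (rebuilt for every query)
    cross_tables : (m, m, K, K) floats, cross_tables[i, j, k, l] = 2 * dot(c_ik, c_jl)
                   (precomputed once, independent of the query)
    """
    n, m = codes.shape
    scores = np.zeros(n)
    for i in range(m):
        # one lookup per codebook for the query-dependent term
        scores += query_tables[i, codes[:, i]]
        # one lookup per codebook pair (i, j), i < j, for the cross terms
        for j in range(i + 1, m):
            scores += cross_tables[i, j, codes[:, i], codes[:, j]]
    return scores  # 4 + 3 + 2 + 1 = 10 lookups per vector when m = 4
```

With PQ/OPQ the cross terms vanish because the subspaces are orthogonal, so the same scoring takes only 4 lookups per vector, which is why the comparison stops being 1-to-1.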
If you want to calculate dot products instead of Euclidean distances, then our method can use the full bit budget for the approximation, and it should be better than most baselines -- definitely better than PQ and OPQ.
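For comparison, here is the dot-product case under the same assumed layout (again a sketch with hypothetical names): the cross terms drop out because the inner product with the query is just the sum of the per-codebook inner products, so each codebook contributes exactly one lookup and all codebooks can be spent on the approximation.

```python
import numpy as np

def dot_product_adc_scores(codes, query_tables):
    """Approximate dot products between the query and every database vector.

    codes        : (n, m) integer codeword indices
    query_tables : (m, K) floats, query_tables[i, k] = dot(q, c_ik)
    """
    n, m = codes.shape
    scores = np.zeros(n)
    for i in range(m):
        scores += query_tables[i, codes[:, i]]  # exactly m lookups per vector
    return scores
```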
Cheers,
Thanks a lot!
Hi,
Thanks for sharing the code. I have recently been running comparison experiments.
It works well at 64 and 128 bits, and the results are comparable with the LSQ paper. But at 32 bits, the recall is lower than ckmeans. Although the LSQ paper does not report results at 32 bits, could you please check these results for me? Or is there anything I missed?
Recall at 32 bits: R@1 = 5.09%, R@10 = 23.53%, R@100 = 62.44%
I changed the hyperparameter npert from 4 to 3, 2, and 1, but obtained similar results.
Thanks,