jingtaozhan / RepCONC

WSDM'22 Best Paper: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval
MIT License

Unsupervised PQ results #4

Open hshreeshail opened 2 years ago

hshreeshail commented 2 years ago

In Table 1, is there any explanation for why the results of unsupervised PQ (MRR@10 = 0.028 at a compression ratio of 64x) are so poor?

In our experience, PQ works reasonably well. For example, when we use PQ with M=32 (compression ratio = 96x) to compress 768-dimensional vectors generated by the ANCE model, we get MRR@10 = 0.252 on MS MARCO Passage Dev. We used IndexPQ from the FAISS library for this.
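For reference, the 96x figure can be reproduced with a little arithmetic. The sketch below assumes the uncompressed vectors are stored as float32 and that each PQ sub-vector is encoded with 8 bits; neither detail is stated above, and `pq_compression_ratio` is a hypothetical helper name used only for illustration.

```python
def pq_compression_ratio(dim, num_subvectors, bits_per_code=8, bytes_per_float=4):
    """Ratio of raw vector size to PQ code size.

    Assumptions (not stated in the thread): float32 storage for the raw
    vectors and 8-bit codes per sub-vector.
    """
    if dim % num_subvectors != 0:
        raise ValueError("dim must be divisible by num_subvectors")
    raw_bytes = dim * bytes_per_float               # e.g. 768 * 4 = 3072 bytes
    code_bytes = num_subvectors * bits_per_code / 8  # e.g. 32 * 8 / 8 = 32 bytes
    return raw_bytes / code_bytes


# 768-dim vectors with M=32 sub-vectors
print(pq_compression_ratio(768, 32))  # 96.0
```

Under the same assumptions, a different M gives a different ratio, so two setups need the same code size per vector before their MRR@10 numbers are directly comparable.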

Also, when reporting results for unsupervised methods (PQ, ScaNN, OPQ, etc.), which encoder produces the uncompressed input vectors? Is it the trained STAR model?

jingtaozhan commented 2 years ago

[1] In my experience, PQ performs rather poorly, and it is important to use OPQ. This is the script we use for OPQ. You can change this line to get the PQ results. I would be happy to learn how you obtained such strong PQ performance.

[2] Yes, the encoder is the STAR model.