jbellis opened this issue 4 months ago
It's possible that I'm missing something due to unfamiliarity with the codebase, but it looks to me like `vpq_dataset` and `train_pq` are collapsing all the subspaces into a single codebook. E.g. with `pq_n_centers=256`, `pq_len=8`, and `dim=128`, a classic PQ implementation would produce a codebook of 256 × 8 × 16 values, with the centers for each of the sixteen 8-dimensional subspaces trained separately. Instead, cuvs trains just 256 × 8, which probably works okay for extremely symmetrical datasets but will diminish accuracy unnecessarily on others.
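To make that concrete, here is a minimal plain-C++ sketch of what I mean (not the cuvs code; the `pq_encode` helper, the flat array layouts, and the brute-force nearest-centroid loop are all just for illustration). The only difference between the two schemes is whether subspace `s` indexes its own codebook or every subspace reuses the same one:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <limits>
#include <vector>

// Encode one vector of dimension n_subspaces * pq_len into one code per subspace.
// classic   : `codebook` holds n_subspaces * n_centers * pq_len values
//             (a separately trained codebook per subspace).
// collapsed : `codebook` holds                n_centers * pq_len values
//             (a single codebook reused by every subspace).
std::vector<uint8_t> pq_encode(const std::vector<float>& vec,
                               const std::vector<float>& codebook,
                               std::size_t n_centers, std::size_t pq_len,
                               bool shared_codebook) {
  const std::size_t n_subspaces = vec.size() / pq_len;
  std::vector<uint8_t> codes(n_subspaces);
  for (std::size_t s = 0; s < n_subspaces; ++s) {
    // Classic PQ indexes the codebook trained for subspace s; the collapsed
    // variant always starts at offset 0, i.e. every subspace shares codebook 0.
    const float* cb =
        codebook.data() + (shared_codebook ? 0 : s * n_centers * pq_len);
    float best = std::numeric_limits<float>::max();
    for (std::size_t c = 0; c < n_centers; ++c) {
      float d = 0.0f;
      for (std::size_t k = 0; k < pq_len; ++k) {
        const float diff = vec[s * pq_len + k] - cb[c * pq_len + k];
        d += diff * diff;
      }
      if (d < best) {
        best = d;
        codes[s] = static_cast<uint8_t>(c);
      }
    }
  }
  return codes;
}

int main() {
  // The parameters from the example above: dim=128, pq_len=8, pq_n_centers=256.
  const std::size_t dim = 128, pq_len = 8, n_centers = 256;
  const std::size_t n_subspaces = dim / pq_len;  // 16
  std::printf("classic codebooks : %zu values\n", n_subspaces * n_centers * pq_len);
  std::printf("collapsed codebook: %zu values\n", n_centers * pq_len);
}
```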
You're right. `vpq_dataset` and `train_pq` are parts of the CAGRA implementation. In contrast to IVF-PQ, the compression in CAGRA uses just one codebook for all subspaces, by design. As far as I understand, this greatly reduces the GPU shared-memory / cache requirements and thereby improves performance. The product quantization in `vpq_dataset` is applied to the residuals of the vector quantization (the k-means cluster centers); I assume the drop in recall is partially mitigated by increasing the number of VQ centers (parameter `vq_n_centers`).
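Roughly, the two-level encoding looks like the sketch below (plain C++ for illustration only, not the actual cuvs implementation; the `vpq_encode` helper, the `VpqCode` struct, and the array layouts are assumptions): the vector is first assigned to its nearest VQ center, and the PQ codes are then computed on the residual against the single shared codebook.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct VpqCode {
  uint32_t vq_code = 0;           // index of the nearest VQ (k-means) center
  std::vector<uint8_t> pq_codes;  // one code per pq_len-sized chunk of the residual
};

// First level: whole-vector quantization against vq_n_centers k-means centers.
// Second level: PQ on the residual, with every subspace sharing one codebook.
VpqCode vpq_encode(const std::vector<float>& vec,
                   const std::vector<float>& vq_centers,   // vq_n_centers * dim
                   const std::vector<float>& pq_codebook,  // pq_n_centers * pq_len (shared)
                   std::size_t vq_n_centers, std::size_t pq_n_centers,
                   std::size_t pq_len) {
  const std::size_t dim = vec.size();
  VpqCode out;

  // 1) Find the nearest VQ center.
  float best = std::numeric_limits<float>::max();
  for (std::size_t c = 0; c < vq_n_centers; ++c) {
    float d = 0.0f;
    for (std::size_t k = 0; k < dim; ++k) {
      const float diff = vec[k] - vq_centers[c * dim + k];
      d += diff * diff;
    }
    if (d < best) { best = d; out.vq_code = static_cast<uint32_t>(c); }
  }

  // 2) PQ-encode the residual, reusing the same codebook for every subspace.
  std::vector<float> residual(dim);
  for (std::size_t k = 0; k < dim; ++k)
    residual[k] = vec[k] - vq_centers[out.vq_code * dim + k];

  const std::size_t n_subspaces = dim / pq_len;
  out.pq_codes.resize(n_subspaces);
  for (std::size_t s = 0; s < n_subspaces; ++s) {
    float best_sub = std::numeric_limits<float>::max();
    for (std::size_t c = 0; c < pq_n_centers; ++c) {
      float d = 0.0f;
      for (std::size_t k = 0; k < pq_len; ++k) {
        const float diff = residual[s * pq_len + k] - pq_codebook[c * pq_len + k];
        d += diff * diff;
      }
      if (d < best_sub) { best_sub = d; out.pq_codes[s] = static_cast<uint8_t>(c); }
    }
  }
  return out;
}

int main() {
  // Dummy data just to exercise the function; real codebooks come from k-means.
  const std::size_t dim = 128, vq_n_centers = 4, pq_n_centers = 256, pq_len = 8;
  std::vector<float> vec(dim, 1.0f);
  std::vector<float> vq_centers(vq_n_centers * dim, 0.0f);
  std::vector<float> pq_codebook(pq_n_centers * pq_len, 0.0f);
  VpqCode code =
      vpq_encode(vec, vq_centers, pq_codebook, vq_n_centers, pq_n_centers, pq_len);
  return code.pq_codes.size() == dim / pq_len ? 0 : 1;
}
```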
Thanks! I had not noticed that `cagra_q_dataset_descriptor_t::set_smem_ptr` assumes that the entire codebook needs to fit into shared memory. Given that constraint, the "collapsing" here makes sense.
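For a rough sense of scale with the numbers from my example (a sketch assuming fp32 codebook entries and the usual ~48 KB default per-block shared-memory budget; the element type cuvs actually stores the codebook in may differ):

```cpp
#include <cstdio>

int main() {
  // Parameters from the example above; 4-byte entries assume fp32 storage
  // (the element type cuvs actually uses for the codebook may differ).
  const unsigned dim = 128, pq_len = 8, n_centers = 256, elem_bytes = 4;
  const unsigned n_subspaces = dim / pq_len;                                          // 16
  const unsigned collapsed_kb = n_centers * pq_len * elem_bytes / 1024;               //   8 KB
  const unsigned per_subspace_kb = n_subspaces * n_centers * pq_len * elem_bytes / 1024;  // 128 KB
  std::printf("collapsed codebook    : %u KB\n", collapsed_kb);
  std::printf("per-subspace codebooks: %u KB\n", per_subspace_kb);
  // A single collapsed codebook fits comfortably in the ~48 KB of shared memory
  // a block gets by default; sixteen per-subspace codebooks at 128 KB would not.
}
```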