Open T-Lind opened 4 months ago
Hi @T-Lind,
It sounds like you might be looking for an API to quantize vectors using product quantization? Are there any other APIs you are interested in specifically?
Exposing a product quantization API is on our roadmap, along with scalar and binary quantization.
@cjnolet Interesting! I look forward to it! We're preferably looking for something we can run locally that only does PQ.
Looking at integrating it into JVector, actually: https://github.com/jbellis/jvector (Only choosing to do PQ given other processes are more CPU efficient for our data.)
@T-Lind this is great and we would love to support a JVector integration!
Do you have an ideal API experience that you would like us to provide for this? Would an sklearn-style transformer/encoder estimator API be helpful? Are you looking for C++, or do you want to stick to C? What input data types would you ideally want us to support?
cc @tfeher @achirkin
@cjnolet Looking for C++.
Here's what we need to build a POC so we can figure out what the "ideal" is:
```java
public class PQGpu {
    /**
     * Load a codebook and a set of encoded vectors into GPU memory. The codebook is a
     * flattened set of M x K centroids, where M is the number of subspaces and K is the
     * number of centroids per subspace.
     */
    public static PQGpu loadToGpu(VectorFloat codebook, List<ByteSequence> encodedVectors);

    /**
     * Perform an approximate distance computation on the GPU. The query vector is compared
     * to all ordinals in the given `ordinals` array, and scores are written to the
     * corresponding position in the `scores` output vector. (Thus, `scores` is used as a
     * native stand-in for a float[], not as a "vector" per se.)
     */
    void computeSimilarity(VectorFloat query, int[] ordinals, VectorFloat scores);
}
```
Our timeline at DataStax may be different but it'd be great if we can shape the API positively in the long run. Particularly as you're working on the move from RAFT to CuVS.
@T-Lind , what about the layout of the encoded vectors? In IVF-PQ, we keep the encoded vectors in groups of 32, in interleaved 16-byte long chunks. This is to allow faster (aligned, vectorized) reading from memory by groups of 32 GPU threads, but it may require some adjustments to the API in your snippet. We had to disable this in CAGRA-Q (and hence there's some code duplication now), because it's always random access reading there, so no speedup from contiguous reads.
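For readers unfamiliar with this kind of layout, here is a rough CPU-side sketch of the addressing it implies: within a group of 32 vectors, chunk `c` of all 32 vectors is stored contiguously, so 32 GPU threads can read it as one aligned transaction. This is an illustration of the general idea only; the exact cuVS IVF-PQ layout may differ in its details, and `chunk_offset` is a hypothetical helper, not a cuVS function.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of 16-byte chunk `c` of vector `i`, where each encoded vector
// occupies `chunks` chunks and vectors are interleaved in groups of 32.
// Group-major, then chunk-major, then lane (vector within group).
size_t chunk_offset(size_t i, size_t c, size_t chunks) {
  size_t group = i / 32;
  size_t lane  = i % 32;
  return (group * chunks + c) * 32 * 16 + lane * 16;
}
```

With this layout, consecutive lanes of a group read consecutive 16-byte chunks, which is what makes the aligned, vectorized reads possible.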
@achirkin As long as we can encode more than 32 vectors and choose an arbitrary number of subspaces we don't care how it's stored. Fine if centroids per subspace only supports up to 256.
*assuming there's an encode() method to add it to your storage, but it looks like there is?
@achirkin @cjnolet would you mind putting together an example C++ program that'd solely do PQ on a set of test vectors? Thanks.
also @tfeher ^
@T-Lind instead of putting together an example that utilizes internal APIs (which can change at any point without notice), I would prefer that we prioritize getting a good PQ implementation exposed through the public APIs.
I suspect an ideal API for this would be to pass in a host or device_matrix_view with an input dataset and get as output 1) the resulting cluster centers (and any other needed params), and 2) the pq compressed input vectors.
An additional API would take all the PQ params (like the centroids), in addition to a device or host matrix view, and would return the PQ-compressed vectors.
Would these APIs be useful for Datastax and JVector?
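To make the second API shape concrete, here is a minimal CPU-only sketch, assuming row-major input and subspace-major centroids; `pq_encode` is a hypothetical stand-in, not a cuVS function, and a real cuVS API would presumably take device or host matrix views instead of `std::vector`:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// vectors:   n rows of dim = M * sub_dim floats (row-major).
// centroids: M * K rows of sub_dim floats (subspace-major: all K centroids
//            of subspace 0, then subspace 1, ...).
// Returns n rows of M uint8_t codes (hence K <= 256).
std::vector<uint8_t> pq_encode(const std::vector<float>& vectors, int n,
                               const std::vector<float>& centroids,
                               int M, int K, int sub_dim) {
  std::vector<uint8_t> codes(n * M);
  for (int i = 0; i < n; ++i) {
    for (int m = 0; m < M; ++m) {
      // Nearest centroid (squared L2) within this subspace.
      int best = 0;
      float best_d2 = std::numeric_limits<float>::max();
      for (int k = 0; k < K; ++k) {
        float d2 = 0.f;
        for (int j = 0; j < sub_dim; ++j) {
          float diff = vectors[i * M * sub_dim + m * sub_dim + j] -
                       centroids[(m * K + k) * sub_dim + j];
          d2 += diff * diff;
        }
        if (d2 < best_d2) { best_d2 = d2; best = k; }
      }
      codes[i * M + m] = static_cast<uint8_t>(best);
    }
  }
  return codes;
}
```

The first proposed API (training) would additionally run k-means per subspace to produce `centroids` before calling something like the above.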
@cjnolet , I suspect your API suggestion sounds a lot like our existing vpq_build, which produces a vpq_dataset
@achirkin partially, but what about encoding new vectors?
For that we'd need to pry into the implementation of vpq_build, though in the end that is also a single call to process_and_fill_codes
@achirkin it sounds like it shouldn’t be too hard to expose this, we just need to agree on a more general API for quantizing vectors.
What about the APIs for computing distances / nearest neighbors on the set of query vectors? Any way we can be clever and compute this while hiding the lookup tables from the user?
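For context, the lookup-table approach mentioned here (asymmetric distance computation, ADC) can be sketched on the CPU as follows: build a per-query table of squared distances from each query subvector to every centroid, then score an encoded vector by summing M table lookups. All names here are illustrative assumptions, not cuVS APIs; a "clever" public API could own the `lut` internally so users never see it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// query:    dim = M * sub_dim floats.
// codebook: M * K centroids of sub_dim floats (subspace-major).
// Returns an M x K table: lut[m * K + k] = ||query_m - centroid_{m,k}||^2.
std::vector<float> build_lut(const std::vector<float>& query,
                             const std::vector<float>& codebook,
                             int M, int K, int sub_dim) {
  std::vector<float> lut(M * K, 0.f);
  for (int m = 0; m < M; ++m) {
    for (int k = 0; k < K; ++k) {
      float d2 = 0.f;
      for (int j = 0; j < sub_dim; ++j) {
        float diff = query[m * sub_dim + j] -
                     codebook[(m * K + k) * sub_dim + j];
        d2 += diff * diff;
      }
      lut[m * K + k] = d2;
    }
  }
  return lut;
}

// One code per subspace; the approximate distance is the sum of M lookups.
float adc_distance(const std::vector<uint8_t>& code,
                   const std::vector<float>& lut, int M, int K) {
  float d = 0.f;
  for (int m = 0; m < M; ++m) d += lut[m * K + code[m]];
  return d;
}
```

On the GPU this is where the interleaved layout pays off: all 32 threads of a warp can score 32 encoded vectors with coalesced reads against a shared-memory `lut`.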
Hmm, that code is currently tightly integrated into CAGRA search, so I guess we'd need more work to decouple/expose it. https://github.com/rapidsai/cuvs/blob/branch-24.08/cpp/src/neighbors/detail/cagra/compute_distance_vpq.cuh
@cjnolet that sounds very useful! What kind of timeline are we looking at for that?
@T-Lind we'd like to expose this for our 24.10 release, which is formally released in October, but we will likely have it exposed in the nightlies before then. Does that timeline work for DataStax? Unfortunately, we're in the process of wrapping up our 24.08 release (August) and I just don't want to overpromise.
@cjnolet we're making some changes, as can be seen in this forked repo: https://github.com/jbellis/cuvs
(Two diagrams followed, contrasting "You're doing this" with "What we want".)
What is your question?

Hi! I'm looking to run only select parts of vector search on GPUs, specifically the PQ processes.
Thanks!