Open T-Lind opened 4 months ago
Hi @T-Lind,
It sounds like you might be looking for an API to quantize vectors using product quantization? Are there any other APIs you are interested in specifically?
Exposing a product quantization API is on our roadmap, along with scalar and binary quantization.
@cjnolet Interesting! I look forward to it! We're preferably looking for something we can run locally that only does PQ.
Looking at integrating it into JVector, actually: https://github.com/jbellis/jvector (Only choosing to do PQ given other processes are more CPU efficient for our data.)
@T-Lind this is great and we would love to support a JVector integration!
Do you have an ideal API experience that you would like us to provide for this? Would an sklearn-style transformer/encoder estimator API be helpful? Are you looking for C++, or do you want to stick to C? What input data types would you ideally want us to support?
cc @tfeher @achirkin
@cjnolet Looking for C++.
Here's what we need to build a POC so we can figure out what the "ideal" is:
```java
public class PQGpu {
    /**
     * Load a codebook and a set of encoded vectors into GPU memory. The codebook is a
     * flattened set of M x K centroids, where M is the number of subspaces and K is the
     * number of centroids per subspace.
     */
    public static PQGpu loadToGpu(VectorFloat codebook, List<ByteSequence> encodedVectors);

    /**
     * Perform an approximate distance computation on the GPU. The query vector is compared
     * to all ordinals in the given `ordinals` array, and scores are written to the
     * corresponding position in the `scores` output vector. (Thus, `scores` is used as a
     * native stand-in for a float[], not as a "vector" per se.)
     */
    void computeSimilarity(VectorFloat query, int[] ordinals, VectorFloat scores);
}
```
Our timeline at DataStax may be different but it'd be great if we can shape the API positively in the long run. Particularly as you're working on the move from RAFT to CuVS.
@T-Lind , what about the layout of the encoded vectors? In IVF-PQ, we keep the encoded vectors in groups of 32, in interleaved 16-byte long chunks. This is to allow faster (aligned, vectorized) reading from memory by groups of 32 GPU threads, but it may require some adjustments to the API in your snippet. We had to disable this in CAGRA-Q (and hence there's some code duplication now), because it's always random access reading there, so no speedup from contiguous reads.
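For readers unfamiliar with this kind of layout, here is a rough CPU-side sketch of the addressing it implies: within a group of 32 vectors, chunk `c` of all 32 vectors is stored contiguously, so 32 GPU threads can read it as one aligned transaction. This is an illustration of the general idea only; the exact cuVS IVF-PQ layout may differ in its details, and `chunk_offset` is a hypothetical helper, not a cuVS function.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of 16-byte chunk `c` of vector `i`, where each encoded vector
// occupies `chunks` chunks and vectors are interleaved in groups of 32.
// Group-major, then chunk-major, then lane (vector within group).
size_t chunk_offset(size_t i, size_t c, size_t chunks) {
  size_t group = i / 32;
  size_t lane  = i % 32;
  return (group * chunks + c) * 32 * 16 + lane * 16;
}
```

With this layout, consecutive lanes of a group read consecutive 16-byte chunks, which is what makes the aligned, vectorized reads possible.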
@achirkin As long as we can encode more than 32 vectors and choose an arbitrary number of subspaces we don't care how it's stored. Fine if centroids per subspace only supports up to 256.
*assuming there's an encode() method to add it to your storage, but it looks like there is?
@achirkin @cjnolet would you mind putting together an example C++ program that'd solely do PQ on a set of test vectors? Thanks.
also @tfeher ^
@T-Lind instead of putting together an example that utilizes internal APIs (which can change at any point without notice), I would prefer that we prioritize getting a good PQ implementation exposed through the public APIs.
I suspect an ideal API for this would be to pass in a host or device_matrix_view with an input dataset and get as output 1) the resulting cluster centers (and any other needed params), and 2) the pq compressed input vectors.
An additional API would take all the PQ params (like the centroids), in addition to a device or host matrix view, and would return the PQ-compressed vectors.
Would these APIs be useful for Datastax and JVector?
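To make the second API shape concrete, here is a minimal CPU-only sketch, assuming row-major input and subspace-major centroids; `pq_encode` is a hypothetical stand-in, not a cuVS function, and a real cuVS API would presumably take device or host matrix views instead of `std::vector`:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// vectors:   n rows of dim = M * sub_dim floats (row-major).
// centroids: M * K rows of sub_dim floats (subspace-major: all K centroids
//            of subspace 0, then subspace 1, ...).
// Returns n rows of M uint8_t codes (hence K <= 256).
std::vector<uint8_t> pq_encode(const std::vector<float>& vectors, int n,
                               const std::vector<float>& centroids,
                               int M, int K, int sub_dim) {
  std::vector<uint8_t> codes(n * M);
  for (int i = 0; i < n; ++i) {
    for (int m = 0; m < M; ++m) {
      // Nearest centroid (squared L2) within this subspace.
      int best = 0;
      float best_d2 = std::numeric_limits<float>::max();
      for (int k = 0; k < K; ++k) {
        float d2 = 0.f;
        for (int j = 0; j < sub_dim; ++j) {
          float diff = vectors[i * M * sub_dim + m * sub_dim + j] -
                       centroids[(m * K + k) * sub_dim + j];
          d2 += diff * diff;
        }
        if (d2 < best_d2) { best_d2 = d2; best = k; }
      }
      codes[i * M + m] = static_cast<uint8_t>(best);
    }
  }
  return codes;
}
```

The first proposed API (training) would additionally run k-means per subspace to produce `centroids` before calling something like the above.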
@cjnolet , I suspect your API suggestion sounds a lot like our existing vpq_build, which produces a vpq_dataset
@achirkin partially, but what about encoding new vectors?
For that we'd need to pry into the implementation of vpq_build, though in the end that is also a single call to process_and_fill_codes
@achirkin it sounds like it shouldn’t be too hard to expose this, we just need to agree on a more general API for quantizing vectors.
What about the APIs for computing distances / nearest neighbors on the set of query vectors? Any way we can be clever and compute this while hiding the lookup tables from the user?
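For context, the lookup-table approach mentioned here (asymmetric distance computation, ADC) can be sketched on the CPU as follows: build a per-query table of squared distances from each query subvector to every centroid, then score an encoded vector by summing M table lookups. All names here are illustrative assumptions, not cuVS APIs; a "clever" public API could own the `lut` internally so users never see it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// query:    dim = M * sub_dim floats.
// codebook: M * K centroids of sub_dim floats (subspace-major).
// Returns an M x K table: lut[m * K + k] = ||query_m - centroid_{m,k}||^2.
std::vector<float> build_lut(const std::vector<float>& query,
                             const std::vector<float>& codebook,
                             int M, int K, int sub_dim) {
  std::vector<float> lut(M * K, 0.f);
  for (int m = 0; m < M; ++m) {
    for (int k = 0; k < K; ++k) {
      float d2 = 0.f;
      for (int j = 0; j < sub_dim; ++j) {
        float diff = query[m * sub_dim + j] -
                     codebook[(m * K + k) * sub_dim + j];
        d2 += diff * diff;
      }
      lut[m * K + k] = d2;
    }
  }
  return lut;
}

// One code per subspace; the approximate distance is the sum of M lookups.
float adc_distance(const std::vector<uint8_t>& code,
                   const std::vector<float>& lut, int M, int K) {
  float d = 0.f;
  for (int m = 0; m < M; ++m) d += lut[m * K + code[m]];
  return d;
}
```

On the GPU this is where the interleaved layout pays off: all 32 threads of a warp can score 32 encoded vectors with coalesced reads against a shared-memory `lut`.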
Hmm, that code is currently tightly integrated into CAGRA search, so I guess we'd need more work to decouple/expose it. https://github.com/rapidsai/cuvs/blob/branch-24.08/cpp/src/neighbors/detail/cagra/compute_distance_vpq.cuh
@cjnolet that sounds very useful! What kind of timeline are we looking at for that?
@T-Lind we'd like to expose this for our 24.10 release, which is formally released in October, but we will likely have it exposed in the nightlies before then. Does that timeline work for DataStax? Unfortunately, we're in the process of wrapping up our 24.08 release (August) and I just don't want to overpromise.
@cjnolet we're making some changes, as can be seen in this forked repo: https://github.com/jbellis/cuvs
(Two diagrams followed, contrasting "You're doing this" with "What we want".)
What is your question?

Hi! I'm looking to run only select parts of vector search on GPUs, specifically the PQ processes.
Thanks!