Open svilupp opened 6 months ago
I am going to take care of this one.
I would happily help to move the RAG functionality into a separate package too. Let me know if you want to move forward with that.
The package LinLogQuantization.jl has a pretty neat implementation of linear quantization to unsigned types (`UInt8`, `UInt16`, ...). An extension to include signed types would be relatively easy but also more work. What do you think about first providing support for unsigned-integer embeddings, and later extending it to signed integers?
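For concreteness, here is a minimal sketch of what linear quantization to `UInt8` looks like. The function names are hypothetical, not LinLogQuantization.jl's actual API; it just illustrates the map-to-range idea:

```julia
# Minimal sketch of linear quantization to UInt8 (hypothetical names,
# not the actual LinLogQuantization.jl API).
function linquantize(x::AbstractVector{<:Real})
    lo, hi = extrema(x)
    # Guard against a constant vector (hi == lo) to avoid division by zero.
    scale = hi > lo ? (hi - lo) / 255 : one(float(lo))
    q = round.(UInt8, (x .- lo) ./ scale)   # map [lo, hi] -> 0:255
    return q, lo, scale
end

# Approximate inverse: recover floats from the quantized codes.
lindequantize(q, lo, scale) = lo .+ scale .* Float32.(q)

x = [-1.0, 0.0, 0.5, 1.0]
q, lo, scale = linquantize(x)
x_hat = lindequantize(q, lo, scale)   # close to x, within scale/2 per entry
```

The reconstruction error is bounded by half a quantization step (`scale / 2`), which is the trade-off the HuggingFace article discusses.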
I was hoping to do the RAGTools migration after we merge in the Pinecone support, I don't suppose you would be interested in finishing that?
On the Int8, cool! You can probably re-use a lot from the "bitpacked" embedder. I don't mind if it's signed or not.
On the dep addition, where do you see the benefits outweighing the costs? It's just a minor performance tweak (no big gains in any direction compared to what we have already), so I'm not sure we need to support more than one simple implementation of this. Do you have a different view?
I would like to implement this first, since I have already invested some time in it. I can take care of the Pinecone support afterwards, if that's stopping the RAG package from being born.
LinLogQuantization.jl implements exactly what we need and nothing else. It's a very small package (less than 300 lines of code) and I cannot see a simpler implementation of linear quantization. In my opinion, anything other than using the package would be wasted effort reinventing the wheel.
If you really want to avoid the dependency, we could take only the part of the package that implements linear quantization to avoid adding the code for logarithmic quantization.
By the way, here's a more detailed explanation of scalar quantization. It's a reference in the article you provided.
Sorry for the slow response! I was at a hackathon the whole weekend.
I don't think it would be appropriate to add LLQ (with StatsBase as a dep) as a direct dependency of PromptingTools for everyone. RAG is used by only a subset of PT users; within that subset, only a few users will ever look at quantization; and within that, picking Int8 is quite niche (the trade-offs are quite nuanced and it's probably not worth it for most).
In addition, if we'll only ever have Int8 (I don't see any benefit from having more Int versions - there is lower-hanging fruit for performance), it's just 2-3 functions we need, so it's a very simple problem to solve directly.
If you still insist on using the LLQ package, I'd ask you to add it as an extension (weak dep). Then I'm happy to review the PR.
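For reference, going the weak-dependency route would mean wiring it up via Julia's package extensions (Julia 1.9+). Roughly, in PromptingTools' Project.toml (the UUID below is a placeholder, and the extension module would live in `ext/LinLogQuantizationExt.jl`):

```toml
# Sketch only: register LLQ as a weak dependency so the extension module
# loads only when the user also does `using LinLogQuantization`.
[weakdeps]
LinLogQuantization = "00000000-0000-0000-0000-000000000000"  # placeholder UUID

[extensions]
LinLogQuantizationExt = "LinLogQuantization"
```

That way users who never touch quantization pay no load-time or dependency cost.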
EDIT: If you're super excited to drive a lot more effort in the quantization space and speed up the in-memory embeddings, we could look into shaping that as a sister package that people could just import and get a bunch of different performance optimizations!
It would be great to have support for embeddings compressed to Int8 as per HuggingFace: Embedding Quantization.
Potential implementation would be to:
- Define a new embedder (`<:AbstractEmbedder` for `get_embeddings`) and the corresponding finder (`<:AbstractSimilarityFinder` for `find_similar`)
- Add `min_values` and `max_values` fields to hold the effective range for each embedding dimension (eg, `length(min_values) == length(max_values) == D`)
- Calibrate the range during index building (`build_index`) via a utility function, and then provide the resulting `finder` with the range to allow converting to Int8 (to be provided to the `airag`)
- Rescore with `rescore_multiplier=4` (first on Int8 embeddings, then with Float x Int8)
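The calibration and conversion steps above could be sketched roughly as follows. All type and function names here are illustrative, not the actual PromptingTools.jl API; embeddings are assumed to be stored column-wise as a `D x N` matrix:

```julia
# Illustrative sketch of the plan above (hypothetical names, not the
# actual PromptingTools.jl API). Embeddings are column-wise: D x N.
struct Int8Range
    min_values::Vector{Float32}  # per-dimension minima, length D
    max_values::Vector{Float32}  # per-dimension maxima, length D
end

# Calibrate the effective range for each embedding dimension.
function calibrate(emb::AbstractMatrix{<:Real})
    Int8Range(vec(minimum(emb; dims = 2)), vec(maximum(emb; dims = 2)))
end

# Map each dimension's [min, max] linearly onto the Int8 range -127:127.
function to_int8(emb::AbstractMatrix{<:Real}, r::Int8Range)
    scale = max.(r.max_values .- r.min_values, eps(Float32))
    unit = (emb .- r.min_values) ./ scale        # -> [0, 1] per dimension
    round.(Int8, unit .* 254 .- 127)             # -> [-127, 127]
end

emb = randn(Float32, 4, 10)          # D = 4 dims, N = 10 embeddings
r = calibrate(emb)
q = to_int8(emb, r)                  # Matrix{Int8}, same size as emb
```

A two-stage `find_similar` would then shortlist `rescore_multiplier * k` candidates on the Int8 embeddings and rescore that shortlist with Float x Int8 dot products, as in the HuggingFace write-up.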