microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
https://microsoft.github.io/kernel-memory
MIT License
1.34k stars 252 forks source link

Added support for batch embedding generation. #531

Closed alkampfergit closed 2 weeks ago

alkampfergit commented 1 month ago

After the introduction of ITextEmbeddingBatchGenerator we need to use that interface to support batch generation of embedding in pipeline.

High level description (Approach, Design)

I'm not 100% sure of the design, but it is a think that is needed in real project in my opinion. I've introduced a EmbeddingGeneratorHelper that can handle generation of embeddings given a list of embedding generators and block of text. Then once you retrieved the embedding a simple interface allows it to retrieve from an in memory dictionary.

Actually this object can be expanded so it can persist embeddings in a durable medium. (this will be a future extension). This because in a real production project we often have documents that changes during the lifetime, and sometimes we will reindex big amount of text where the vast amount of text partitions does not check (not to take into consideration duplicate documents). Being able to persist embedding given a piece of text and an embedding generator is nice because it can greatly reduce the number of external call.

This is the reason why I externalized embedding generation to a special class.

[Edit]: I've added a setting where we can specify the maximum number of element in a single batch call (openai on azure with ada limits to 16 elements).

Added also some basic tests to verify the behavior, introduced Moq as Mock library.

alkampfergit commented 3 weeks ago

Some comments to address.

I would also avoid static classes like the new "helper", which seems like to exist to allow testing internals. For testing you can enable "internal" access, although I would avoid coupling tests with internal behavior that can be refactored without affecting public method, because tests would fail even when there's no bugs

The helper class is used because we have two class that actually create embedding from segments, to avoid code duplication I've moved code to generate embeddings (batch and non batch depending on the capabilities of registered embeddings) in that helper class.

Tests were written mainly to verify that the helper class is behaving as expected :), I know that it is an internal class, but if we need to refactor that class having a couple of test can help avoiding regressions.

If you think that those test are not necessary we can remove without any problem.

dluc commented 2 weeks ago

I rewrote the handler cleaning up the code accumulated over time, introducing a base class for the code in common between the normal embedding handler and the one using parallel processing. For later we can easily merge the two with some config option.

I added support for RequestContext (via custom_embedding_generation_batch_size_int), so the batch size can be overridden during a request, for those scenario where the config value is too high and causing errors, or too low and one might want to try boosting it.

While I'd love having unit tests, it would have taken too long to rewrite the tests given the code is completely rewritten. If you want to resubmit them please do :-)

I ran some manual tests and found an old bug that would occur when using multiple embedding generators, so I fixed that too. In my tests with Azure I could use a batch of 96 strings totaling to 80k tokens. The response is not immediate, but it works 👍

Thank you for all the work done and the patience to get this through!

alkampfergit commented 2 weeks ago

Thanks! For unit test ... there is no problem :) I wrote mainly to verify that the code I wrote was functioning correctly without the need to actually call a real service. If I need to do some other PR on that specific code I'll rewrite some test if needed.