Provide examples that runs 100% locally, including embeddings

microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.

https://microsoft.github.io/kernel-memory

MIT License

1.62k stars 313 forks source link

Provide examples that runs 100% locally, including embeddings #363

Closed zsogitbe closed 5 months ago

zsogitbe commented 8 months ago

Context / Scenario

There should be some examples which you can just run locally without the need to first register on several services. If you state this is a 'serverless' example, then please make sure that it is completely serverless and that one can simply try it. It is a pity that non of the examples work locally!

What happened?

Endless errors:

The Azure OpenAI Deployment Name is empty (Parameter 'Deployment')
No API key
No internet
etc.

Importance

edge case

Platform, Language, Versions

Any

Relevant log output

No response

dluc commented 8 months ago

Hi @zsogitbe thanks for the feedback. The use of the word "serverless" is in the context of Kernel Memory, where the default deployment requires to stand up a "Kernel Memory web service with queues". The Serverless option allows to use Kernel Memory without a web service and without queues. The other internal dependencies are a different context, not covered by the "serverless" term.

That said, you can use KM leveraging LLama locally in combination with SimpleStorage, SimpleQueue and SimpleVectorDb. The only external dependency we recommend is OpenAI embeddings, although you can also plug in your custom local embedding generator if you like, as long as you are ok with the different quality of local embedding generators, compared to those offered by OpenAI and similar.

zsogitbe commented 8 months ago

Thank you for your answer. I think that you should add at least one example which runs without any extra web registration need. May I suggest changing the LLamaSharp example to use LLamaSharp's own embedding generator? Many people will probably not bather checking out your code further after it crashes without those API endpoints! It is a pity because this is a great project. I understand that you try to force the usage of Azure and OpenAI, but believe me, one working example without those API endpoints will help you!

dluc commented 8 months ago

There's an example here: https://github.com/microsoft/kernel-memory/blob/main/examples/105-dotnet-serverless-llamasharp/Program.cs

For embeddings though we don't recommend using LLama because it doesn't provide sufficient quality for RAG, leading to no results or incorrect ones.

zsogitbe commented 8 months ago

I do not have an Azure account... the code will probably crash...

I think that you underestimate Lama models. Lama models are one of the best models. Let us test your assumption. Give me the output of this example and I will try to use a Lama model to see what result it gives.

zsogitbe commented 8 months ago

I have created an example with the default LLamaSharp text embedding and this is the output of your example:

On the International Space Station (ISS), an unusual incident occurred involving a hydroponically grown tomato, which defied the previous belief that soil was necessary for its growth. This pioneering tomato plant held significance as the first of its kind to thrive in extraterrestrial conditions. However, American astronaut Frank Rubio accidentally misplaced this remarkable fruit during his time on the ISS.

The only issue I have encountered is that KernelMemory does not tolerate a question mark at the end of the question. I think that this is intentional for some reason and this will probably be the reason for why you thought that lama models do not work well.

dluc commented 8 months ago

The problem is actually LLama embeddings ability to capture the meaning of text, which is very low particularly with mixed content. When it comes to using cosine similarity to find similar content, it would cause a lot of irrelevant text to be included in the RAG algorithm. If you want to experiment with local models I recommend looking at MTEB here https://huggingface.co/spaces/mteb/leaderboard