A classic RAG system consists of a vector database + a generative model. With wllama, this can be achieved by embedding the document chunks with a small embedding model, keeping the vectors around as a simple in-browser store, retrieving the chunks closest to each question, and passing them to the generative model as context.
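For concreteness, here is a minimal sketch of that flow using wllama's embedding API. The config paths, model URL, and `chunks` array are placeholders, and a plain array stands in for the "vector database", which is usually enough in the browser:

```js
import { Wllama } from '@wllama/wllama';

const wllama = new Wllama(WLLAMA_CONFIG_PATHS); // paths to the wllama wasm builds

// Embed every document chunk once; the resulting array of vectors
// is our (very small) in-browser "vector database".
await wllama.loadModelFromUrl(EMBEDDING_MODEL_URL);
const index = [];
for (const chunk of chunks) {
  index.push({ chunk, vector: await wllama.createEmbedding(chunk) });
}

// At question time: embed the query, find the nearest chunks by cosine
// similarity, and pass them as context to the generative model.
const queryVector = await wllama.createEmbedding(question);
```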
Another idea, only possible if your document is short and predefined, is to construct a session and reuse it later (via sessionSave and sessionLoad). This is useful in my case, for example: if the chatbot exists purely to introduce a specific website, we don't even need a vector database or embeddings at all. The downside is that this is not practical for other use cases.
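A rough sketch of that idea, reusing the `wllama` instance from the snippet above. `sessionSave`/`sessionLoad` are the methods named here, but exactly what they return and accept is my assumption:

```js
// Load the generative model and evaluate the short, predefined document
// once, so it is baked into the KV cache.
await wllama.loadModelFromUrl(GENERATIVE_MODEL_URL);
await wllama.createCompletion(`You introduce this website:\n\n${DOCUMENT_TEXT}`, {
  nPredict: 1, // we only need the prompt evaluated here
});

// Persist the session (e.g. to IndexedDB) and restore it on later visits,
// skipping the document processing entirely. Signatures assumed.
const session = await wllama.sessionSave();
// ...on a later page load...
await wllama.sessionLoad(session);
```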
For a small embedding model that works well for this use case, I can recommend this one: sentence-transformers/multi-qa-MiniLM-L6-cos-v1 (GGUF)
Getting there...
Currently using Transformers.js because I could find easy-to-copy examples:
```js
import { pipeline } from '@xenova/transformers';

// Runs in a Web Worker, so progress is reported back via self.postMessage.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
  quantized: false,
  progress_callback: (data) => {
    self.postMessage({
      type: 'embedding_progress',
      data,
    });
  },
});

const embeddings = await extractor(texts, { pooling: 'mean', normalize: true });
```
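A minimal retrieval sketch on top of the extractor above; `question` and the top-3 cutoff are placeholders. Since the vectors are normalized, cosine similarity reduces to a dot product:

```js
// Rank the chunks in `texts` by similarity to the user's question.
const dot = (a, b) => a.reduce((sum, v, i) => sum + v * b[i], 0);

const queryVec = (await extractor([question], { pooling: 'mean', normalize: true })).tolist()[0];

const topChunks = embeddings
  .tolist() // one vector per entry in `texts`
  .map((vec, i) => ({ chunk: texts[i], score: dot(queryVec, vec) }))
  .sort((a, b) => b.score - a.score)
  .slice(0, 3); // keep the 3 most relevant chunks
```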
I've also seen mention of this model for embeddings: nomic-ai/nomic-embed-text-v1? But for now... it works.
Next: get an LLM to summarize the chunks.
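Something like the following could work for that step, assuming the `topChunks` from the retrieval sketch above and a wllama instance with a generative model loaded; the prompt template is only an illustration:

```js
// Concatenate the retrieved chunks and ask the model to answer from them.
const context = topChunks.map((c) => c.chunk).join('\n\n');
const answer = await wllama.createCompletion(
  `Answer the question using only the context below.\n\n` +
    `Context:\n${context}\n\n` +
    `Question: ${question}\nAnswer:`,
  { nPredict: 256 }
);
```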
Ah nice. I tried nomic-embed-text before, but it didn't work very well. Maybe that's because I used the Albert Einstein wiki page as the example, which is a very hard one.
Maybe you can give it a try?
Some questions that I tried without success: "Does he play guitar?"
Did you let the LLM reformulate the prompt first? In my project I added a step that looks at the conversation history and rewrites the user's prompt to be explicit, so "he" becomes "Albert Einstein". It seems to work.
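A sketch of that rewriting step; the prompt wording and the `rewriteQuery` helper are illustrative, not the exact code from my project:

```js
// Ask the model to make the question self-contained before retrieval,
// using the conversation history to resolve pronouns like "he".
async function rewriteQuery(wllama, history, userPrompt) {
  const prompt =
    `Conversation so far:\n${history.join('\n')}\n\n` +
    `Rewrite the question below so it is fully explicit and self-contained, ` +
    `replacing pronouns with the names they refer to. ` +
    `Reply with only the rewritten question.\n\n` +
    `Question: ${userPrompt}\nRewritten question:`;
  return await wllama.createCompletion(prompt, { nPredict: 64 });
}
```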
In fact, it's all working now, although the answer in this case seems almost too good to be based solely on the retrieved chunks...
In your readme you mention:
I've been looking into a way to allow users to 'chat with their documents', a popular concept. Specifically, I was looking into 'Fully local PDF chatbot'. It seems... complicated.
So I was wondering: if one wanted to implement this feature using Wllama, what would the 'components' of such a solution be?
Would it be something like...
What would the steps actually be?