llm-tools / embedJs

A NodeJS RAG framework to easily work with LLMs and embeddings
https://llm-tools.mintlify.app/get-started/introduction
Apache License 2.0

Don't understand if embedder is working or not #105

Closed: hammeronthenet closed this issue 1 month ago

hammeronthenet commented 3 months ago

I was trying this simple example:

// Top-level await: run this as an ES module (.mjs, or "type": "module" in package.json).
import 'dotenv/config'; // load .env first so API keys are available to the modules below
import { CohereEmbeddings, HuggingFace, PdfLoader, RAGApplicationBuilder } from '@llm-tools/embedjs';
import { HNSWDb } from '@llm-tools/embedjs/vectorDb/hnswlib';

const question = "What's Metallica's second album?";

const llmApplication = await new RAGApplicationBuilder()
    // .setModel('NO_MODEL')
    .setModel(new HuggingFace({ modelName: 'mistralai/Mixtral-8x7B-Instruct-v0.1' }))
    .setEmbeddingModel(new CohereEmbeddings())
    .setVectorDb(new HNSWDb())
    .build();

// Index the PDF: its chunks are embedded with Cohere and stored in the HNSW vector DB.
await llmApplication.addLoader(new PdfLoader({ filePathOrUrl: 'https://bohacales.wordpress.com/wp-content/uploads/2013/06/the-encyclopaedia-metallica.pdf' }));

// Print every chunk retrieved for the question.
const context = await llmApplication.getContext(question);
for (const chunk of context) {
    console.log(chunk.pageContent);
}

// Ask the same question end to end (retrieval + LLM).
console.log((await llmApplication.query(question)).result);

The LLM's answer is correct, but I suspect it comes from the model's own training data rather than from the embedded document.

I suspect this because I printed all the context chunks retrieved with the Cohere embeddings, and none of them mentions the album's name.

What am I doing wrong? Is it a bug, or is it me?
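One way to make the check described above more systematic is to search the retrieved chunks for the expected answer instead of reading them by eye. This is a minimal plain-JavaScript sketch, not an embedJs API: `findChunksMentioning` is a hypothetical helper, and it assumes each chunk exposes `pageContent`, as the chunks from `getContext()` do in the snippet above. The sample chunks are illustrative stand-ins for real retrieval output. A correct answer that appears in no retrieved chunk suggests, but does not prove, that the model answered from its own training data.

```javascript
// Hypothetical helper (not part of embedJs): return the index and text of every
// retrieved chunk whose pageContent contains `term`, ignoring case.
function findChunksMentioning(chunks, term) {
  const needle = term.toLowerCase();
  return chunks
    .map((chunk, index) => ({ index, text: chunk.pageContent }))
    .filter(({ text }) => text.toLowerCase().includes(needle));
}

// Illustrative chunks standing in for the array returned by getContext(question).
const sampleContext = [
  { pageContent: "Kill 'Em All was released in 1983." },
  { pageContent: 'Ride the Lightning followed in 1984.' },
];

const hits = findChunksMentioning(sampleContext, 'ride the lightning');
if (hits.length === 0) {
  console.log('No retrieved chunk mentions the term; the answer may not come from the document.');
} else {
  console.log(`Term found in chunk(s): ${hits.map((h) => h.index).join(', ')}`);
}
```

To use it, pass the real `context` array and the expected album name. Plain substring matching is deliberately simple: it will miss paraphrases and text split across chunk boundaries, so treat a miss as a hint to inspect the chunks, not as a verdict.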

adhityan commented 1 month ago

You can enable the debug logs (set DEBUG=embedjs:* in the environment) to see which context is picked up, filtered, and so on. You can also verify this from the sources array in the query response.

Sorry about the delayed response. The library has been extensively refactored and the docs updated. Could you reopen the issue if it still occurs in the 0.1.x series?