llm-tools / embedJs

A NodeJS RAG framework to easily work with LLMs and embeddings
https://www.npmjs.com/package/@llm-tools/embedjs
Apache License 2.0

Maximum context length exceeded #95

Open ashryanbeats opened 1 month ago

ashryanbeats commented 1 month ago

Background

From the readme:

When the number of documents fetched leads to a request above the token limit, the library uses the following strategy -

It runs a preprocessing step to select relevant sections from each document until the total number of tokens is less than the maximum number of tokens allowed by the model. It then uses the transformed documents as context to answer the question.
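
As an illustration, the described strategy amounts to a greedy selection of relevant chunks under a token budget. The sketch below is only my reading of that description, not the library's actual code; chunks are assumed to be pre-sorted by relevance, and the 4-characters-per-token figure is a rough estimate.

// Sketch: keep the most relevant chunks until the model's token budget is used up.
const selectWithinTokenBudget = (chunks, maxTokens) => {
  const selected = [];
  let used = 0;
  for (const chunk of chunks) {
    // Rough token estimate (~4 characters per token); a real tokenizer would be more accurate.
    const cost = Math.ceil(chunk.text.length / 4);
    if (used + cost > maxTokens) continue;
    selected.push(chunk);
    used += cost;
  }
  return selected;
};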

Issue

I'm not finding that this preprocessing step is happening. I've run into context length errors on both GPT-3.5 Turbo and GPT-4o.

I understand there are workarounds I could use. But in order to take advantage of the described preprocessor, is there something specific I need to do?

My setup

I can confirm my setup works when I load less data. Essentially, I'm loading sanitized emails as JSON objects. With 5 emails loaded, it's fine. With 10 emails loaded, I'm hitting the token limit.

My builder setup:

const ragInstance = await new RAGApplicationBuilder()
    .setModel(SIMPLE_MODELS.OPENAI_GPT4_O)
    .setEmbeddingModel(new OpenAi3SmallEmbeddings())
    .setVectorDb(new HNSWDb())
    .setCache(new MemoryCache())
    .build();

My loader:

// for each `message` object...
const loaderSummary = await ragApplication.addLoader(
   new JsonLoader({ object: message })
);

My EmbedJS version:

"@llm-tools/embedjs": "^0.0.91",
ashryanbeats commented 1 month ago

Actually, it seems like the error is happening when I load resources. Here is how I am loading the resources:

const loadResources = async (ragApplication, messages) => {
  console.log("RAG Application:", ragApplication);

  const loaderSummaries = await Promise.all(
    messages.map(async (message) => {
      console.log("Adding loader for:", message.subject);

      const loaderSummary = await ragApplication.addLoader(
        new JsonLoader({ object: message })
      );

      return loaderSummary;
    })
  );

  console.log(
    "\nLoader summaries:\n",
    loaderSummaries.map((summary) => JSON.stringify(summary)).join("\n")
  );

  return loaderSummaries;
};

The final console.log is never reached, so the error must be triggered during the addLoader() calls.
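
One way to pinpoint the failing message is to load sequentially and catch errors per message. This is just a debugging sketch using the same ragApplication and JsonLoader as above:

// Sketch: load messages one at a time so the message that trips the token limit can be identified.
const loadResourcesSequentially = async (ragApplication, messages) => {
  for (const message of messages) {
    try {
      await ragApplication.addLoader(new JsonLoader({ object: message }));
    } catch (err) {
      console.error("addLoader failed for:", message.subject, err.message);
      throw err;
    }
  }
};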

Adding a stack trace in case that's useful:

BadRequestError: 400 This model's maximum context length is 8192 tokens, however you requested 10387 tokens (10387 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
    at APIError.generate (file:///Users/ash/dev/email-rag/node_modules/openai/error.mjs:41:20)
    at OpenAI.makeStatusError (file:///Users/ash/dev/email-rag/node_modules/openai/core.mjs:268:25)
    at OpenAI.makeRequest (file:///Users/ash/dev/email-rag/node_modules/openai/core.mjs:311:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async file:///Users/ash/dev/email-rag/node_modules/@langchain/openai/dist/embeddings.js:268:29
    at async RetryOperation._fn (/Users/ash/dev/email-rag/node_modules/p-retry/index.js:50:12)
ashryanbeats commented 1 month ago

I think I'm zeroing in on the issue.

I'm not exceeding the context limit of the main model but of the embedding model. This suggests that the preprocessing step doesn't apply to the embedding process.
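
A quick sanity check along these lines could confirm it. This is a rough sketch using the common ~4 characters per token approximation (a real tokenizer such as tiktoken would be more accurate) to flag messages likely to exceed the 8192-token embedding limit:

// Sketch: estimate each message's token count before loading it.
const estimateTokens = (text) => Math.ceil(text.length / 4);

messages.forEach((message) => {
  const approxTokens = estimateTokens(JSON.stringify(message));
  if (approxTokens > 8192) {
    console.warn(`"${message.subject}" is roughly ${approxTokens} tokens, over the 8192 limit`);
  }
});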

I'll keep poking.

adhityan commented 1 month ago

Hey @ashryanbeats, yes, the preprocessing is not done for the embeddings. For embedding, it's all or nothing right now. The library usually breaks the loaded content into smaller chunks, but that is not done for the JSON loader.

I am thinking we should have it automatically break JSON into smaller embedding documents when the text is too large, but which chunking strategy to use needs more thought.
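
In the meantime, a possible user-side workaround is to pre-chunk the JSON before loading. The sketch below assumes most of each email's size comes from a body field (which may not match your schema) and uses a rough character-based token estimate:

// Sketch: split each message into smaller objects so no single embedding call exceeds the limit.
const MAX_EMBEDDING_TOKENS = 8000;          // stay a bit under the 8192-token limit
const MAX_CHARS = MAX_EMBEDDING_TOKENS * 4; // rough ~4 characters per token

const chunkMessage = (message) => {
  const body = message.body ?? "";
  if (body.length <= MAX_CHARS) return [message];
  const parts = [];
  for (let i = 0; i < body.length; i += MAX_CHARS) {
    parts.push({ ...message, body: body.slice(i, i + MAX_CHARS), part: parts.length });
  }
  return parts;
};

// Each part becomes its own loader document.
for (const message of messages) {
  for (const part of chunkMessage(message)) {
    await ragApplication.addLoader(new JsonLoader({ object: part }));
  }
}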