Closed: ashryanbeats closed this 1 month ago
Actually, it seems like the error is happening when I load resources. Here is how I am loading the resources:
```js
const loadResources = async (ragApplication, messages) => {
  console.log("RAG Application:", ragApplication);

  const loaderSummaries = await Promise.all(
    messages.map(async (message) => {
      console.log("Adding loader for:", message.subject);
      const loaderSummary = await ragApplication.addLoader(
        new JsonLoader({ object: message })
      );
      return loaderSummary;
    })
  );

  console.log(
    "\nLoader summaries:\n",
    loaderSummaries.map((summary) => JSON.stringify(summary)).join("\n")
  );

  return loaderSummaries;
};
```
The final console log is never reached, so the error must be triggered during the addLoader() calls.
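For reference, here is a sequential variant that isolates which message fails. This is just a debugging sketch (my own, not part of the library); it assumes the same ragApplication, messages, and JsonLoader import as above:

```js
// Debugging sketch: load messages one at a time so the failing message is
// easy to identify. Assumes the same ragApplication, messages, and JsonLoader
// import as in loadResources above.
const loadResourcesSequentially = async (ragApplication, messages) => {
  const loaderSummaries = [];
  for (const message of messages) {
    try {
      loaderSummaries.push(
        await ragApplication.addLoader(new JsonLoader({ object: message }))
      );
    } catch (err) {
      console.error(`addLoader failed for "${message.subject}":`, err.message);
      throw err; // rethrow so the caller still sees the original error
    }
  }
  return loaderSummaries;
};
```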
Adding a stack trace in case that's useful:
```
BadRequestError: 400 This model's maximum context length is 8192 tokens, however you requested 10387 tokens (10387 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
at APIError.generate (file:///Users/ash/dev/email-rag/node_modules/openai/error.mjs:41:20)
at OpenAI.makeStatusError (file:///Users/ash/dev/email-rag/node_modules/openai/core.mjs:268:25)
at OpenAI.makeRequest (file:///Users/ash/dev/email-rag/node_modules/openai/core.mjs:311:30)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async file:///Users/ash/dev/email-rag/node_modules/@langchain/openai/dist/embeddings.js:268:29
at async RetryOperation._fn (/Users/ash/dev/email-rag/node_modules/p-retry/index.js:50:12)
```
I think I'm zeroing in on the issue.
I'm not exceeding the context limit for the main model, but for the embedding model. That suggests the preprocessing step doesn't apply to the embedding process.
I'll keep poking.
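One rough way to check that theory is to estimate the serialized size of each message before loading it. This is only a character-based approximation (roughly 4 characters per token is a common rule of thumb, not the actual tokenizer), compared against the 8192-token limit from the error above:

```js
// Rough pre-check (approximation only): flag messages whose serialized JSON
// is likely to exceed the embedding model's 8192-token context window.
const EMBEDDING_TOKEN_LIMIT = 8192;
const APPROX_CHARS_PER_TOKEN = 4; // crude heuristic, not the real tokenizer

const flagOversizedMessages = (messages) =>
  messages
    .map((message) => ({
      subject: message.subject,
      approxTokens: Math.ceil(
        JSON.stringify(message).length / APPROX_CHARS_PER_TOKEN
      ),
    }))
    .filter(({ approxTokens }) => approxTokens > EMBEDDING_TOKEN_LIMIT);

// Usage: log which of the loaded emails are probably over the limit.
console.log("Likely too large to embed:", flagOversizedMessages(messages));
```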
Hey @ashryanbeats, yes - the preprocessing is not done for the embeddings. For embedding, it's all or nothing right now. The library usually breaks loaded content into smaller chunks, but that is not done for the JSON loader.
I am thinking we should have it automatically break JSON into smaller embedding documents if the text is too large, but what chunking strategy to use needs more thought.
I have thought about this and discussed it with maintainers of similar libraries in other languages. I think the best strategy is to break the JSON up at the application end, outside the library. But if you have more thoughts, let's open a discussion thread on this.
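For anyone else hitting this, here is a minimal sketch of that application-side chunking. The subject and body field names and the fixed character budget are assumptions about the email shape; adjust them to your schema:

```js
// Application-side chunking sketch: split each email into smaller pieces
// before handing them to JsonLoader, so no single embedded document blows
// past the context window. Field names (subject, body) are assumptions
// about the email shape.
const MAX_CHUNK_CHARS = 8000; // ~2000 tokens at ~4 chars/token, well under 8192

const splitEmail = (message) => {
  const body = message.body ?? "";
  const chunks = [];
  for (let start = 0; start < Math.max(body.length, 1); start += MAX_CHUNK_CHARS) {
    chunks.push({
      subject: message.subject,
      part: chunks.length + 1,
      body: body.slice(start, start + MAX_CHUNK_CHARS),
    });
  }
  return chunks;
};

const loadEmailInChunks = (ragApplication, message) =>
  Promise.all(
    splitEmail(message).map((chunk) =>
      ragApplication.addLoader(new JsonLoader({ object: chunk }))
    )
  );
```

A naive fixed-size split like this can cut sentences in half; splitting on paragraph boundaries or keeping related header fields together would likely give better retrieval results.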
Background
From the readme:
Issue
I'm not finding that this preprocessing step is happening. I've run into these context length errors on both GPT 3.5 Turbo and GPT 4o.
I understand I can do these workarounds:
setSearchResultCount()
But in order to take advantage of the described preprocessor, is there something specific I need to do?
My setup
I can confirm my setup works when I load less data. Essentially, I'm loading sanitized emails as JSON objects. With 5 emails loaded, it's fine. With 10 emails loaded, I hit the token limit.
My builder setup:
My loader:
My EmbedJS version: