Open sarfudheen opened 4 months ago
🤖
Hey @sarfudheen, great to see you back! Hope all's been well on your coding adventures. 🚀
Based on the error message you're receiving, it seems like you're trying to access a property of an object that hasn't been defined yet. This might be due to the asynchronous nature of JavaScript and the way the collapse function is implemented in your code.
In the collapse function, you're using a while loop to repeatedly collapse documents until the total number of tokens is within the limit. However, the collapseDocs function inside the while loop is asynchronous, meaning it returns a Promise. The splitDocs.map call therefore returns an array of Promises, and you're using Promise.all to wait for all of them to resolve. If collapseDocs resolves without a value for some reason (for example, if an error is caught and swallowed during its execution), the corresponding element in the docs array will be undefined. This could lead to the "Cannot read properties of undefined" error in subsequent iterations of the while loop or in other parts of your code where you're trying to access properties of the documents.
To fix this issue, you should add error handling to the collapseDocs function and make sure it always returns a value. Here's an example of how you can do this:
const collapseDocs = async (doc, invoke) => {
  try {
    return await invoke(doc);
  } catch (error) {
    console.error(`Failed to collapse document: ${error}`);
    return doc; // Return the original document if collapsing fails
  }
};
// ...
docs = await Promise.all(
  splitDocs.map((doc) => collapseDocs(doc, collapseChain.invoke))
);
This way, even if an error occurs during the execution of collapseDocs, it will still return a value (the original document), preventing the "Cannot read properties of undefined" error.
As for the collapseDocs function, it is used to collapse a list of documents into a single document. It takes two parameters: a list of documents (docs) and a function (combineDocumentFunc) that combines the content of a list of documents into a single string. The function works by first calling combineDocumentFunc with the list of documents as its argument. combineDocumentFunc is expected to return a promise that resolves to a combined string of all the documents' content. This combined string is then used as the pageContent of the resulting document.
In addition to combining the content of the documents, collapseDocs also merges the metadata of all the documents using the collapseDocsMetadata function. This function iterates over the metadata of each document and combines them into a single metadata object. If a metadata key is present in multiple documents, the values are concatenated with a comma separator. The collapseDocs function then returns a promise that resolves to a single document with the combined content and merged metadata.
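The metadata merge described above can be sketched roughly as follows. This is only an illustration, not the library's actual implementation: the `mergeMetadata` name, the `Metadata` type, and the exact `", "` separator are assumptions made for the example.

```typescript
// Hypothetical sketch of merging metadata from several documents.
// `Metadata` and `mergeMetadata` are illustrative names, not library APIs.
type Metadata = Record<string, unknown>;

function mergeMetadata(all: Metadata[]): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const metadata of all) {
    for (const [key, value] of Object.entries(metadata)) {
      const seen = Object.prototype.hasOwnProperty.call(merged, key);
      // A repeated key gets its values joined with a comma (assumed ", ").
      merged[key] = seen ? `${merged[key]}, ${String(value)}` : String(value);
    }
  }
  return merged;
}

const merged = mergeMetadata([
  { source: "a.pdf", page: 1 },
  { source: "b.pdf" },
]);
console.log(merged);
```

Keys that appear in only one document are carried over unchanged (as strings), so nothing is lost when documents are collapsed.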
I hope this helps! If you have any other questions, feel free to ask.
After updating the code as per your suggestion:
const collapseDocs = async (doc, invoke) => {
  try {
    return await invoke(doc);
  } catch (error) {
    console.error(`Failed to collapse document: ${error}`);
    return doc; // Return the original document if collapsing fails
  }
};
I am getting:
[0] Failed to collapse document: TypeError: Cannot read properties of undefined (reading 'first')
[1] Failed to collapse document: TypeError: Cannot read properties of undefined (reading 'first')
[0] I'm sorry, but you haven't provided any summaries to combine. Could you please provide the summaries you would like me to combine?
Any solution? I'm still having this issue.
@JuGit-pk I'm not sure about the problem, but I think it is related to scope. You can try something like the code below, which binds the created RunnableSequence instance to its invoke method.
docs = await Promise.all(
  splitDocs.map((doc) => collapseDocs(doc, collapseChain.invoke.bind(collapseChain)))
);
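To illustrate why binding can matter here, below is a minimal sketch that does not use LangChain at all. `MiniSequence` and `callWith` are hypothetical stand-ins invented for this example: the class reads `this.first` inside `invoke`, much like the line shown in the reported stack trace, and `callWith` simply calls whatever function it is handed, as a helper receiving `invoke` as an argument would.

```typescript
// Hypothetical stand-in for a runnable chain; not LangChain's real class.
class MiniSequence {
  first: (x: string) => string;
  middle: Array<(x: string) => string>;

  constructor(
    first: (x: string) => string,
    middle: Array<(x: string) => string> = [],
  ) {
    this.first = first;
    this.middle = middle;
  }

  invoke(input: string): string {
    // Reads `this.first`, so it only works when `this` is the instance.
    const steps = [this.first, ...this.middle];
    return steps.reduce((acc, step) => step(acc), input);
  }
}

const chain = new MiniSequence((s) => s.trim(), [(s) => s.toUpperCase()]);

// Hypothetical helper: it receives a function and calls it without a receiver.
const callWith = (input: string, fn: (x: string) => string): string => fn(input);

// Both of these keep `this` pointing at `chain`.
const viaBound = callWith("  hi ", chain.invoke.bind(chain));
const viaArrow = callWith("  hi ", (x) => chain.invoke(x));

// Passing the bare method detaches it from `chain`, so the call fails.
let unboundError: unknown;
try {
  callWith("  hi ", chain.invoke);
} catch (err) {
  unboundError = err;
}
console.log(viaBound, viaArrow, unboundError instanceof TypeError);
```

Wrapping the call in an arrow function is an equivalent alternative to `.bind`, if that reads more naturally in your code.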
I think it's better to use k-means for summarizing larger documents than to use this chain, which is very time-consuming and costly.
You are right that k-means is a more efficient solution for summarization, but map-reduce is not only used for summarization.
@Bariskau , exactly. I really need this fix because I've been working on three features that depend on this map-reduce chain. One of them has been fixed using k-means, but the others still rely on it. You're right about that.
@JuGit-pk I tested the code I gave above. All you need to do is bind the runnable chain instance to the invoke method. It solves your problem for now. I can open a PR for that.
@Bariskau The problem we were facing was scope-related, and that is fixed now. Great!
docs = await Promise.all(
  splitDocs.map((doc) => collapseDocs(doc, collapseChain.invoke.bind(collapseChain)))
);
Now, for larger documents, I see a token limit error:
BadRequestError: 400 This model's maximum context length is 16385 tokens. However, your messages resulted in 24052 tokens. Please reduce the length of the messages.
export const createFlashcards = async (chat: Chat) => {
  const qdrantClient = getQDrantClient();
  const model = new ChatOpenAI({
    modelName: "gpt-3.5-turbo",
    // temperature: 0.1,
    // maxTokens: 512,
    openAIApiKey: OPENAI_API_KEY,
    // streaming: true,
  });
  // Define prompt templates for document formatting, summarizing, collapsing, and combining
  const documentPrompt = PromptTemplate.fromTemplate("{pageContent}");
  const summarizePrompt = PromptTemplate.fromTemplate(
    "Imagine you're crafting flashcards to help students memorize key information effectively. Summarize the main points or interesting facts from this context:\n\n{context}"
  );
  const collapsePrompt = PromptTemplate.fromTemplate(
    "Now, condense the summarized content into a concise format suitable for flashcards:\n\n{context}"
  );
  const combinePrompt = PromptTemplate.fromTemplate(
    "You're preparing a comprehensive set of flashcards for student assessment. Integrate these condensed flashcards into an effective quiz format:\n\n{context} \n{format_instructions}"
  );
  // Wrap the `formatDocument` util so it can format a list of documents
  const formatDocs = async (documents: Document[]): Promise<string> => {
    const formattedDocs = await Promise.all(
      documents.map((doc) => formatDocument(doc, documentPrompt))
    );
    return formattedDocs.join("\n\n");
  };
  // Define a function to get the number of tokens in a list of documents
  const getNumTokens = async (documents: Document[]): Promise<number> =>
    model.getNumTokens(await formatDocs(documents));
  // Initialize the output parser
  const outputParser = new StringOutputParser();
  // Structured output parser with a Zod schema
  const summmaryOutputSchema = StructuredOutputParser.fromZodSchema(
    z.object({
      flashcards: z.array(
        z.object({
          question: z
            .string()
            .describe("Question or knowledge of the flashcard"),
          answer: z
            .string()
            .describe("Answer or value to the question or knowledge"),
        })
      ),
    })
  );
  // Define the map chain to format, summarize, and parse the document
  const mapChain = RunnableSequence.from([
    { context: async (i: Document) => formatDocument(i, documentPrompt) },
    summarizePrompt,
    model,
    outputParser,
  ]);
  // Define the collapse chain to format, collapse, and parse a list of documents
  const collapseChain = RunnableSequence.from([
    { context: async (documents: Document[]) => formatDocs(documents) },
    collapsePrompt,
    model,
    outputParser,
  ]);
  // Define a function to collapse a list of documents until the total number of tokens is within the limit
  const collapse = async (
    documents: Document[],
    options?: {
      config?: BaseCallbackConfig;
    },
    tokenMax = 1500
  ) => {
    const editableConfig = options?.config;
    let docs = documents;
    let collapseCount = 1;
    while ((await getNumTokens(docs)) > tokenMax) {
      console.log("Collapsing documents...", {
        collapseCount,
        numTokens: await getNumTokens(docs),
        condition: (await getNumTokens(docs)) > tokenMax,
      });
      if (editableConfig) {
        editableConfig.runName = `Collapse ${collapseCount}`;
      }
      const splitDocs = splitListOfDocs(docs, getNumTokens, tokenMax);
      // docs = await Promise.all(
      //   splitDocs.map((doc) => collapseDocs(doc, collapseChain.invoke))
      // );
      docs = await Promise.all(
        splitDocs.map((doc) =>
          collapseDocs(doc, collapseChain.invoke.bind(collapseChain))
        )
      );
      collapseCount += 1;
    }
    return docs;
  };
  // Define the reduce chain to format, combine, and parse a list of documents
  const reduceChain = RunnableSequence.from([
    {
      context: formatDocs,
      format_instructions: () => summmaryOutputSchema.getFormatInstructions(),
    },
    combinePrompt,
    model,
    summmaryOutputSchema,
  ]).withConfig({ runName: "Reduce" });
  // Define the final map-reduce chain
  const mapReduceChain = RunnableSequence.from([
    RunnableSequence.from([
      {
        doc: new RunnablePassthrough(),
        content: mapChain,
      },
      (input) =>
        new Document({
          pageContent: input.content,
          metadata: input.doc.metadata,
        }),
    ])
      .withConfig({ runName: "Summarize (return doc)" })
      .map(),
    collapse,
    reduceChain,
  ]).withConfig({ runName: "Map reduce" });
  // Load the PDF to be split
  const blob = await downloadPdf(chat.pdfStoragePath);
  if (!blob) {
    throw new Error(
      "Failed to get the blob from the loadPdfIntoVectorStore function"
    );
  }
  const loader = new WebPDFLoader(blob);
  const doc = await loader.load();
  // Split
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 100,
  });
  const chunks = await splitter.splitDocuments(doc);
  const result = await mapReduceChain.invoke(chunks);
  return result;
};
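For anyone reasoning about the token error above, here is a rough sketch of how a list of texts can be grouped so that each group stays within a token budget before it is sent to the model. It is not the library's `splitListOfDocs`; `estimateTokens` (about 4 characters per token) and `groupByTokenBudget` are hypothetical names and assumptions made only for illustration, and a real tokenizer will give different counts.

```typescript
// Hypothetical token estimate: roughly 4 characters per token (an assumption).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Greedily pack texts into groups whose estimated size stays within tokenMax.
function groupByTokenBudget(texts: string[], tokenMax: number): string[][] {
  const groups: string[][] = [];
  let current: string[] = [];
  let currentTokens = 0;
  for (const text of texts) {
    const tokens = estimateTokens(text);
    if (tokens > tokenMax) {
      // A single oversized text cannot fit in any group; it must be split first.
      throw new Error(`One text needs ${tokens} tokens, over the budget of ${tokenMax}`);
    }
    if (currentTokens + tokens > tokenMax && current.length > 0) {
      // Adding this text would overflow the group, so start a new one.
      groups.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(text);
    currentTokens += tokens;
  }
  if (current.length > 0) groups.push(current);
  return groups;
}

// Three texts of about 10 estimated tokens each, with a budget of 25 per group.
const groups = groupByTokenBudget(
  ["a".repeat(40), "b".repeat(40), "c".repeat(40)],
  25,
);
console.log(groups.map((g) => g.length));
```

Note that the budget only covers the document text. The prompt template and any format instructions also consume tokens, so in practice it seems safer to leave some headroom below the model's context length.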
I confirm the example is working fine after updating
docs = await Promise.all( splitDocs.map((doc) => collapseDocs(doc, collapseChain.invoke.bind(collapseChain))) )
@JuGit-pk Thank you very much for your fix.
You can create a PR to update the example code at https://js.langchain.com/v0.1/docs/modules/chains/document/map_reduce/#with-lcel
Issue: Error in executing map_reduce#with-lcel sample program with large document
Description:
I have created the following sample program to test from the link:
When I try to execute it, I get:
[0] file:///D://node_samples/node_modules/@langchain/core/dist/runnables/base.js:1013
[0] const initialSteps = [this.first, ...this.middle];
[0] ^
[0] TypeError: Cannot read properties of undefined (reading 'first')
Additional Notes:
It works fine for a small document (test_small.txt), but it throws an exception for a big document (test_big.txt).
Environment
OS: Windows 10
Node: v18.16.0
langchain: ^0.1.21
Attachments
test_small.txt error_trace.txt test_big.txt