langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

TypeError: Cannot read properties of undefined (reading 'first') in map_reduce#with-lcel #4514

Open sarfudheen opened 4 months ago

sarfudheen commented 4 months ago

Issue: Error in executing map_reduce#with-lcel sample program with large document

Description:

I have created the following sample program to test from the link:

import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import {
  collapseDocs,
  splitListOfDocs,
} from "langchain/chains/combine_documents/reduce";
import { formatDocument } from "langchain/schema/prompt_template";
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";
import { BaseCallbackConfig } from "@langchain/core/callbacks/manager";
import { Document } from "@langchain/core/documents";
import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatOpenAI } from "@langchain/openai";
import { ModelService } from "./model.service.js";
import { LLMModel, LLMProvider } from "./referentiel.js";
import { TextLoader } from "langchain/document_loaders/fs/text";

const main = async () => {
  const loader = new TextLoader(
    "C:/test_data/test_big.txt"
  );

  const documents = await loader.loadAndSplit(
    new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    })
  );
  console.log("number of documents:", documents.length);
  const model = new ChatOpenAI({});

  // Define prompt templates for document formatting, summarizing, collapsing, and combining
  const documentPrompt = PromptTemplate.fromTemplate("{pageContent}");
  const summarizePrompt = PromptTemplate.fromTemplate(
    "Summarize this content:\n\n{context}"
  );
  const collapsePrompt = PromptTemplate.fromTemplate(
    "Collapse this content:\n\n{context}"
  );
  const combinePrompt = PromptTemplate.fromTemplate(
    "Combine these summaries:\n\n{context}"
  );

  // Wrap the `formatDocument` util so it can format a list of documents
  const formatDocs = async (documents: Document[]): Promise<string> => {
    const formattedDocs = await Promise.all(
      documents.map((doc) => formatDocument(doc, documentPrompt))
    );
    return formattedDocs.join("\n\n");
  };

  // Define a function to get the number of tokens in a list of documents
  const getNumTokens = async (documents: Document[]): Promise<number> =>
    model.getNumTokens(await formatDocs(documents));

  // Initialize the output parser
  const outputParser = new StringOutputParser();

  // Define the map chain to format, summarize, and parse the document
  const mapChain = RunnableSequence.from([
    { context: async (i: Document) => formatDocument(i, documentPrompt) },
    summarizePrompt,
    model,
    outputParser,
  ]);

  // Define the collapse chain to format, collapse, and parse a list of documents
  const collapseChain = RunnableSequence.from([
    { context: async (documents: Document[]) => formatDocs(documents) },
    collapsePrompt,
    model,
    outputParser,
  ]);

  // Define a function to collapse a list of documents until the total number of tokens is within the limit
  const collapse = async (
    documents: Document[],
    options?: {
      config?: BaseCallbackConfig;
    },
    tokenMax = 4000
  ) => {
    const editableConfig = options?.config;
    let docs = documents;
    let collapseCount = 1;
    while ((await getNumTokens(docs)) > tokenMax) {
      if (editableConfig) {
        editableConfig.runName = `Collapse ${collapseCount}`;
      }
      const splitDocs = splitListOfDocs(docs, getNumTokens, tokenMax);
      docs = await Promise.all(
        splitDocs.map((doc) => collapseDocs(doc, collapseChain.invoke))
      );
      collapseCount += 1;
    }
    return docs;
  };

  // Define the reduce chain to format, combine, and parse a list of documents
  const reduceChain = RunnableSequence.from([
    { context: formatDocs },
    combinePrompt,
    model,
    outputParser,
  ]).withConfig({ runName: "Reduce" });

  // Define the final map-reduce chain
  const mapReduceChain = RunnableSequence.from([
    RunnableSequence.from([
      { doc: new RunnablePassthrough(), content: mapChain },
      (input) =>
        new Document({
          pageContent: input.content,
          metadata: input.doc.metadata,
        }),
    ])
      .withConfig({ runName: "Summarize (return doc)" })
      .map(),
    collapse,
    reduceChain,
  ]).withConfig({ runName: "Map reduce" });
  const result = await mapReduceChain.invoke(documents);
  console.log(result);
};
await main();

When I run it, I get the following error:

[0] file:///D://node_samples/node_modules/@langchain/core/dist/runnables/base.js:1013
[0] const initialSteps = [this.first, ...this.middle];
[0] ^
[0] TypeError: Cannot read properties of undefined (reading 'first')

Additional Notes:

It works fine for a small document (test_small.txt), but throws this exception for a big document (test_big.txt).

Environment

OS: Windows 10
Node: v18.16.0
langchain: ^0.1.21

Attachments

test_small.txt error_trace.txt test_big.txt

dosubot[bot] commented 4 months ago

🤖

Hey @sarfudheen, great to see you back! Hope all's been well on your coding adventures. 🚀

Based on the error message you're receiving, it seems like you're trying to access a property of an object that hasn't been defined yet. This might be due to the asynchronous nature of JavaScript and the way the collapse function is implemented in your code.

In the collapse function, you're using a while loop to continuously collapse documents until the total number of tokens is within the limit. However, the collapseDocs function inside the while loop is asynchronous, meaning it returns a Promise. The splitDocs.map function will return an array of Promises, and you're using Promise.all to wait for all these Promises to resolve. However, if the collapseDocs function doesn't return a value for some reason (for example, if an error occurs during its execution), the corresponding element in the docs array will be undefined. This could lead to the "Cannot read properties of undefined" error in subsequent iterations of the while loop or in other parts of your code where you're trying to access properties of the documents.

To fix this issue, you should add error handling to the collapseDocs function and make sure it always returns a value. Here's an example of how you can do this:

const collapseDocs = async (doc, invoke) => {
  try {
    return await invoke(doc);
  } catch (error) {
    console.error(`Failed to collapse document: ${error}`);
    return doc; // Return the original document if collapsing fails
  }
};

// ...

docs = await Promise.all(
  splitDocs.map((doc) => collapseDocs(doc, collapseChain.invoke))
);

This way, even if an error occurs during the execution of collapseDocs, it will still return a value (the original document), preventing the "Cannot read properties of undefined" error.

As for the collapseDocs function, it is used to collapse a list of documents into a single document. It takes two parameters: a list of documents (docs) and a function (combineDocumentFunc) that combines the content of a list of documents into a single string. The function works by first calling the combineDocumentFunc with the list of documents as its argument. The combineDocumentFunc is expected to return a promise that resolves to a combined string of all the documents' content. This combined string is then used as the pageContent of the resulting document. In addition to combining the content of the documents, collapseDocs also merges the metadata of all the documents using the collapseDocsMetadata function. This function iterates over the metadata of each document and combines them into a single metadata object. If a metadata key is present in multiple documents, the values are concatenated with a comma separator. The collapseDocs function then returns a promise that resolves to a single document with the combined content and merged metadata.
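The behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's source: the Doc type, mergeMetadata, and collapseDocsSketch are hypothetical names, and the ", " separator is taken from the description rather than verified against the implementation.

```typescript
// Hypothetical local type; the real library uses its own Document class.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Merge metadata across documents. When a key appears in several
// documents, its values are joined with ", " (assumed separator).
function mergeMetadata(docs: Doc[]): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc.metadata)) {
      const seen = Object.prototype.hasOwnProperty.call(merged, key);
      merged[key] = seen ? `${merged[key]}, ${String(value)}` : String(value);
    }
  }
  return merged;
}

// Collapse a list of documents into one: the combine function produces
// the new page content, and the metadata of all inputs is merged.
async function collapseDocsSketch(
  docs: Doc[],
  combine: (docs: Doc[]) => Promise<string>
): Promise<Doc> {
  const pageContent = await combine(docs);
  return { pageContent, metadata: mergeMetadata(docs) };
}

// Usage: a trivial combine function that joins the page contents.
const sampleDocs: Doc[] = [
  { pageContent: "first summary", metadata: { source: "a.txt", page: 1 } },
  { pageContent: "second summary", metadata: { source: "b.txt" } },
];
collapseDocsSketch(sampleDocs, async (docs) =>
  docs.map((d) => d.pageContent).join("\n")
).then((doc) => console.log(doc));
```

Note that the combine function is called as a plain function, so whatever is passed in must not depend on its own this value.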

I hope this helps! If you have any other questions, feel free to ask.

sarfudheen commented 4 months ago

After updating the code as per your suggestion:

const collapseDocs = async (doc, invoke) => {
  try {
    return await invoke(doc);
  } catch (error) {
    console.error(`Failed to collapse document: ${error}`);
    return doc; // Return the original document if collapsing fails
  }
};

I am getting:

[0] Failed to collapse document: TypeError: Cannot read properties of undefined (reading 'first')
[1] Failed to collapse document: TypeError: Cannot read properties of undefined (reading 'first')
[0] I'm sorry, but you haven't provided any summaries to combine. Could you please provide the summaries you would like me to combine?

JuGit-pk commented 1 month ago

Any solution? I'm still having this issue.

Bariskau commented 1 month ago

@JuGit-pk I'm not sure about the root cause, but I think it is related to scope. You can try something like the code below, which binds the RunnableSequence instance to its invoke method.

docs = await Promise.all(
  splitDocs.map((doc) =>
    collapseDocs(doc, collapseChain.invoke.bind(collapseChain))
  )
);
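To illustrate why binding helps, here is a minimal sketch of the general JavaScript behaviour involved. TinySequence and callWith are hypothetical stand-ins, not LangChain classes, and the method is synchronous for brevity where the real invoke is async. A method that reads this.first loses its this value when passed as a bare callback, which matches the line shown in the stack trace.

```typescript
// Hypothetical stand-in for a runnable sequence; not the library's class.
class TinySequence {
  first: string;
  middle: string[];

  constructor(first: string, middle: string[]) {
    this.first = first;
    this.middle = middle;
  }

  // Reads instance state through `this`, like the line in the stack trace.
  invoke(input: string): string {
    const steps = [this.first, ...this.middle];
    return `${input} -> ${steps.join(" -> ")}`;
  }
}

// Receives a bare callback and calls it as a plain function,
// similar to how a combine function is handed to a helper.
function callWith(fn: (input: string) => string, input: string): string {
  return fn(input);
}

const chain = new TinySequence("format", ["prompt", "model"]);

// Detached method: `this` is undefined inside invoke, so it throws.
let detachedError: unknown;
try {
  callWith(chain.invoke, "doc");
} catch (error) {
  detachedError = error;
}

// Two equivalent fixes: bind the instance, or wrap the call in an arrow function.
const bound = callWith(chain.invoke.bind(chain), "doc");
const wrapped = callWith((input) => chain.invoke(input), "doc");

console.log(detachedError instanceof TypeError, bound, wrapped);
```

The arrow-function wrapper is an alternative to bind that some readers find easier to scan; both keep the instance attached to the call.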
JuGit-pk commented 1 month ago

I think it's better to use k-means for summarizing larger documents than this chain, which is slow and costly.

Bariskau commented 1 month ago

You are right, k-means is a more efficient solution for summarization, but map-reduce is not only used for summarization.

JuGit-pk commented 1 month ago

@Bariskau , exactly. I really need this fix because I've been working on three features that depend on this map-reduce chain. One of them has been fixed using k-means, but the others still rely on it. You're right about that.

Bariskau commented 1 month ago

@JuGit-pk I tested the code I gave above. All you need to do is bind the runnable chain instance to the invoke method. That solves your problem for now. I can open a PR for it.

JuGit-pk commented 1 month ago

@Bariskau The problem we were facing was scope related, and it is fixed now. Great!

docs = await Promise.all(
  splitDocs.map((doc) =>
    collapseDocs(doc, collapseChain.invoke.bind(collapseChain))
  )
);

Now, for larger documents, I see a token limit error:

BadRequestError: 400 This model's maximum context length is 16385 tokens. However, your messages resulted in 24052 tokens. Please reduce the length of the messages.

export const createFlashcards = async (chat: Chat) => {
  const qdrantClient = getQDrantClient();

  const model = new ChatOpenAI({
    modelName: "gpt-3.5-turbo",
    // temperature: 0.1,
    // maxTokens: 512,
    openAIApiKey: OPENAI_API_KEY,
    // streaming: true,
  });

  // Define prompt templates for document formatting, summarizing, collapsing, and combining
  const documentPrompt = PromptTemplate.fromTemplate("{pageContent}");
  const summarizePrompt = PromptTemplate.fromTemplate(
    "Imagine you're crafting flashcards to help students memorize key information effectively. Summarize the main points or interesting facts from this context:\n\n{context}"
  );

  const collapsePrompt = PromptTemplate.fromTemplate(
    "Now, condense the summarized content into a concise format suitable for flashcards:\n\n{context}"
  );

  const combinePrompt = PromptTemplate.fromTemplate(
    "You're preparing a comprehensive set of flashcards for student assessment. Integrate these condensed flashcards into an effective quiz format:\n\n{context} \n{format_instructions}"
  );

  // Wrap the `formatDocument` util so it can format a list of documents
  const formatDocs = async (documents: Document[]): Promise<string> => {
    const formattedDocs = await Promise.all(
      documents.map((doc) => formatDocument(doc, documentPrompt))
    );
    return formattedDocs.join("\n\n");
  };

  // Define a function to get the number of tokens in a list of documents
  const getNumTokens = async (documents: Document[]): Promise<number> =>
    model.getNumTokens(await formatDocs(documents));

  // Initialize the output parser
  const outputParser = new StringOutputParser();

  // parser with schema
  const summmaryOutputSchema = StructuredOutputParser.fromZodSchema(
    z.object({
      flashcards: z.array(
        z.object({
          question: z
            .string()
            .describe("Question or knowledge of the flashcard"),
          answer: z
            .string()
            .describe("Answer or value to the question or knowledge"),
        })
      ),
    })
  );

  // Define the map chain to format, summarize, and parse the document
  const mapChain = RunnableSequence.from([
    { context: async (i: Document) => formatDocument(i, documentPrompt) },
    summarizePrompt,
    model,
    outputParser,
  ]);

  // Define the collapse chain to format, collapse, and parse a list of documents
  const collapseChain = RunnableSequence.from([
    { context: async (documents: Document[]) => formatDocs(documents) },
    collapsePrompt,
    model,
    outputParser,
  ]);

  // Define a function to collapse a list of documents until the total number of tokens is within the limit
  const collapse = async (
    documents: Document[],
    options?: {
      config?: BaseCallbackConfig;
    },
    tokenMax = 1500
  ) => {
    const editableConfig = options?.config;
    let docs = documents;
    let collapseCount = 1;
    while ((await getNumTokens(docs)) > tokenMax) {
      console.log("Collapsing documents...", {
        collapseCount,
        numTokens: await getNumTokens(docs),
        condition: (await getNumTokens(docs)) > tokenMax,
      });
      if (editableConfig) {
        editableConfig.runName = `Collapse ${collapseCount}`;
      }
      const splitDocs = splitListOfDocs(docs, getNumTokens, tokenMax);
      // docs = await Promise.all(
      //   splitDocs.map((doc) => collapseDocs(doc, collapseChain.invoke))
      // );
      docs = await Promise.all(
        splitDocs.map((doc) =>
          collapseDocs(doc, collapseChain.invoke.bind(collapseChain))
        )
      );
      collapseCount += 1;
    }
    return docs;
  };

  // Define the reduce chain to format, combine, and parse a list of documents
  const reduceChain = RunnableSequence.from([
    {
      context: formatDocs,
      format_instructions: () => summmaryOutputSchema.getFormatInstructions(),
    },
    combinePrompt,
    model,
    summmaryOutputSchema,
  ]).withConfig({ runName: "Reduce" });

  // Define the final map-reduce chain
  const mapReduceChain = RunnableSequence.from([
    RunnableSequence.from([
      {
        doc: new RunnablePassthrough(),
        content: mapChain,
      },
      (input) =>
        new Document({
          pageContent: input.content,
          metadata: input.doc.metadata,
        }),
    ])
      .withConfig({ runName: "Summarize (return doc)" })
      .map(),
    collapse,
    reduceChain,
  ]).withConfig({ runName: "Map reduce" });

  // Load the PDF document
  const blob = await downloadPdf(chat.pdfStoragePath);
  if (!blob) {
    throw new Error(
      "Failed to get the blob from the loadPdfIntoVectorStore function"
    );
  }
  const loader = new WebPDFLoader(blob);
  const doc = await loader.load();

  // Split
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 100,
  });
  const chunks = await splitter.splitDocuments(doc);

  const result = await mapReduceChain.invoke(chunks);

  return result;
};
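One possible line of investigation for the context-length error, offered as an untested assumption rather than a diagnosis: confirm that every batch handed to collapseChain really stays under tokenMax. getNumTokens in the code above is async, and in JavaScript comparing an un-awaited Promise with a number is always false, so any splitter that compares the result without awaiting it would never start a new batch. Whether that applies to splitListOfDocs is not established in this thread. The sketch below uses a hypothetical helper, batchByTokenBudget (not a LangChain API), that awaits the counter explicitly.

```typescript
// Hypothetical helper, not part of LangChain: greedily groups items into
// batches whose token count, as reported by an async counter, stays
// within tokenMax. A single item larger than tokenMax still gets its own
// batch, so oversized items need separate handling.
async function batchByTokenBudget<T>(
  items: T[],
  countTokens: (batch: T[]) => Promise<number>,
  tokenMax: number
): Promise<T[][]> {
  const batches: T[][] = [];
  let current: T[] = [];
  for (const item of items) {
    const candidate = [...current, item];
    // Await the counter before comparing; a Promise compared with a
    // number is always false.
    if (current.length > 0 && (await countTokens(candidate)) > tokenMax) {
      batches.push(current);
      current = [item];
    } else {
      current = candidate;
    }
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}

// Stand-in counter for the demo: counts whitespace-separated words.
const countWords = async (batch: string[]): Promise<number> =>
  batch.join(" ").split(/\s+/).filter(Boolean).length;

batchByTokenBudget(["a b c", "d e", "f g h i", "j"], countWords, 5).then(
  (batches) => console.log(batches)
);
```

Logging the size of each batch before invoking the model is a cheap way to check whether the 24052-token request comes from the collapse step or from the final reduce step.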
sarfudheen commented 1 month ago

@JuGit-pk I'm not sure about the root cause, but I think it is related to scope. You can try something like the code below, which binds the RunnableSequence instance to its invoke method.

docs = await Promise.all(
  splitDocs.map((doc) =>
    collapseDocs(doc, collapseChain.invoke.bind(collapseChain))
  )
);

I confirm the example works fine after updating the collapse step to pass collapseChain.invoke.bind(collapseChain) to collapseDocs, as shown above.

@JuGit-pk Thank you very much for your fix.

You can create a PR to update the example code at https://js.langchain.com/v0.1/docs/modules/chains/document/map_reduce/#with-lcel