langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License
11.77k stars 1.97k forks

documents in formattedDocs function is undefined #5504

Open lawetis opened 1 month ago

lawetis commented 1 month ago

Example Code

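// Imports assumed for langchain 0.2.x (the original report omitted them)
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { VercelPostgres } from "@langchain/community/vectorstores/vercel_postgres";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";

// Load documents from file
const loader = new TextLoader("./state_of_the_union.txt");
const rawDocuments = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 0,
});
const documents = await splitter.splitDocuments(rawDocuments);
const vectorPgConfig = {
  tableName: 'vector_store',
};
const vectorstore = await VercelPostgres.initialize(new OpenAIEmbeddings(), vectorPgConfig);
await vectorstore.addDocuments(documents);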
const model = new ChatOpenAI({ model: "gpt-3.5-turbo-1106" });
const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user's questions based on the below context:\n\n{context}",
  ],
  ["human", "{input}"],
]);

const combineDocsChain = await createStuffDocumentsChain({
  llm: model,
  prompt: questionAnsweringPrompt,
});

const chain = await createRetrievalChain({
  retriever: vectorstore.asRetriever(),
  combineDocsChain,
});

const response = await chain.stream({
  input: "What is the president's top priority regarding prices?",
});
console.log("Chain response:");
console.log(response.answer);
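
Separately from the reported error: in LangChain.js, stream() returns an async iterable of chunks rather than a finished result object, so response.answer would not hold the full answer even once the chain runs. A minimal sketch of collecting the answer is below. It assumes each chunk is a partial object that may carry an answer string; fakeChainStream, RetrievalChunk and collectAnswer are hypothetical stand-ins so the snippet runs without the library, and the chunk contents are placeholders.

```typescript
// Hypothetical chunk shape: each chunk carries only some of the final keys.
interface RetrievalChunk {
  input?: string;
  context?: { pageContent: string }[];
  answer?: string;
}

// Stand-in for `await chain.stream(...)`: an async generator yielding partial chunks.
// The strings are placeholders, not real model output.
async function* fakeChainStream(): AsyncGenerator<RetrievalChunk> {
  yield { input: "example question" };
  yield { context: [{ pageContent: "example retrieved passage" }] };
  yield { answer: "partial " };
  yield { answer: "answer " };
  yield { answer: "text" };
}

// Iterate over the stream and concatenate the answer pieces in arrival order.
async function collectAnswer(stream: AsyncIterable<RetrievalChunk>): Promise<string> {
  let answer = "";
  for await (const chunk of stream) {
    if (chunk.answer !== undefined) {
      answer += chunk.answer;
    }
  }
  return answer;
}

collectAnswer(fakeChainStream()).then((answer) => {
  console.log("Chain response:");
  console.log(answer);
});
```

With the real chain, the same loop would presumably run over the value returned by chain.stream(...). If streaming is not needed, chain.invoke(...) returns the complete object in one step.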

Error Message and Stack Trace (if applicable)

Where the error occurred: node_modules\langchain\dist\chains\combine_documents\base.cjs:12. Code at that location:

async function formatDocuments({ documentPrompt, documentSeparator, documents, config, }) {
    console.info('formatDocuments-documents', documents) // undefined
    const formattedDocs = await Promise.all(documents.map((document) => documentPrompt
        .withConfig({ runName: "document_formatter" })
        .invoke({ ...document.metadata, page_content: document.pageContent }, config)));
    return formattedDocs.join(documentSeparator);
}

Description

I'm using Vercel Postgres as the vector store and I'm getting the following error:

node_modules\langchain\dist\chains\combine_documents\base.cjs:12
    const formattedDocs = await Promise.all(documents.map((document) => documentPrompt
                                                      ^
TypeError: Cannot read properties of undefined (reading 'map')
    at formatDocuments (node_modules\langchain\dist\chains\combine_documents\base.cjs:12:55)
    at RunnableLambda.func (node_modules\langchain\dist\chains\combine_documents\stuff.cjs:30:154)
    at node_modules\@langchain\core\dist\runnables\base.cjs:1432:44
    at MockAsyncLocalStorage.run (node_modules\@langchain\core\dist\singletons\index.cjs:10:16)
    at output (node_modules\@langchain\core\dist\runnables\base.cjs:1430:78)
    at new Promise (<anonymous>)
    at RunnableLambda._transform (node_modules\@langchain\core\dist\runnables\base.cjs:1429:30)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at RunnableLambda._transformStreamWithConfig (node_modules\@langchain\core\dist\runnables\base.cjs:294:30)
    at RunnableSequence._streamIterator (node_modules\@langchain\core\dist\runnables\base.cjs:1113:30)
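
To make the trace easier to read: the TypeError means the document-combining step received no array under its documents key. The sketch below reproduces that failure mode in isolation and shows a guard that fails with a clearer message. It is not the library's code. Doc, formatDocumentsUnsafe and formatDocumentsChecked are hypothetical names, and reading the documents from a "context" key is an assumption based on the {context} placeholder in the prompt.

```typescript
// Hypothetical, simplified document shape.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Mirrors the failing pattern: calls .map() on whatever arrives under the key.
function formatDocumentsUnsafe(input: { context?: Doc[] }, separator = "\n\n"): string {
  // Throws a TypeError when input.context is undefined.
  return (input.context as Doc[]).map((d) => d.pageContent).join(separator);
}

// Guarded variant: fails early with a message that names the missing key.
function formatDocumentsChecked(input: { context?: Doc[] }, separator = "\n\n"): string {
  if (!Array.isArray(input.context)) {
    throw new Error(
      'Expected an array of documents under "context", got ' + typeof input.context
    );
  }
  return input.context.map((d) => d.pageContent).join(separator);
}

const docs: Doc[] = [
  { pageContent: "first chunk", metadata: {} },
  { pageContent: "second chunk", metadata: {} },
];

// Normal case: the page contents are joined with the separator.
console.log(formatDocumentsChecked({ context: docs }));

// Failure case: no documents were passed in, as in the trace above.
try {
  formatDocumentsUnsafe({});
} catch (err) {
  console.log((err as Error).message);
}
```

Logging the keys of the object that reaches the combine step is a quick way to tell whether the retriever output was dropped before it or never produced.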

System Info

langchain@0.2.0 | MIT | deps: 16 | versions: 271
Typescript bindings for langchain
https://github.com/langchain-ai/langchainjs/tree/main/langchain/

keywords: llm, ai, gpt3, chain, prompt, prompt engineering, chatgpt, machine learning, ml, openai, embeddings, vectorstores

dist
.tarball: https://registry.npmjs.org/langchain/-/langchain-0.2.0.tgz
.shasum: 555d84538962720cd7223f6c3ca4bd060978ebf3
.integrity: sha512-8c7Dg9OIPk4lFIQGyfOytXbUGLLSsxs9MV53cLODspkOGzaUpwy5FGBie30SrOxIEFJo+FDaJgpDAFO3Xi4NMw==
.unpackedSize: 3.4 MB

dependencies:
@langchain/core: ~0.2.0          binary-extensions: ^2.2.0        jsonpointer: ^5.0.1              ml-distance: ^4.0.0              uuid: ^9.0.0                     zod: ^3.22.4
@langchain/openai: ~0.0.28       js-tiktoken: ^1.0.12             langchainhub: ~0.0.8             openapi-types: ^12.1.3           yaml: ^2.2.1
@langchain/textsplitters: ~0.0.0 js-yaml: ^4.1.0                  langsmith: ~0.1.7                p-retry: 4                       zod-to-json-schema: ^3.22.3      

maintainers:
- nfcampos <nuno@boringbits.io>
- jacoblee93 <jacoblee93@gmail.com>
- andrewnguonly <andrewnguonly@gmail.com>
- davidduong <david@duong.cz>
- hwchase17 <hw.chase.17@gmail.com>
- basproul <braceasproul@gmail.com>

dist-tags:
latest: 0.2.0  next: 0.2.0    

published 3 days ago by basproul <braceasproul@gmail.com>

dosubot[bot] commented 1 month ago

To resolve the issue where the documents variable in the formatDocuments function is undefined, check the following:

  1. Correctly Pass the docs Parameter: Make sure the docs parameter is not undefined when calling the fromDocuments method. Here is an example:

    const docs = [
     new Document({ pageContent: 'Document 1 content', metadata: { key: 'value1' } }),
     new Document({ pageContent: 'Document 2 content', metadata: { key: 'value2' } }),
    ];
    
    const embeddings = new OpenAIEmbeddings(); // or any other embeddings instance
    const dbConfig = { postgresConnectionOptions: { connectionString: 'your-connection-string' } };
    
    const vercelPostgresInstance = await VercelPostgres.fromDocuments(docs, embeddings, dbConfig);

  2. Ensure documents Array is Populated: Verify that the documents array is properly populated before passing it to the formatDocuments function.

  3. Check the createStuffDocumentsChain Function: Ensure the createStuffDocumentsChain function is correctly set up to pass the documents array to the formatDocuments function.

  4. Example Usage: Here is an example of how to use the createStuffDocumentsChain function:

    const loader = new TextLoader("./state_of_the_union.txt");
    const rawDocuments = await loader.load();
    const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 0 });
    const documents = await splitter.splitDocuments(rawDocuments);
    
    const vectorPgConfig = { tableName: 'vector_store' };
    const vectorstore = await VercelPostgres.initialize(new OpenAIEmbeddings(), vectorPgConfig);
    await vectorstore.addDocuments(documents);
    
    const model = new ChatOpenAI({ model: "gpt-3.5-turbo-1106" });
    const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
     ["system", "Answer the user's questions based on the below context:\n\n{context}"],
     ["human", "{input}"],
    ]);
    
    const combineDocsChain = await createStuffDocumentsChain({ llm: model, prompt: questionAnsweringPrompt });
    
    const chain = await createRetrievalChain({ retriever: vectorstore.asRetriever(), combineDocsChain });
    
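    // stream() yields partial chunks, so collect the answer pieces while iterating
    const stream = await chain.stream({ input: "What is the president's top priority regarding prices?" });
    let answer = "";
    for await (const chunk of stream) {
      if (chunk.answer !== undefined) answer += chunk.answer;
    }
    console.log("Chain response:", answer);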

By following these steps, you should be able to resolve the issue of the documents variable being undefined in the formatDocuments function. The key point is that the documents array must be populated and passed through every step of the chain.

Sources

- [libs/langchain-community/src/vectorstores/vercel_postgres.ts](https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/vercel_postgres.ts)
- [langchain/src/chains/combine_documents/base.ts](https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/chains/combine_documents/base.ts)
- [libs/langchain-community/src/vectorstores/tests/vercel_postgres.int.test.ts](https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/tests/vercel_postgres.int.test.ts)
- [examples/src/indexes/vector_stores/vercel_postgres/example.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/indexes/vector_stores/vercel_postgres/example.ts)
- [langchain/src/chains/combine_documents/stuff.ts](https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/chains/combine_documents/stuff.ts)
- [langchain/src/chains/combine_docs_chain.ts](https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/chains/combine_docs_chain.ts)
- [examples/src/indexes/recursive_text_splitter_custom_separators.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/indexes/recursive_text_splitter_custom_separators.ts)
- [examples/src/indexes/recursive_text_splitter.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/indexes/recursive_text_splitter.ts)
- [examples/src/indexes/markdown_text_splitter.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/indexes/markdown_text_splitter.ts)
- [examples/src/indexes/html_text_splitter.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/indexes/html_text_splitter.ts)
- [libs/langchain-textsplitters/src/text_splitter.ts](https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-textsplitters/src/text_splitter.ts)

lawetis commented 1 month ago

Yes, I believe the file exists and the data is correct, but the error still occurs.

async addDocuments(documents, options) {
    console.info('addDocuments-documents-1', documents) // Data is correct
    const texts = documents.map(({ pageContent }) => pageContent);
    return this.addVectors(await this.embeddings.embedDocuments(texts), documents, options);
}

jeasonnow commented 1 month ago

Perhaps it's because there's an issue with retrieving data using VercelPostgres. Can you try

const simpleRes = await vectorstore.similaritySearch("What is the president's top priority regarding prices?");
console.log(simpleRes);

and see what the output is? @lawetis
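
If it helps when reading that output, here is a small sketch of a check that classifies what the search returned. describeSearchResult and RetrievedDoc are hypothetical names, not part of LangChain, and the check only assumes that a healthy result is a non-empty array of objects with a string pageContent.

```typescript
// Hypothetical minimal shape of a retrieved document.
interface RetrievedDoc {
  pageContent: string;
  metadata?: Record<string, unknown>;
}

// Returns a short description of what a similarity search handed back.
function describeSearchResult(result: unknown): string {
  if (!Array.isArray(result)) {
    return "not an array: " + typeof result;
  }
  if (result.length === 0) {
    return "empty array: nothing was retrieved";
  }
  // Find the first item that does not look like a document.
  const badIndex = result.findIndex(
    (d) => typeof (d as RetrievedDoc | null)?.pageContent !== "string"
  );
  if (badIndex !== -1) {
    return "item " + badIndex + " has no string pageContent";
  }
  return "ok: " + result.length + " document(s)";
}

console.log(describeSearchResult(undefined));
console.log(describeSearchResult([]));
console.log(describeSearchResult([{ pageContent: "some text", metadata: {} }]));
```

In the thread's setup it would be called as describeSearchResult(simpleRes). An "ok" result would point away from the vector store and toward how the chain passes the documents along.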

lawetis commented 1 month ago

I switched to a different approach and it works fine now, thanks. @jeasonnow