langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

ParentDocumentRetriever fails with LocalFileStore #4965

Open mrcrdwd opened 3 months ago

mrcrdwd commented 3 months ago


Example Code

const { OpenAIEmbeddings } = require("@langchain/openai");
const { MemoryVectorStore } = require("langchain/vectorstores/memory");
const { LocalFileStore } = require("langchain/storage/file_system");
const { ParentDocumentRetriever } = require("langchain/retrievers/parent_document");
const { RecursiveCharacterTextSplitter } = require("langchain/text_splitter");
const { Document } = require("@langchain/core/documents");

const vectorstore = new MemoryVectorStore(new OpenAIEmbeddings());
const docstore = await LocalFileStore.fromPath("/home/someuser/n8n/docstore");

const retriever = new ParentDocumentRetriever({
  vectorstore,
  docstore,
  // Optional, not required if you're already passing in split documents
  parentSplitter: new RecursiveCharacterTextSplitter({
    chunkOverlap: 0,
    chunkSize: 500,
  }),
  childSplitter: new RecursiveCharacterTextSplitter({
    chunkOverlap: 0,
    chunkSize: 50,
  }),
  // Optional `k` parameter to search for more child documents in VectorStore.
  // Note that this does not exactly correspond to the number of final (parent) documents
  // retrieved, as multiple child documents can point to the same parent.
  childK: 20,
  // Optional `k` parameter to limit number of final, parent documents returned from this
  // retriever and sent to LLM. This is an upper-bound, and the final count may be lower than this.
  parentK: 5,
});

const parentDocuments = [
  new Document({ pageContent: "The first document." }),
  new Document({ pageContent: "Another document for testing." }),
];

// We must add the parent documents via the retriever's addDocuments method
await retriever.addDocuments(parentDocuments);

Error Message and Stack Trace (if applicable)

"stackTrace": [
  "Error: Error writing file at path: /home/someuser/n8n/docstore/c53d8262-c418-45db-bdb8-7aea77a6af5e.txt.",
  "Error: {\"code\":\"ERR_INVALID_ARG_TYPE\"}",
  "    at LocalFileStore.setFileContent (/home/someuser/.nvm/versions/node/v20.5.1/lib/node_modules/n8n/node_modules/@n8n/n8n-nodes-langchain/node_modules/langchain/dist/storage/file_system.cjs:104:19)",
  "    at processTicksAndRejections (node:internal/process/task_queues:95:5)",
  "    at async Promise.all (index 0)",
  "    at LocalFileStore.mset (/home/someuser/.nvm/versions/node/v20.5.1/lib/node_modules/n8n/node_modules/@n8n/n8n-nodes-langchain/node_modules/langchain/dist/storage/file_system.cjs:140:9)",
  "    at ParentDocumentRetriever.addDocuments (/home/someuser/.nvm/versions/node/v20.5.1/lib/node_modules/n8n/node_modules/@n8n/n8n-nodes-langchain/node_modules/langchain/dist/retrievers/parent_document.cjs:195:13)"
]

Description

I'm trying to use LocalFileStore as a replacement for InMemoryStore.

The example code above (running in an n8n LangChain Code node) throws an ERR_INVALID_ARG_TYPE error.

Inspecting file_system.ts, it looks like setFileContent() expects a Uint8Array. I couldn't tell whether parent_document.ts passes the right type.

Is this a user error from my end?

System Info

├── @langchain/community@0.0.44
├── @langchain/core@0.1.52
├── @langchain/openai@0.0.25
├── @langchain/pinecone@0.0.4
├── @pinecone-database/pinecone@2.2.0
├── corepack@0.19.0
├── n8n@1.33.1
├── npm@10.5.0
└── pm2@5.3.0
dosubot[bot] commented 3 months ago

Hey @mrcrdwd! Great to see you diving into the depths of LangChain.js again. Hope you've been doing well since our last chat! 🚀

Based on the error message and stack trace you provided, the failure originates in the setFileContent() method of the LocalFileStore class. This method expects a Uint8Array as the content to write to the file; if ParentDocumentRetriever passes a value of a different type, the write fails with the ERR_INVALID_ARG_TYPE error.

In your implementation, you're using LocalFileStore as the docstore for ParentDocumentRetriever. When you call addDocuments(), ParentDocumentRetriever internally calls the mset() method of LocalFileStore, which in turn calls setFileContent() once per key-value pair. Every value passed to setFileContent() must therefore be a Uint8Array.

Here's the relevant part of the mset() method in LocalFileStore:

async mset(pairs: [string, Uint8Array][]): Promise<void> {
  await Promise.all(
    pairs.map(([key, content]) => this.setFileContent(content, key))
  );
}

And here's the setFileContent() method:

private async setFileContent(content: Uint8Array, key: string): Promise<void> {
  const filePath = this.getFilePathForKey(key);
  await fs.writeFile(filePath, content);
}

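As you can see, setFileContent() passes content straight to fs.writeFile(), which accepts strings and binary data such as a Uint8Array, but not arbitrary objects. If ParentDocumentRetriever passes anything else, such as a Document instance, Node rejects the write with ERR_INVALID_ARG_TYPE.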
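To confirm that the rejection comes from Node itself rather than from LangChain, here is a minimal sketch that calls fs.promises.writeFile() directly, once with a Uint8Array and once with a plain object shaped like a Document. It assumes Node 18 or later and writes to a throwaway directory under the OS temp folder; the helper names tryWrite and demo are illustrative only.

```javascript
const { writeFile, mkdtemp, rm } = require("node:fs/promises");
const { tmpdir } = require("node:os");
const { join } = require("node:path");

// Attempt a write and report the error code (or "ok") instead of throwing.
async function tryWrite(filePath, content) {
  try {
    await writeFile(filePath, content);
    return "ok";
  } catch (err) {
    return err.code;
  }
}

async function demo() {
  // Throwaway directory so the sketch leaves nothing behind.
  const dir = await mkdtemp(join(tmpdir(), "docstore-"));
  try {
    // What LocalFileStore expects: raw bytes.
    const bytes = new TextEncoder().encode("The first document.");
    // What the docstore receives instead: a Document-like object.
    const docLike = { pageContent: "The first document.", metadata: {} };
    return {
      bytes: await tryWrite(join(dir, "a.txt"), bytes),
      object: await tryWrite(join(dir, "b.txt"), docLike),
    };
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}

demo().then((result) => console.log(result));
// { bytes: 'ok', object: 'ERR_INVALID_ARG_TYPE' }
```

The object case fails before any file is opened, which matches the ERR_INVALID_ARG_TYPE code wrapped in the "Error writing file at path" message from the stack trace.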

To fix this issue, the documents must reach LocalFileStore as Uint8Array values. The Document class does not perform that conversion: it is a plain container for pageContent and metadata, and ParentDocumentRetriever hands the Document instances to the docstore's mset() without converting them. The way you create your Document instances is therefore not the problem; the docstore itself has to encode each Document to bytes on write (for example, JSON.stringify() followed by a TextEncoder) and decode it again on read.
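Because Document does not serialize itself, one workaround is to wrap the byte store in a small adapter that turns each Document into JSON and then UTF-8 bytes on mset(), and reverses the process on mget(). The sketch below is hypothetical: DocumentByteStoreAdapter and BytesOnlyStore (an in-memory stand-in for LocalFileStore so the example runs without LangChain installed) are not LangChain exports. It assumes ParentDocumentRetriever only needs the mget()/mset()/mdelete()/yieldKeys() methods from its docstore. Newer langchain releases may ship an equivalent helper (createDocumentStoreFromByteStore in langchain/storage/encoder_backed); check whether your installed version exports it before rolling your own.

```javascript
// Hypothetical stand-in for LocalFileStore: keeps values in a Map and,
// like LocalFileStore's byte-oriented mset(), only accepts Uint8Array values.
class BytesOnlyStore {
  constructor() {
    this.data = new Map();
  }

  async mset(keyValuePairs) {
    for (const [key, value] of keyValuePairs) {
      if (!(value instanceof Uint8Array)) {
        throw new TypeError(`Value for key "${key}" must be a Uint8Array`);
      }
      this.data.set(key, value);
    }
  }

  async mget(keys) {
    return keys.map((key) => this.data.get(key));
  }

  async mdelete(keys) {
    for (const key of keys) this.data.delete(key);
  }

  async *yieldKeys(prefix = "") {
    for (const key of this.data.keys()) {
      if (key.startsWith(prefix)) yield key;
    }
  }
}

// Hypothetical adapter: presents a Document-valued store on top of any
// byte store with mget/mset/mdelete/yieldKeys (e.g. LocalFileStore).
class DocumentByteStoreAdapter {
  // `toDocument` rebuilds a document from parsed fields; pass
  // (fields) => new Document(fields) to get real LangChain Documents back.
  constructor(byteStore, toDocument = (fields) => fields) {
    this.byteStore = byteStore;
    this.toDocument = toDocument;
    this.encoder = new TextEncoder();
    this.decoder = new TextDecoder();
  }

  // Document -> JSON string -> UTF-8 bytes (a Uint8Array).
  serialize(doc) {
    const fields = { pageContent: doc.pageContent, metadata: doc.metadata ?? {} };
    return this.encoder.encode(JSON.stringify(fields));
  }

  // UTF-8 bytes -> JSON string -> document.
  deserialize(bytes) {
    return this.toDocument(JSON.parse(this.decoder.decode(bytes)));
  }

  async mset(keyValuePairs) {
    await this.byteStore.mset(
      keyValuePairs.map(([key, doc]) => [key, this.serialize(doc)])
    );
  }

  async mget(keys) {
    const values = await this.byteStore.mget(keys);
    // Missing keys come back as undefined and are passed through unchanged.
    return values.map((bytes) => (bytes === undefined ? undefined : this.deserialize(bytes)));
  }

  async mdelete(keys) {
    await this.byteStore.mdelete(keys);
  }

  async *yieldKeys(prefix) {
    yield* this.byteStore.yieldKeys(prefix);
  }
}
```

With the real classes, the repro would pass `new DocumentByteStoreAdapter(await LocalFileStore.fromPath("/home/someuser/n8n/docstore"), (fields) => new Document(fields))` as the retriever's `docstore`. JSON only preserves pageContent and JSON-serializable metadata, so values such as Date objects come back as strings.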

I hope this helps! If you have any more questions or need further clarification, feel free to ask.

mrcrdwd commented 3 months ago

There is a clear difference between how mset() is implemented in LocalFileStore and in InMemoryStore: InMemoryStore accepts values of the generic type T, while LocalFileStore requires each value to be a Uint8Array.

Meanwhile, addDocuments() in ParentDocumentRetriever builds a Record<string, Document> and passes its entries, an array of [string, Document] pairs, to mset().

LocalFileStore

  async mset(keyValuePairs: [string, Uint8Array][]): Promise<void> {
    await Promise.all(
      keyValuePairs.map(([key, value]) => this.setFileContent(value, key))
    );
  }

InMemoryStore

  async mset(keyValuePairs: [string, T][]): Promise<void> {
    for (const [key, value] of keyValuePairs) {
      this.store[key] = value;
    }
  }
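The mismatch can be reproduced without LangChain. The sketch below builds the kind of argument described above (a Record keyed by generated parent IDs, flattened with Object.entries() into [key, value] pairs; treating this as exactly how addDocuments() prepares its mset() call is an assumption based on this thread) and hands it to two toy stores modeled on the snippets above. GenericStore accepts any value, as InMemoryStore's mset() does, while ByteStore rejects anything that is not a Uint8Array, standing in for LocalFileStore. Both class names and the helpers toPairs and compare are hypothetical.

```javascript
const { randomUUID } = require("node:crypto");

// Mirrors InMemoryStore.mset(): values of any type T are stored as-is.
class GenericStore {
  constructor() {
    this.store = {};
  }
  async mset(keyValuePairs) {
    for (const [key, value] of keyValuePairs) {
      this.store[key] = value;
    }
  }
}

// Stand-in for LocalFileStore.mset(): only raw bytes are acceptable.
class ByteStore {
  constructor() {
    this.store = {};
  }
  async mset(keyValuePairs) {
    for (const [key, value] of keyValuePairs) {
      if (!(value instanceof Uint8Array)) {
        throw new TypeError(`Expected Uint8Array for key ${key}, got ${typeof value}`);
      }
      this.store[key] = value;
    }
  }
}

// Parent documents keyed by a generated ID, then flattened to [key, value] pairs.
function toPairs(docs) {
  const parentDocs = {};
  for (const doc of docs) parentDocs[randomUUID()] = doc;
  return Object.entries(parentDocs);
}

// Feed the same pairs to both stores and record what happens.
async function compare() {
  const pairs = toPairs([
    { pageContent: "The first document.", metadata: {} },
    { pageContent: "Another document for testing.", metadata: {} },
  ]);
  const results = {};
  for (const [label, store] of [["generic", new GenericStore()], ["bytes", new ByteStore()]]) {
    try {
      await store.mset(pairs);
      results[label] = "stored";
    } catch (err) {
      results[label] = err.name;
    }
  }
  return results;
}

compare().then((r) => console.log(r));
// { generic: 'stored', bytes: 'TypeError' }
```

The same pairs succeed against the generic store and fail against the byte store, which is why swapping InMemoryStore for LocalFileStore breaks addDocuments() unless the values are encoded first.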