mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs
https://www.youtube.com/watch?v=ih9PBGVVOO4

Unable to connect with PineconeStore -> TypeError: Cannot read properties of undefined (reading 'text') #467

Closed sandeep2569 closed 1 month ago

sandeep2569 commented 1 month ago

I've tried all the suggested solutions but am still facing this issue. Please help. Here is my code ⬇️

    import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
    import { OpenAIEmbeddings } from "@langchain/openai";
    import { PineconeStore } from "@langchain/pinecone";
    import { pinecone } from '@/utils/pinecone-client';
    import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
    import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
    import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

    /* Name of directory to retrieve your files from
       Make sure to add your PDF files inside the 'docs' folder */
    const filePath = 'docs';

    export const run = async () => {
      try {
        /* load raw docs from the all files in the directory */
        const directoryLoader = new DirectoryLoader(filePath, {
          '.pdf': (path) => new PDFLoader(path),
        });

        // const loader = new PDFLoader(filePath);
        const rawDocs = await directoryLoader.load();

        /* Split text into chunks */
        const textSplitter = new RecursiveCharacterTextSplitter({
          chunkSize: 1000,
          chunkOverlap: 200,
        });

        const docs = await textSplitter.splitDocuments(rawDocs);
        console.log('split docs', docs);

        console.log('creating vector store...');
        /* create and store the embeddings in the vectorStore */
        const embeddings = new OpenAIEmbeddings();
        const index = pinecone.Index(PINECONE_INDEX_NAME); // change to your own index name

        // embed the PDF documents
        await PineconeStore.fromDocuments(docs, embeddings, {
          pineconeIndex: index,
          namespace: PINECONE_NAME_SPACE,
          textKey: 'text',
        });
      } catch (error) {
        throw new Error('Failed to ingest your data');
      }
    };

    (async () => {
      await run();
      console.log('ingestion complete');
    })();

dosubot[bot] commented 1 month ago

The error indicates that something in the ingestion path is reading a text property from an undefined value. A good first check is the docs array passed to PineconeStore.fromDocuments. Note that LangChain Document objects carry their content in pageContent (plus a metadata object), not in a text property; the textKey: 'text' option only names the metadata field under which PineconeStore stores that content in Pinecone. Inspect the output of textSplitter.splitDocuments(rawDocs) and confirm that every element is defined and has a non-empty pageContent string.

If the structure doesn't match expectations, adjust docs before passing it to PineconeStore.fromDocuments. For example, if the content sits under a different property, map it into pageContent so that each object has the shape PineconeStore expects.

Here's a quick way to debug and potentially adjust the docs structure:

  1. Log the structure of docs to confirm each element contains a pageContent string:

    console.log('split docs', JSON.stringify(docs, null, 2));
  2. If the structure is incorrect, adjust docs accordingly:

    const adjustedDocs = docs.map(doc => ({
     pageContent: doc.someOtherProperty, // Replace `someOtherProperty` with the actual property name
     metadata: doc.metadata ?? {},
    }));

Then, use adjustedDocs in place of docs when calling PineconeStore.fromDocuments.
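
As a hedged sketch of the check in step 1, the helper below scans an array of document-like objects and returns the indexes of entries that are missing or have no usable content. It assumes the LangChain convention that a document's content lives in a pageContent string. The DocLike type and the findInvalidDocs name are illustrative and not part of LangChain or this repository.

```typescript
// Minimal shape of a LangChain-style document (assumed for illustration).
interface DocLike {
  pageContent?: unknown;
  metadata?: Record<string, unknown>;
}

// Returns the indexes of entries that are missing or have no usable pageContent.
function findInvalidDocs(
  docs: ReadonlyArray<DocLike | null | undefined>,
): number[] {
  const invalid: number[] = [];
  docs.forEach((doc, i) => {
    const content = doc?.pageContent;
    if (typeof content !== 'string' || content.trim().length === 0) {
      invalid.push(i);
    }
  });
  return invalid;
}

// Hypothetical sample data: one valid entry followed by three problem cases.
const sample: Array<DocLike | null | undefined> = [
  { pageContent: 'hello', metadata: { source: 'docs/a.pdf' } },
  undefined,
  { pageContent: '   ' },
  { metadata: {} },
];

console.log('invalid doc indexes:', findInvalidDocs(sample));
```

Running this on the real docs array before the PineconeStore.fromDocuments call would show whether any chunk is malformed. An empty result suggests the problem lies elsewhere in the pipeline.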

This approach should help you resolve the "TypeError: Cannot read properties of undefined (reading 'text')" error by ensuring the data passed to PineconeStore.fromDocuments is correctly structured.
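
If the documents look well-formed, the stack trace of the original TypeError is the next thing to inspect. The ingest script in the question replaces every failure with a generic 'Failed to ingest your data' error, which hides where text is actually being read. The following is a minimal sketch of surfacing the underlying error before rethrowing it. describeError and runWithDiagnostics are hypothetical helper names, not part of LangChain or this repository.

```typescript
// Turns any thrown value into a printable string, preferring the stack trace.
function describeError(error: unknown): string {
  if (error instanceof Error) {
    return error.stack ?? `${error.name}: ${error.message}`;
  }
  return String(error);
}

// Runs an async task, logs the underlying failure, then rethrows it unchanged.
async function runWithDiagnostics<T>(task: () => Promise<T>): Promise<T> {
  try {
    return await task();
  } catch (error) {
    console.error('Ingestion failed:', describeError(error));
    throw error;
  }
}

// Demonstration with a simulated failure; a real call would pass the ingest task.
runWithDiagnostics(async () => {
  throw new TypeError("Cannot read properties of undefined (reading 'text')");
}).catch(() => {
  // The error has already been logged by runWithDiagnostics.
});
```

In the ingest script itself, the same effect comes from logging describeError(error) inside the existing catch block before throwing the generic error. The logged stack then shows which library call is reading text.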

#### Details
**Sources**
- [ErrorWithoutStackTrace: PineconeClient: Error calling upsert: ErrorWithoutStackTrace: undefined](https://github.com/mayooear/gpt4-pdf-chatbot-langchain/issues/204)
- [Ingest Error (previously working)](https://github.com/mayooear/gpt4-pdf-chatbot-langchain/issues/157)
- [stream.getReader](https://github.com/mayooear/gpt4-pdf-chatbot-langchain/issues/43)
- [scripts/ingest-data.ts](https://github.com/mayooear/gpt4-pdf-chatbot-langchain/blob/main/scripts/ingest-data.ts)