mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs
https://www.youtube.com/watch?v=ih9PBGVVOO4
14.95k stars, 3.02k forks

error TypeError: Cannot read properties of undefined (reading 'text') when running npm run ingest #470

Open · Yenhi501 opened this issue 5 months ago

Yenhi501 commented 5 months ago

creating vector store...
error TypeError: Cannot read properties of undefined (reading 'text')
    at (d:\demo\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:44:57)
    at step (d:\demo\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:33:23)
    at Object.next (d:\demo\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:14:53)
    at (d:\demo\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:8:71)
    at new Promise ()
    at __awaiter (d:\demo\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:4:12)
    at extractMessage (d:\demo\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:40:48)
    at (d:\demo\node_modules\@pinecone-database\pinecone\dist\errors\handling.js:66:70)
    at step (d:\demo\node_modules\@pinecone-database\pinecone\dist\errors\handling.js:33:23)
    at Object.next (d:\demo\node_modules\@pinecone-database\pinecone\dist\errors\handling.js:14:53)
d:\demo\scripts\ingest-data.ts:46
    throw new Error('Failed to ingest your data');
    ^

Error: Failed to ingest your data
    at run (d:\demo\scripts\ingest-data.ts:46:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at (d:\demo\scripts\ingest-data.ts:51:3)

Node.js v19.8.1

My code:

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

/* Name of directory to retrieve your files from 
   Make sure to add your PDF files inside the 'docs' folder
*/
const filePath = 'docs';

export const run = async () => {
  try {
    /* Load raw docs from all files in the directory */
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new PDFLoader(path),
    });

    // const loader = new PDFLoader(filePath);
    const rawDocs = await directoryLoader.load();

    /* Split text into chunks */
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });

    const docs = await textSplitter.splitDocuments(rawDocs);
    console.log('split docs', docs);

    console.log('creating vector store...');
    /*create and store the embeddings in the vectorStore*/
    const embeddings = new OpenAIEmbeddings();
    const index = pinecone.Index(PINECONE_INDEX_NAME); //change to your own index name

    //embed the PDF documents
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'text',
    });
  } catch (error) {
    console.log('error', error);
    throw new Error('Failed to ingest your data');
  }
};

(async () => {
  await run();
  console.log('ingestion complete');
})();
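One thing that makes the trace above harder to read: the catch block logs the original error and then throws a generic 'Failed to ingest your data', so the real cause only shows up in the earlier log line. Below is a minimal sketch of keeping the cause attached when re-throwing. withCause and describeError are hypothetical helper names (not part of LangChain or the Pinecone client), and the code uses only built-in JavaScript features.

```typescript
// Attach the underlying error as `cause`, the same property name that
// ES2022's `new Error(message, { cause })` option sets.
function withCause(message: string, cause: unknown): Error {
  return Object.assign(new Error(message), { cause });
}

type ErrorWithCause = Error & { cause?: unknown };

// Walk the `cause` chain and join each link into one readable line.
// The `seen` set guards against accidental cycles (a.cause = b, b.cause = a).
function describeError(err: unknown): string {
  const parts: string[] = [];
  const seen = new Set<unknown>();
  let current: unknown = err;
  while (current !== undefined && current !== null && !seen.has(current)) {
    seen.add(current);
    if (current instanceof Error) {
      parts.push(`${current.name}: ${current.message}`);
      current = (current as ErrorWithCause).cause;
    } else {
      // Non-Error values (e.g. a thrown string) are printed as-is.
      parts.push(String(current));
      break;
    }
  }
  return parts.join(' <- caused by ');
}
```

In run(), the catch block could then throw withCause('Failed to ingest your data', error), and the top-level call could become run().catch((err) => { console.error(describeError(err)); process.exitCode = 1; }), so the script exits with a non-zero code and prints the whole chain on one line.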

Yenhi501 commented 5 months ago

Please help me; I have a report related to this.

Yenhi501 commented 5 months ago

Hello, I have fixed it. I changed

const index = pinecone.Index(PINECONE_INDEX_NAME);

to

import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: 'key' });
const index = pc.index(PINECONE_INDEX_NAME);
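The fix above hard-codes the API key as 'key'. Below is a minimal sketch of reading it from the environment instead, assuming the key is stored in an environment variable named PINECONE_API_KEY (an assumed name) and a client version whose constructor accepts only apiKey, as in the snippet above. requireEnv is a hypothetical helper; it takes the environment object as a parameter so it has no dependencies of its own.

```typescript
// Hypothetical helper: return a required environment variable, or fail fast
// with a readable message instead of an opaque error deep inside a client.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Possible usage in scripts/ingest-data.ts (assumes Pinecone is imported from
// '@pinecone-database/pinecone'; the variable name PINECONE_API_KEY is an assumption):
//   const pc = new Pinecone({ apiKey: requireEnv('PINECONE_API_KEY', process.env) });
//   const index = pc.index(PINECONE_INDEX_NAME);
```

Failing early on a missing key keeps a configuration mistake from surfacing later as an unrelated-looking error inside the client's request handling.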

jscksy commented 4 months ago

> Hello, I have fixed it. I changed const index = pinecone.Index(PINECONE_INDEX_NAME); to const pc = new Pinecone({ apiKey: 'key' }); const index = pc.index(PINECONE_INDEX_NAME);

How can I see the Pinecone index's environment?

mdrokz commented 3 months ago

> How can I see the Pinecone index's environment?

[screenshot]

The environment is the region specified in the screenshot.

mdrokz commented 3 months ago

> Hello, I have fixed it. I changed const index = pinecone.Index(PINECONE_INDEX_NAME); to const pc = new Pinecone({ apiKey: 'key' }); const index = pc.index(PINECONE_INDEX_NAME);

I'm facing the same error. @mayooear, can you take a look at this, please?

arpeiks commented 2 months ago

> I'm facing the same error. @mayooear, can you take a look at this, please?

I was able to fix the issue by using the individual LangChain packages:

import { PineconeStore } from "@langchain/pinecone";
import { Document } from "@langchain/core/documents";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Pinecone } from "@pinecone-database/pinecone";

Check out a sample implementation: https://github.com/arpeiks/gpt-langchain

wuyue112524 commented 2 months ago

Running this command to update the package solved the issue for me:

yarn add @pinecone-database/pinecone@latest
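After upgrading, it can help to confirm which version actually got installed, for example with yarn why @pinecone-database/pinecone or npm ls @pinecone-database/pinecone. The sketch below compares such a version string against a minimum. The thread does not say which release fixed the error, so any minimum you pass in is a placeholder assumption. isAtLeast is a hypothetical helper, and this simplified comparison ignores pre-release suffixes, so 2.0.0-beta.1 counts as 2.0.0.

```typescript
// Parse "MAJOR.MINOR.PATCH" (optionally prefixed with "v"); anything after
// the patch number, such as "-beta.1", is ignored by this simplified parser.
function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version.trim());
  if (!match) {
    throw new Error(`Unrecognized version string: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True when `installed` is the same as or newer than `minimum`.
function isAtLeast(installed: string, minimum: string): boolean {
  const [aMajor, aMinor, aPatch] = parseVersion(installed);
  const [bMajor, bMinor, bPatch] = parseVersion(minimum);
  if (aMajor !== bMajor) return aMajor > bMajor;
  if (aMinor !== bMinor) return aMinor > bMinor;
  return aPatch >= bPatch;
}
```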