langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

Query pinecone vector store fails on 0.0.209 and works on 0.0.175 #3706

Closed scottsuhy closed 3 months ago

scottsuhy commented 6 months ago

with "^0.0.209" I am not getting back an answer from a pinecone vector database that I do get back with version 0.0.175.

Here are the details:

Verify the version of langchain:

C:\Apache24\htdocs\nextchapter>type package.json
{
  "name": "nextchapter",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "type": "module",
  "dependencies": {
    "@huggingface/inference": "^1.8.0",
    "@pinecone-database/pinecone": "^1.1.2",
    "axios": "^1.6.2",
    "cheerio": "^1.0.0-rc.12",
    "crypto": "^1.0.1",
    "d3-dsv": "^2.0.0",
    "dotenv": "^16.3.1",
    "express": "^4.18.2",
    "got": "^13.0.0",
    "hnswlib-node": "^1.4.2",
    "huggingface": "^1.4.0",
    "langchain": "^0.0.175",
    "latest": "^0.2.0",
    "needle": "^3.2.0",
    "oauth-1.0a": "^2.2.6",
    "pdf-parse": "^1.1.1",
    "pg": "^8.11.3",
    "playwright": "^1.40.0",
    "puppeteer": "^19.11.1",
    "querystring": "^0.2.1",
    "readline": "^1.3.0",
    "serpapi": "^1.1.1",
    "textrank": "^1.0.5",
    "typeorm": "^0.3.17"
  }
}

Here is the test code -- same as in the docs:

C:\Apache24\htdocs\nextchapter>type demo20.js
import * as dotenv from "dotenv";
dotenv.config();
import { Pinecone } from "@pinecone-database/pinecone";
import { VectorDBQAChain } from "langchain/chains";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { OpenAI } from "langchain/llms/openai";
import { PineconeStore } from "langchain/vectorstores/pinecone";

const pinecone = new Pinecone();

const pineconeIndex = pinecone.Index('index-scott-suhy-hotmail-com');

const vectorStore = await PineconeStore.fromExistingIndex(
  new OpenAIEmbeddings(),
  { pineconeIndex }
);

const model = new OpenAI();

const chain = VectorDBQAChain.fromLLM(model, vectorStore, {
  k: 1,
  returnSourceDocuments: true,
});

const response = await chain.call({ query: "who was john suhysuhysuhy?" });

console.log("response: ", response);

Here is the result when running with 0.0.175. It works (see text below):

C:\Apache24\htdocs\nextchapter>node demo20.js
response:  {
  text: ' John suhysuhysuhy was an American pioneer.',
  sourceDocuments: [
    Document {
      pageContent: 'John suhysuhysuhy was an american pioneer',
      metadata: [Object]
    }
  ]
}

Upgrade to latest and verify version:

C:\Apache24\htdocs\nextchapter>npm install langchain@latest

added 4 packages, removed 1 package, changed 4 packages, and audited 561 packages in 11s

59 packages are looking for funding
  run `npm fund` for details

33 vulnerabilities (2 low, 10 moderate, 18 high, 3 critical)

To address issues that do not require attention, run:
  npm audit fix

Some issues need review, and may require choosing
a different dependency.

Run `npm audit` for details.

Verify version:

C:\Apache24\htdocs\nextchapter>type package.json
{
  "name": "nextchapter",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "type": "module",
  "dependencies": {
    "@huggingface/inference": "^1.8.0",
    "@pinecone-database/pinecone": "^1.1.2",
    "axios": "^1.6.2",
    "cheerio": "^1.0.0-rc.12",
    "crypto": "^1.0.1",
    "d3-dsv": "^2.0.0",
    "dotenv": "^16.3.1",
    "express": "^4.18.2",
    "got": "^13.0.0",
    "hnswlib-node": "^1.4.2",
    "huggingface": "^1.4.0",
    "langchain": "^0.0.209",
    "latest": "^0.2.0",
    "needle": "^3.2.0",
    "oauth-1.0a": "^2.2.6",
    "pdf-parse": "^1.1.1",
    "pg": "^8.11.3",
    "playwright": "^1.40.0",
    "puppeteer": "^19.11.1",
    "querystring": "^0.2.1",
    "readline": "^1.3.0",
    "serpapi": "^1.1.1",
    "textrank": "^1.0.5",
    "typeorm": "^0.3.17"
  }
}

Response is now "I don't know":

C:\Apache24\htdocs\nextchapter>node demo20.js
response:  {
  text: " I don't know.",
  sourceDocuments: [
    Document {
      pageContent: 'John suhysuhysuhy was an american pioneer',
      metadata: [Object]
    }
  ]
}
dosubot[bot] commented 6 months ago

🤖

Thank you for providing such detailed information. It's strange that querying the Pinecone vector store fails with version 0.0.209 but works with version 0.0.175. To help me investigate this issue further, could you please provide the following additional information:

  1. Are there any error messages or warnings in the console when you run the code with version 0.0.209?
  2. Have you made any changes to the code or environment variables between the two versions?
  3. Could you please provide the version of Node.js you're using?
  4. Could you also provide the version of the Pinecone package you're using?

This information will help me understand the issue better and provide a more accurate solution.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

scottsuhy commented 6 months ago

@dosu-bot Here are the answers to your questions:

  1. Are there any error messages or warnings in the console when you run the code with version 0.0.209? No, just a bad answer.
  2. Have you made any changes to the code or environment variables between the two versions? No.
  3. Could you please provide the version of Node.js you're using? Node.js v18.16.1
  4. Could you also provide the version of the Pinecone package you're using? "@pinecone-database/pinecone": "^1.1.2"

I also just realized I can create a vector store with 209, but I can't query with 209; yet I can't create a vector store with 175, but I can query with 175.

Here is my create code:

// Imports assumed by this snippet (same module paths as in the full listing later in the thread);
// pineconeIndex is the Pinecone index handle created as in the query code above.
import fs from 'fs';
import path from 'path';
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { PineconeStore } from "langchain/vectorstores/pinecone";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

async function processPDFs(directoryPath) {
    const pdfPaths = fs.readdirSync(directoryPath).filter(file => file.endsWith('.pdf')).map(file => path.resolve(directoryPath, file));
    const docs = [];
    for (let pdfPath of pdfPaths) {
      const loader = new PDFLoader(pdfPath);
      const pdfDocs = await loader.load();
      docs.push(...pdfDocs);
    }
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    const chunks = await splitter.splitDocuments(docs);
    await PineconeStore.fromDocuments(chunks, new OpenAIEmbeddings(), {
       pineconeIndex,
       maxConcurrency: 5,
    });
} // end processPDFs
const pdfDirectoryPath = './pdfs';
processPDFs(pdfDirectoryPath);

When you call 'PineconeStore.fromDocuments', you get the error: "PineconeConnectionError: Request failed to reach Pinecone. This can occur for reasons such as incorrect configuration (environment, project id, index name), network problems that prevent the request from being completed, or a Pinecone API outage. Check your client configuration, check your network connection, and visit https://status.pinecone.io/ to see whether any outages are ongoing."
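As a sanity check against the configuration causes listed in that error, the Pinecone 1.x client can also be constructed with explicit values instead of relying on environment variables. A minimal sketch (the apiKey/environment values are placeholders; the environment shown matches the one that appears in the logs later in this thread):

import { Pinecone } from "@pinecone-database/pinecone";

// Explicit configuration instead of the PINECONE_API_KEY / PINECONE_ENVIRONMENT env vars
const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY,
  environment: "us-east-1-aws", // must match the environment of the index
});
const pineconeIndex = pinecone.Index('index-scott-suhy-hotmail-com');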

What version would work for both create and query?

dosubot[bot] commented 6 months ago

🤖

Based on the issue you've described, it seems like there might be a problem with the Pinecone connection in the newer version of LangChain. This could be due to a variety of reasons, such as changes in the Pinecone API, changes in the way LangChain interacts with Pinecone, or even a bug in the newer version of LangChain.

I found a similar issue in the LangChain repository: How to resolve this issue? (Pinecone insert error). This issue was resolved by a pull request: Filter null values out of metadata for pinecone. The problem in this case was that null values in the metadata were causing the Pinecone insert operation to fail. The solution was to filter out these null values before attempting the insert operation.

While your issue is not exactly the same, it's possible that a similar problem is occurring. You could try to debug this by logging the data you're trying to insert into Pinecone and checking if there are any null values or other problematic data.
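For example, a minimal sketch of that kind of check, run on the chunks array from the create snippet above before calling PineconeStore.fromDocuments (this loop is just an illustration and only inspects top-level metadata keys, not nested objects):

// Hypothetical pre-flight check: warn about null/undefined metadata values in each chunk
for (const doc of chunks) {
  for (const [key, value] of Object.entries(doc.metadata)) {
    if (value === null || value === undefined) {
      console.warn(`Null metadata value for "${key}" in ${doc.metadata.source}`);
    }
  }
}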

If this doesn't resolve your issue, it would be helpful to have more information. Could you provide the exact error message you're seeing when the query to Pinecone fails? Additionally, could you provide more details about your environment, such as the operating system you're using and any other relevant software versions? This information could help identify potential compatibility issues or other problems.


scottsuhy commented 6 months ago

@dosu-bot that is not the issue. I did find out, however, that starting with build 181 I could insert records with no error.

After the index initialized, I was able to do a successful query with 181, so this looks like the build that works for both.

Unfortunately, I think I have other code in the project that needs builds past 181, so I'll be watching this bug to see what you all think.
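For reference, staying on that exact build until this is resolved is standard npm version pinning, nothing LangChain-specific:

npm install langchain@0.0.181 --save-exact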

dosubot[bot] commented 6 months ago

🤖

Thank you for providing detailed information about the issue you're experiencing. It's quite unusual that you're able to create a vector store with LangChain version 0.0.209 but not query it, and vice versa with version 0.0.175.

To better understand the problem, could you please provide the following additional information:

  1. Are there any error messages or logs that you can share when you try to query the vector store with LangChain version 0.0.209?
  2. Similarly, are there any error messages or logs when you try to create a vector store with LangChain version 0.0.175?
  3. Have you made any changes to your environment or codebase between the two versions of LangChain?
  4. Are you using the same Pinecone index when you switch between the two versions of LangChain?

This information will help me understand the root cause of the issue and provide a more accurate solution.



jacoblee93 commented 6 months ago

There was a Pinecone SDK major version upgrade to 1.x somewhere in the middle of those versions. CC @zackproser

Could be related? Are you able to pass verbose: true to the chain and see what is actually getting retrieved?
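For reference, that is just the verbose option on the chain, the same options object that appears with verbose: true in the listings further down:

const chain = VectorDBQAChain.fromLLM(model, vectorStore, {
  k: 1,
  returnSourceDocuments: true,
  verbose: true, // logs each chain/LLM step, including the retrieved documents and prompts
});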

scottsuhy commented 6 months ago

I have both sides working with 181. Here is the query side of the problem with verbose: true.

First, the working version with 181:

C:\Apache24\htdocs\nextchapter>type package.json
{
  "name": "nextchapter",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "type": "module",
  "dependencies": {
    "@pinecone-database/pinecone": "^1.1.2",
    "axios": "^1.6.2",
    "cheerio": "^1.0.0-rc.12",
    "crypto": "^1.0.1",
    "d3-dsv": "^2.0.0",
    "dotenv": "^16.3.1",
    "express": "^4.18.2",
    "got": "^13.0.0",
    "hnswlib-node": "^1.4.2",
    "langchain": "^0.0.181",
    "latest": "^0.2.0",
    "needle": "^3.2.0",
    "oauth-1.0a": "^2.2.6",
    "pdf-parse": "^1.1.1",
    "pg": "^8.11.3",
    "playwright": "^1.40.0",
    "puppeteer": "^19.11.1",
    "querystring": "^0.2.1",
    "readline": "^1.3.0",
    "serpapi": "^1.1.1",
    "textrank": "^1.0.5",
    "typeorm": "^0.3.17"
  }
}

Run the app and it works: "pageContent": "J\nohn \nsuhysuhysuhy was an american pioneer". Note: there is only 1 PDF in the store, and it has 1 line in it that says "John suhysuhysuhy was an american pioneer".

C:\Apache24\htdocs\nextchapter>node demo20.js
[chain/start] [1:chain:VectorDBQAChain] Entering Chain run with input: {
  "query": "who was john suhysuhysuhy?"
}
[chain/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain] Entering Chain run with input: {
  "question": "who was john suhysuhysuhy?",
  "input_documents": [
    {
      "pageContent": "J\nohn \nsuhysuhysuhy was an american pioneer",
      "metadata": {
        "loc.lines.from": 1,
        "loc.lines.to": 3,
        "loc.pageNumber": 1,
        "pdf.info.Author": "scott suhy",
        "pdf.info.CreationDate": "D:20231218135105-05'00'",
        "pdf.info.Creator": "Microsoft® Word for Microsoft 365",
        "pdf.info.IsAcroFormPresent": false,
        "pdf.info.IsXFAPresent": false,
        "pdf.info.ModDate": "D:20231218135105-05'00'",
        "pdf.info.PDFFormatVersion": "1.7",
        "pdf.info.Producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.dc:creator": "scott suhy",
        "pdf.metadata._metadata.pdf:producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:createdate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmp:creatortool": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:modifydate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmpmm:documentid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.metadata._metadata.xmpmm:instanceid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.totalPages": 1,
        "pdf.version": "1.10.100",
        "source": "/home/forge/myportfolio.cc/public/upload/scott_suhy@hotmail.com/John suhysuhysuhy was an american pioneer.pdf"
      }
    }
  ]
}
[chain/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] Entering Chain run with input: {
  "question": "who was john suhysuhysuhy?",
  "context": "J\nohn \nsuhysuhysuhy was an american pioneer"
}
[llm/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:OpenAI] Entering LLM run with input: {
  "prompts": [
    "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nJ\nohn \nsuhysuhysuhy was an american pioneer\n\nQuestion: who was john suhysuhysuhy?\nHelpful Answer:"
  ]
}
[llm/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:OpenAI] [509ms] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": " John suhysuhysuhy was an American pioneer.",
        "generationInfo": {
          "finishReason": "stop",
          "logprobs": null
        }
      }
    ]
  ],
  "llmOutput": {
    "tokenUsage": {
      "completionTokens": 13,
      "promptTokens": 78,
      "totalTokens": 91
    }
  }
}
[chain/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] [512ms] Exiting Chain run with output: {
  "text": " John suhysuhysuhy was an American pioneer."
}
[chain/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain] [514ms] Exiting Chain run with output: {
  "text": " John suhysuhysuhy was an American pioneer."
}
[chain/end] [1:chain:VectorDBQAChain] [1.87s] Exiting Chain run with output: {
  "text": " John suhysuhysuhy was an American pioneer.",
  "sourceDocuments": [
    {
      "pageContent": "J\nohn \nsuhysuhysuhy was an american pioneer",
      "metadata": {
        "loc.lines.from": 1,
        "loc.lines.to": 3,
        "loc.pageNumber": 1,
        "pdf.info.Author": "scott suhy",
        "pdf.info.CreationDate": "D:20231218135105-05'00'",
        "pdf.info.Creator": "Microsoft® Word for Microsoft 365",
        "pdf.info.IsAcroFormPresent": false,
        "pdf.info.IsXFAPresent": false,
        "pdf.info.ModDate": "D:20231218135105-05'00'",
        "pdf.info.PDFFormatVersion": "1.7",
        "pdf.info.Producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.dc:creator": "scott suhy",
        "pdf.metadata._metadata.pdf:producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:createdate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmp:creatortool": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:modifydate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmpmm:documentid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.metadata._metadata.xmpmm:instanceid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.totalPages": 1,
        "pdf.version": "1.10.100",
        "source": "/home/forge/myportfolio.cc/public/upload/scott_suhy@hotmail.com/John suhysuhysuhy was an american pioneer.pdf"
      }
    }
  ]
}
{
  text: ' John suhysuhysuhy was an American pioneer.',
  sourceDocuments: [
    Document {
      pageContent: 'J\nohn \nsuhysuhysuhy was an american pioneer',
      metadata: [Object]
    }
  ]
}

Now with it not working on 0.0.209:

C:\Apache24\htdocs\nextchapter>npm install langchain@0.0.209

added 3 packages, removed 1 package, changed 4 packages, and audited 552 packages in 21s

57 packages are looking for funding
  run `npm fund` for details

33 vulnerabilities (2 low, 10 moderate, 18 high, 3 critical)

To address issues that do not require attention, run:
  npm audit fix

Some issues need review, and may require choosing
a different dependency.

Run `npm audit` for details.

C:\Apache24\htdocs\nextchapter>type package.json
{
  "name": "nextchapter",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "type": "module",
  "dependencies": {
    "@pinecone-database/pinecone": "^1.1.2",
    "axios": "^1.6.2",
    "cheerio": "^1.0.0-rc.12",
    "crypto": "^1.0.1",
    "d3-dsv": "^2.0.0",
    "dotenv": "^16.3.1",
    "express": "^4.18.2",
    "got": "^13.0.0",
    "hnswlib-node": "^1.4.2",
    "langchain": "^0.0.209",
    "latest": "^0.2.0",
    "needle": "^3.2.0",
    "oauth-1.0a": "^2.2.6",
    "pdf-parse": "^1.1.1",
    "pg": "^8.11.3",
    "playwright": "^1.40.0",
    "puppeteer": "^19.11.1",
    "querystring": "^0.2.1",
    "readline": "^1.3.0",
    "serpapi": "^1.1.1",
    "textrank": "^1.0.5",
    "typeorm": "^0.3.17"
  }
}
C:\Apache24\htdocs\nextchapter>node demo20.js
[chain/start] [1:chain:VectorDBQAChain] Entering Chain run with input: {
  "query": "who was john suhysuhysuhy?"
}
[chain/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain] Entering Chain run with input: {
  "question": "who was john suhysuhysuhy?",
  "input_documents": [
    {
      "pageContent": "J\nohn \nsuhysuhysuhy was an american pioneer",
      "metadata": {
        "loc.lines.from": 1,
        "loc.lines.to": 3,
        "loc.pageNumber": 1,
        "pdf.info.Author": "scott suhy",
        "pdf.info.CreationDate": "D:20231218135105-05'00'",
        "pdf.info.Creator": "Microsoft® Word for Microsoft 365",
        "pdf.info.IsAcroFormPresent": false,
        "pdf.info.IsXFAPresent": false,
        "pdf.info.ModDate": "D:20231218135105-05'00'",
        "pdf.info.PDFFormatVersion": "1.7",
        "pdf.info.Producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.dc:creator": "scott suhy",
        "pdf.metadata._metadata.pdf:producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:createdate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmp:creatortool": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:modifydate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmpmm:documentid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.metadata._metadata.xmpmm:instanceid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.totalPages": 1,
        "pdf.version": "1.10.100",
        "source": "/home/forge/myportfolio.cc/public/upload/scott_suhy@hotmail.com/John suhysuhysuhy was an american pioneer.pdf"
      }
    }
  ]
}
[chain/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] Entering Chain run with input: {
  "question": "who was john suhysuhysuhy?",
  "context": "J\nohn \nsuhysuhysuhy was an american pioneer"
}
[llm/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:OpenAI] Entering LLM run with input: {
  "prompts": [
    "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nJ\nohn \nsuhysuhysuhy was an american pioneer\n\nQuestion: who was john suhysuhysuhy?\nHelpful Answer:"
  ]
}
[llm/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:OpenAI] [300ms] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": " I don't know.",
        "generationInfo": {
          "finishReason": "stop",
          "logprobs": null
        }
      }
    ]
  ],
  "llmOutput": {
    "tokenUsage": {
      "completionTokens": 5,
      "promptTokens": 72,
      "totalTokens": 77
    }
  }
}
[chain/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] [302ms] Exiting Chain run with output: {
  "text": " I don't know."
}
[chain/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain] [306ms] Exiting Chain run with output: {
  "text": " I don't know."
}
[chain/end] [1:chain:VectorDBQAChain] [1.35s] Exiting Chain run with output: {
  "text": " I don't know.",
  "sourceDocuments": [
    {
      "pageContent": "J\nohn \nsuhysuhysuhy was an american pioneer",
      "metadata": {
        "loc.lines.from": 1,
        "loc.lines.to": 3,
        "loc.pageNumber": 1,
        "pdf.info.Author": "scott suhy",
        "pdf.info.CreationDate": "D:20231218135105-05'00'",
        "pdf.info.Creator": "Microsoft® Word for Microsoft 365",
        "pdf.info.IsAcroFormPresent": false,
        "pdf.info.IsXFAPresent": false,
        "pdf.info.ModDate": "D:20231218135105-05'00'",
        "pdf.info.PDFFormatVersion": "1.7",
        "pdf.info.Producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.dc:creator": "scott suhy",
        "pdf.metadata._metadata.pdf:producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:createdate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmp:creatortool": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:modifydate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmpmm:documentid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.metadata._metadata.xmpmm:instanceid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.totalPages": 1,
        "pdf.version": "1.10.100",
        "source": "/home/forge/myportfolio.cc/public/upload/scott_suhy@hotmail.com/John suhysuhysuhy was an american pioneer.pdf"
      }
    }
  ]
}
{
  text: " I don't know.",
  sourceDocuments: [
    Document {
      pageContent: 'J\nohn \nsuhysuhysuhy was an american pioneer',
      metadata: [Object]
    }
  ]
}

I don't see anything in the above log that points to the issue.

jacoblee93 commented 6 months ago

Ah, one other thing: we switched the default model from text-davinci-003 to gpt-3.5-turbo-instruct since OpenAI is deprecating it in two weeks. Can you try switching the used model to ChatOpenAI and gpt-3.5-turbo?
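For example, a minimal version of that swap in demo20.js (the import path and constructor option match the listing posted below):

import { ChatOpenAI } from "langchain/chat_models/openai";

// Chat model instead of the legacy completions model
const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });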

scottsuhy commented 6 months ago

Here is what you will see in the log below:

  Version 209: running with ChatOpenAI ({modelName: 'gpt-4-1106-preview'}) = Fail
  Version 209: running with ChatOpenAI ({modelName: 'gpt-3.5-turbo'}) = Fail
  Version 181: running with OpenAI () = Success

This is very easy to reproduce. I've uploaded the PDF "John suhysuhysuhy was an american pioneer.pdf".

And here is the code to create/query:

I am new to both LangChain and Pinecone, so I may be doing something wrong, but I think most of this is from the example code.

import * as dotenv from "dotenv";
dotenv.config();
import fs from 'fs';
import path from 'path';
import { Pinecone } from "@pinecone-database/pinecone";
import { VectorDBQAChain } from "langchain/chains";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { OpenAI } from "langchain/llms/openai";
import { PineconeStore } from "langchain/vectorstores/pinecone";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter"; 

const pinecone = new Pinecone();

const pineconeIndex = pinecone.Index('index-scott-suhy-hotmail-com');

//CREATE THE VECTORSTORE *********************

async function processPDFs(directoryPath) {
    const pdfPaths = fs.readdirSync(directoryPath).filter(file => file.endsWith('.pdf')).map(file => path.resolve(directoryPath, file));
    const docs = [];
    for (let pdfPath of pdfPaths) {
      const loader = new PDFLoader(pdfPath);
      const pdfDocs = await loader.load();
      docs.push(...pdfDocs);
    }

    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });

    const chunks = await splitter.splitDocuments(docs);

    console.log("Chunks: ", chunks, "Docs: ", docs);

    const store = await PineconeStore.fromDocuments(chunks, new OpenAIEmbeddings(), {
       pineconeIndex,
       maxConcurrency: 5, 
    });

    console.log("Created vectorStore...", store);

} // end processPDFs

const pdfDirectoryPath = './pdfs';

console.log("entering processPDFs...");
// Note: processPDFs() is not awaited, so indexing runs concurrently with the query below
// ("Created vectorStore..." shows up mid-run in the logs that follow).
processPDFs(pdfDirectoryPath);
console.log("leaving processPDFs...");

//********************************************

//QUERY THE VECTORSTORE *********************

const vectorStore = await PineconeStore.fromExistingIndex(
  new OpenAIEmbeddings(),
  { pineconeIndex }
);

//const model = new ChatOpenAI ({modelName: 'gpt-4-1106-preview'});
//const model = new ChatOpenAI ({modelName: 'gpt-3.5-turbo'});
const model = new OpenAI ();
const chain = VectorDBQAChain.fromLLM(model, vectorStore, {
  k: 1,
  returnSourceDocuments: true,
  verbose: true
});
const response = await chain.call({ query: "who was john suhysuhysuhy?" });
console.log(response);

Here it is running with ChatOpenAI ({modelName: 'gpt-4-1106-preview'}):

C:\Apache24\htdocs\nextchapter>node demo20.js
entering processPDFs...
leaving processPDFs...
[chain/start] [1:chain:VectorDBQAChain] Entering Chain run with input: {
  "query": "who was john suhysuhysuhy?"
}
Chunks:  [
  Document {
    pageContent: 'John suhysuhysuhy was an american pioneer',
    metadata: {
      source: 'C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf',
      pdf: [Object],
      loc: [Object]
    }
  }
] Docs:  [
  Document {
    pageContent: 'John suhysuhysuhy was an american pioneer \n ',
    metadata: {
      source: 'C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf',
      pdf: [Object],
      loc: [Object]
    }
  }
]
[chain/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain] Entering Chain run with input: {
  "question": "who was john suhysuhysuhy?",
  "input_documents": [
    {
      "pageContent": "John suhysuhysuhy was an american pioneer",
      "metadata": {
        "loc.lines.from": 1,
        "loc.lines.to": 1,
        "loc.pageNumber": 1,
        "pdf.info.Author": "scott suhy",
        "pdf.info.CreationDate": "D:20231218135105-05'00'",
        "pdf.info.Creator": "Microsoft® Word for Microsoft 365",
        "pdf.info.IsAcroFormPresent": false,
        "pdf.info.IsXFAPresent": false,
        "pdf.info.ModDate": "D:20231218135105-05'00'",
        "pdf.info.PDFFormatVersion": "1.7",
        "pdf.info.Producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.dc:creator": "scott suhy",
        "pdf.metadata._metadata.pdf:producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:createdate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmp:creatortool": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:modifydate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmpmm:documentid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.metadata._metadata.xmpmm:instanceid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.totalPages": 1,
        "pdf.version": "1.10.100",
        "source": "C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf"
      }
    }
  ]
}
[chain/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] Entering Chain run with input: {
  "question": "who was john suhysuhysuhy?",
  "context": "John suhysuhysuhy was an american pioneer"
}
[llm/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:ChatOpenAI] Entering LLM run with input: {
  "messages": [
    [
      {
        "lc": 1,
        "type": "constructor",
        "id": [
          "langchain_core",
          "messages",
          "SystemMessage"
        ],
        "kwargs": {
          "content": "Use the following pieces of context to answer the users question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\nJohn suhysuhysuhy was an american pioneer",
          "additional_kwargs": {}
        }
      },
      {
        "lc": 1,
        "type": "constructor",
        "id": [
          "langchain_core",
          "messages",
          "HumanMessage"
        ],
        "kwargs": {
          "content": "who was john suhysuhysuhy?",
          "additional_kwargs": {}
        }
      }
    ]
  ]
}
Created vectorStore... PineconeStore {
  lc_serializable: false,
  lc_kwargs: {
    pineconeIndex: Index {
      config: [Object],
      target: [Object],
      _deleteAll: [Function (anonymous)],
      _deleteMany: [Function (anonymous)],
      _deleteOne: [Function (anonymous)],
      _describeIndexStats: [Function (anonymous)],
      _fetchCommand: [FetchCommand],
      _queryCommand: [QueryCommand],
      _updateCommand: [UpdateCommand],
      _upsertCommand: [UpsertCommand]
    },
    maxConcurrency: 5,
    textKey: 'text'
  },
  lc_namespace: [ 'langchain', 'vectorstores', 'pinecone' ],
  embeddings: OpenAIEmbeddings {
    caller: AsyncCaller {
      maxConcurrency: 2,
      maxRetries: 6,
      onFailedAttempt: [Function: defaultFailedAttemptHandler],
      queue: [PQueue]
    },
    modelName: 'text-embedding-ada-002',
    batchSize: 512,
    stripNewLines: true,
    timeout: undefined,
    azureOpenAIApiVersion: undefined,
    azureOpenAIApiKey: undefined,
    azureOpenAIApiInstanceName: undefined,
    azureOpenAIApiDeploymentName: undefined,
    azureOpenAIBasePath: undefined,
    organization: undefined,
    client: OpenAI {
      baseURL: 'https://api.openai.com/v1',
      maxRetries: 0,
      timeout: 600000,
      httpAgent: undefined,
      fetch: [Function],
      completions: [Completions],
      chat: [Chat],
      edits: [Edits],
      embeddings: [Embeddings],
      files: [Files],
      images: [Images],
      audio: [Audio],
      moderations: [Moderations],
      models: [Models],
      fineTuning: [FineTuning],
      fineTunes: [FineTunes],
      beta: [Beta],
      _options: [Object],
      apiKey: 'DELETED',
      organization: null
    },
    clientConfig: {
      apiKey: 'DELETED',
      organization: undefined,
      baseURL: undefined,
      dangerouslyAllowBrowser: true,
      defaultHeaders: undefined,
      defaultQuery: undefined
    }
  },
  textKey: 'text',
  namespace: undefined,
  pineconeIndex: Index {
    config: {
      environment: 'us-east-1-aws',
      apiKey: 'DELETED',
      projectId: 'DELETED',
    },
    target: { index: 'index-scott-suhy-hotmail-com', namespace: '' },
    _deleteAll: [Function (anonymous)],
    _deleteMany: [Function (anonymous)],
    _deleteOne: [Function (anonymous)],
    _describeIndexStats: [Function (anonymous)],
    _fetchCommand: FetchCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    },
    _queryCommand: QueryCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    },
    _updateCommand: UpdateCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    },
    _upsertCommand: UpsertCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    }
  },
  filter: undefined,
  caller: AsyncCaller {
    maxConcurrency: 5,
    maxRetries: 6,
    onFailedAttempt: [Function: defaultFailedAttemptHandler],
    queue: PQueue {
      _events: Events <Complex prototype> {},
      _eventsCount: 0,
      _intervalCount: 1,
      _intervalEnd: 0,
      _pendingCount: 0,
      _resolveEmpty: [Function: empty],
      _resolveIdle: [Function: empty],
      _carryoverConcurrencyCount: false,
      _isIntervalIgnored: true,
      _intervalCap: Infinity,
      _interval: 0,
      _queue: [PriorityQueue],
      _queueClass: [class PriorityQueue],
      _concurrency: 5,
      _intervalId: undefined,
      _timeout: undefined,
      _throwOnTimeout: false,
      _isPaused: false
    }
  }
}
[llm/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:ChatOpenAI] [4.84s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": "I'm sorry, but I cannot find any information on a person named John Suhysuhysuhy. There might be a spelling error or confusion with the name. If you can provide more details or context, I might be able to help you find the correct information. Otherwise, it's likely that John Suhysuhysuhy is not a recognized historical figure, or the name has been misspelled or miscommunicated.",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain_core",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "I'm sorry, but I cannot find any information on a person named John Suhysuhysuhy. There might be a spelling error or confusion with the name. If you can provide more details or context, I might be able to help you find the correct information. Otherwise, it's likely that John Suhysuhysuhy is not a recognized historical figure, or the name has been misspelled or miscommunicated.",
            "additional_kwargs": {}
          }
        },
        "generationInfo": {
          "finish_reason": "stop"
        }
      }
    ]
  ],
  "llmOutput": {
    "tokenUsage": {
      "completionTokens": 90,
      "promptTokens": 74,
      "totalTokens": 164
    }
  }
}
[chain/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] [4.85s] Exiting Chain run with output: {
  "text": "I'm sorry, but I cannot find any information on a person named John Suhysuhysuhy. There might be a spelling error or confusion with the name. If you can provide more details or context, I might be able to help you find the correct information. Otherwise, it's likely that John Suhysuhysuhy is not a recognized historical figure, or the name has been misspelled or miscommunicated."
}
[chain/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain] [4.85s] Exiting Chain run with output: {
  "text": "I'm sorry, but I cannot find any information on a person named John Suhysuhysuhy. There might be a spelling error or confusion with the name. If you can provide more details or context, I might be able to help you find the correct information. Otherwise, it's likely that John Suhysuhysuhy is not a recognized historical figure, or the name has been misspelled or miscommunicated."
}
[chain/end] [1:chain:VectorDBQAChain] [5.73s] Exiting Chain run with output: {
  "text": "I'm sorry, but I cannot find any information on a person named John Suhysuhysuhy. There might be a spelling error or confusion with the name. If you can provide more details or context, I might be able to help you find the correct information. Otherwise, it's likely that John Suhysuhysuhy is not a recognized historical figure, or the name has been misspelled or miscommunicated.",
  "sourceDocuments": [
    {
      "pageContent": "John suhysuhysuhy was an american pioneer",
      "metadata": {
        "loc.lines.from": 1,
        "loc.lines.to": 1,
        "loc.pageNumber": 1,
        "pdf.info.Author": "scott suhy",
        "pdf.info.CreationDate": "D:20231218135105-05'00'",
        "pdf.info.Creator": "Microsoft® Word for Microsoft 365",
        "pdf.info.IsAcroFormPresent": false,
        "pdf.info.IsXFAPresent": false,
        "pdf.info.ModDate": "D:20231218135105-05'00'",
        "pdf.info.PDFFormatVersion": "1.7",
        "pdf.info.Producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.dc:creator": "scott suhy",
        "pdf.metadata._metadata.pdf:producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:createdate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmp:creatortool": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:modifydate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmpmm:documentid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.metadata._metadata.xmpmm:instanceid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.totalPages": 1,
        "pdf.version": "1.10.100",
        "source": "C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf"
      }
    }
  ]
}
{
  text: "I'm sorry, but I cannot find any information on a person named John Suhysuhysuhy. There might be a spelling error or confusion with the name. If you can provide more details or context, I might be able to help you find the correct information. Otherwise, it's likely that John Suhysuhysuhy is not a recognized historical figure, or the name has been misspelled or miscommunicated.",
  sourceDocuments: [
    Document {
      pageContent: 'John suhysuhysuhy was an american pioneer',
      metadata: [Object]
    }
  ]
}

Here it is running with ChatOpenAI ({modelName: 'gpt-3.5-turbo'}):

C:\Apache24\htdocs\nextchapter>node demo20.js
entering processPDFs...
leaving processPDFs...
[chain/start] [1:chain:VectorDBQAChain] Entering Chain run with input: {
  "query": "who was john suhysuhysuhy?"
}
Chunks:  [
  Document {
    pageContent: 'John suhysuhysuhy was an american pioneer',
    metadata: {
      source: 'C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf',
      pdf: [Object],
      loc: [Object]
    }
  }
] Docs:  [
  Document {
    pageContent: 'John suhysuhysuhy was an american pioneer \n ',
    metadata: {
      source: 'C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf',
      pdf: [Object],
      loc: [Object]
    }
  }
]
Created vectorStore... PineconeStore {
  lc_serializable: false,
  lc_kwargs: {
    pineconeIndex: Index {
      config: [Object],
      target: [Object],
      _deleteAll: [Function (anonymous)],
      _deleteMany: [Function (anonymous)],
      _deleteOne: [Function (anonymous)],
      _describeIndexStats: [Function (anonymous)],
      _fetchCommand: [FetchCommand],
      _queryCommand: [QueryCommand],
      _updateCommand: [UpdateCommand],
      _upsertCommand: [UpsertCommand]
    },
    maxConcurrency: 5,
    textKey: 'text'
  },
  lc_namespace: [ 'langchain', 'vectorstores', 'pinecone' ],
  embeddings: OpenAIEmbeddings {
    caller: AsyncCaller {
      maxConcurrency: 2,
      maxRetries: 6,
      onFailedAttempt: [Function: defaultFailedAttemptHandler],
      queue: [PQueue]
    },
    modelName: 'text-embedding-ada-002',
    batchSize: 512,
    stripNewLines: true,
    timeout: undefined,
    azureOpenAIApiVersion: undefined,
    azureOpenAIApiKey: undefined,
    azureOpenAIApiInstanceName: undefined,
    azureOpenAIApiDeploymentName: undefined,
    azureOpenAIBasePath: undefined,
    organization: undefined,
    client: OpenAI {
      baseURL: 'https://api.openai.com/v1',
      maxRetries: 0,
      timeout: 600000,
      httpAgent: undefined,
      fetch: [Function],
      completions: [Completions],
      chat: [Chat],
      edits: [Edits],
      embeddings: [Embeddings],
      files: [Files],
      images: [Images],
      audio: [Audio],
      moderations: [Moderations],
      models: [Models],
      fineTuning: [FineTuning],
      fineTunes: [FineTunes],
      beta: [Beta],
      _options: [Object],
      apiKey: 'DELETED',
      organization: null
    },
    clientConfig: {
      apiKey: 'DELETED',
      organization: undefined,
      baseURL: undefined,
      dangerouslyAllowBrowser: true,
      defaultHeaders: undefined,
      defaultQuery: undefined
    }
  },
  textKey: 'text',
  namespace: undefined,
  pineconeIndex: Index {
    config: {
      environment: 'us-east-1-aws',
      apiKey: 'DELETED',
      projectId: 'DELETED',
    },
    target: { index: 'index-scott-suhy-hotmail-com', namespace: '' },
    _deleteAll: [Function (anonymous)],
    _deleteMany: [Function (anonymous)],
    _deleteOne: [Function (anonymous)],
    _describeIndexStats: [Function (anonymous)],
    _fetchCommand: FetchCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    },
    _queryCommand: QueryCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    },
    _updateCommand: UpdateCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    },
    _upsertCommand: UpsertCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    }
  },
  filter: undefined,
  caller: AsyncCaller {
    maxConcurrency: 5,
    maxRetries: 6,
    onFailedAttempt: [Function: defaultFailedAttemptHandler],
    queue: PQueue {
      _events: Events <Complex prototype> {},
      _eventsCount: 0,
      _intervalCount: 1,
      _intervalEnd: 0,
      _pendingCount: 0,
      _resolveEmpty: [Function: empty],
      _resolveIdle: [Function: empty],
      _carryoverConcurrencyCount: false,
      _isIntervalIgnored: true,
      _intervalCap: Infinity,
      _interval: 0,
      _queue: [PriorityQueue],
      _queueClass: [class PriorityQueue],
      _concurrency: 5,
      _intervalId: undefined,
      _timeout: undefined,
      _throwOnTimeout: false,
      _isPaused: false
    }
  }
}
[chain/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain] Entering Chain run with input: {
  "question": "who was john suhysuhysuhy?",
  "input_documents": [
    {
      "pageContent": "John suhysuhysuhy was an american pioneer",
      "metadata": {
        "loc.lines.from": 1,
        "loc.lines.to": 1,
        "loc.pageNumber": 1,
        "pdf.info.Author": "scott suhy",
        "pdf.info.CreationDate": "D:20231218135105-05'00'",
        "pdf.info.Creator": "Microsoft® Word for Microsoft 365",
        "pdf.info.IsAcroFormPresent": false,
        "pdf.info.IsXFAPresent": false,
        "pdf.info.ModDate": "D:20231218135105-05'00'",
        "pdf.info.PDFFormatVersion": "1.7",
        "pdf.info.Producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.dc:creator": "scott suhy",
        "pdf.metadata._metadata.pdf:producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:createdate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmp:creatortool": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:modifydate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmpmm:documentid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.metadata._metadata.xmpmm:instanceid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.totalPages": 1,
        "pdf.version": "1.10.100",
        "source": "C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf"
      }
    }
  ]
}
[chain/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] Entering Chain run with input: {
  "question": "who was john suhysuhysuhy?",
  "context": "John suhysuhysuhy was an american pioneer"
}
[llm/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:ChatOpenAI] Entering LLM run with input: {
  "messages": [
    [
      {
        "lc": 1,
        "type": "constructor",
        "id": [
          "langchain_core",
          "messages",
          "SystemMessage"
        ],
        "kwargs": {
          "content": "Use the following pieces of context to answer the users question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\nJohn suhysuhysuhy was an american pioneer",
          "additional_kwargs": {}
        }
      },
      {
        "lc": 1,
        "type": "constructor",
        "id": [
          "langchain_core",
          "messages",
          "HumanMessage"
        ],
        "kwargs": {
          "content": "who was john suhysuhysuhy?",
          "additional_kwargs": {}
        }
      }
    ]
  ]
}
[llm/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:ChatOpenAI] [1.47s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": "I'm sorry, I don't have any information on a person named John suhysuhysuhy. It's possible that there is a typo or the name is not widely recognized.",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain_core",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "I'm sorry, I don't have any information on a person named John suhysuhysuhy. It's possible that there is a typo or the name is not widely recognized.",
            "additional_kwargs": {}
          }
        },
        "generationInfo": {
          "finish_reason": "stop"
        }
      }
    ]
  ],
  "llmOutput": {
    "tokenUsage": {
      "completionTokens": 39,
      "promptTokens": 74,
      "totalTokens": 113
    }
  }
}
[chain/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] [1.48s] Exiting Chain run with output: {
  "text": "I'm sorry, I don't have any information on a person named John suhysuhysuhy. It's possible that there is a typo or the name is not widely recognized."
}
[chain/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain] [1.48s] Exiting Chain run with output: {
  "text": "I'm sorry, I don't have any information on a person named John suhysuhysuhy. It's possible that there is a typo or the name is not widely recognized."
}
[chain/end] [1:chain:VectorDBQAChain] [2.32s] Exiting Chain run with output: {
  "text": "I'm sorry, I don't have any information on a person named John suhysuhysuhy. It's possible that there is a typo or the name is not widely recognized.",
  "sourceDocuments": [
    {
      "pageContent": "John suhysuhysuhy was an american pioneer",
      "metadata": {
        "loc.lines.from": 1,
        "loc.lines.to": 1,
        "loc.pageNumber": 1,
        "pdf.info.Author": "scott suhy",
        "pdf.info.CreationDate": "D:20231218135105-05'00'",
        "pdf.info.Creator": "Microsoft® Word for Microsoft 365",
        "pdf.info.IsAcroFormPresent": false,
        "pdf.info.IsXFAPresent": false,
        "pdf.info.ModDate": "D:20231218135105-05'00'",
        "pdf.info.PDFFormatVersion": "1.7",
        "pdf.info.Producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.dc:creator": "scott suhy",
        "pdf.metadata._metadata.pdf:producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:createdate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmp:creatortool": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:modifydate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmpmm:documentid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.metadata._metadata.xmpmm:instanceid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.totalPages": 1,
        "pdf.version": "1.10.100",
        "source": "C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf"
      }
    }
  ]
}
{
  text: "I'm sorry, I don't have any information on a person named John suhysuhysuhy. It's possible that there is a typo or the name is not widely recognized.",
  sourceDocuments: [
    Document {
      pageContent: 'John suhysuhysuhy was an american pioneer',
      metadata: [Object]
    }
  ]
}

Here I downgrade to 181 and switch back to: const model = new OpenAI();

C:\Apache24\htdocs\nextchapter>npm install langchain@0.0.181

removed 3 packages, changed 5 packages, and audited 549 packages in 1m

57 packages are looking for funding
  run `npm fund` for details

33 vulnerabilities (2 low, 10 moderate, 18 high, 3 critical)

To address issues that do not require attention, run:
  npm audit fix

Some issues need review, and may require choosing
a different dependency.

Run `npm audit` for details.
C:\Apache24\htdocs\nextchapter>node demo20.js
entering processPDFs...
leaving processPDFs...
[chain/start] [1:chain:VectorDBQAChain] Entering Chain run with input: {
  "query": "who was john suhysuhysuhy?"
}
Chunks:  [
  Document {
    pageContent: 'J\nohn \nsuhysuhysuhy was an american pioneer',
    metadata: {
      source: 'C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf',
      pdf: [Object],
      loc: [Object]
    }
  }
] Docs:  [
  Document {
    pageContent: 'J\nohn \nsuhysuhysuhy was an american pioneer\n \n ',
    metadata: {
      source: 'C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf',
      pdf: [Object],
      loc: [Object]
    }
  }
]
[chain/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain] Entering Chain run with input: {
  "question": "who was john suhysuhysuhy?",
  "input_documents": [
    {
      "pageContent": "John suhysuhysuhy was an american pioneer",
      "metadata": {
        "loc.lines.from": 1,
        "loc.lines.to": 1,
        "loc.pageNumber": 1,
        "pdf.info.Author": "scott suhy",
        "pdf.info.CreationDate": "D:20231218135105-05'00'",
        "pdf.info.Creator": "Microsoft® Word for Microsoft 365",
        "pdf.info.IsAcroFormPresent": false,
        "pdf.info.IsXFAPresent": false,
        "pdf.info.ModDate": "D:20231218135105-05'00'",
        "pdf.info.PDFFormatVersion": "1.7",
        "pdf.info.Producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.dc:creator": "scott suhy",
        "pdf.metadata._metadata.pdf:producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:createdate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmp:creatortool": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:modifydate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmpmm:documentid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.metadata._metadata.xmpmm:instanceid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.totalPages": 1,
        "pdf.version": "1.10.100",
        "source": "C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf"
      }
    }
  ]
}
[chain/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] Entering Chain run with input: {
  "question": "who was john suhysuhysuhy?",
  "context": "John suhysuhysuhy was an american pioneer"
}
[llm/start] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:OpenAI] Entering LLM run with input: {
  "prompts": [
    "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nJohn suhysuhysuhy was an american pioneer\n\nQuestion: who was john suhysuhysuhy?\nHelpful Answer:"
  ]
}
Created vectorStore... PineconeStore {
  lc_serializable: false,
  lc_kwargs: {
    pineconeIndex: Index {
      config: [Object],
      target: [Object],
      _deleteAll: [Function (anonymous)],
      _deleteMany: [Function (anonymous)],
      _deleteOne: [Function (anonymous)],
      _describeIndexStats: [Function (anonymous)],
      _fetchCommand: [FetchCommand],
      _queryCommand: [QueryCommand],
      _updateCommand: [UpdateCommand],
      _upsertCommand: [UpsertCommand]
    },
    maxConcurrency: 5,
    textKey: 'text'
  },
  lc_namespace: [ 'langchain', 'vectorstores', 'pinecone' ],
  embeddings: OpenAIEmbeddings {
    caller: AsyncCaller {
      maxConcurrency: 2,
      maxRetries: 6,
      onFailedAttempt: [Function: defaultFailedAttemptHandler],
      queue: [PQueue]
    },
    modelName: 'text-embedding-ada-002',
    batchSize: 512,
    stripNewLines: true,
    timeout: undefined,
    azureOpenAIApiVersion: undefined,
    azureOpenAIApiKey: undefined,
    azureOpenAIApiInstanceName: undefined,
    azureOpenAIApiDeploymentName: undefined,
    azureOpenAIBasePath: undefined,
    organization: undefined,
    client: OpenAI {
      baseURL: 'https://api.openai.com/v1',
      maxRetries: 0,
      timeout: 600000,
      httpAgent: undefined,
      fetch: [Function],
      completions: [Completions],
      chat: [Chat],
      edits: [Edits],
      embeddings: [Embeddings],
      files: [Files],
      images: [Images],
      audio: [Audio],
      moderations: [Moderations],
      models: [Models],
      fineTuning: [FineTuning],
      fineTunes: [FineTunes],
      _options: [Object],
      apiKey: 'DELETED',
      organization: null
    },
    clientConfig: {
      apiKey: 'DELETED',
      organization: undefined,
      baseURL: undefined,
      dangerouslyAllowBrowser: true,
      defaultHeaders: undefined,
      defaultQuery: undefined
    }
  },
  textKey: 'text',
  namespace: undefined,
  pineconeIndex: Index {
    config: {
      environment: 'us-east-1-aws',
      apiKey: 'DELETED',
      projectId: 'DELETED',
    },
    target: { index: 'index-scott-suhy-hotmail-com', namespace: '' },
    _deleteAll: [Function (anonymous)],
    _deleteMany: [Function (anonymous)],
    _deleteOne: [Function (anonymous)],
    _describeIndexStats: [Function (anonymous)],
    _fetchCommand: FetchCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    },
    _queryCommand: QueryCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    },
    _updateCommand: UpdateCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    },
    _upsertCommand: UpsertCommand {
      apiProvider: [VectorOperationsProvider],
      namespace: '',
      validator: [Function (anonymous)]
    }
  },
  filter: undefined,
  caller: AsyncCaller {
    maxConcurrency: 5,
    maxRetries: 6,
    onFailedAttempt: [Function: defaultFailedAttemptHandler],
    queue: PQueue {
      _events: Events <Complex prototype> {},
      _eventsCount: 0,
      _intervalCount: 1,
      _intervalEnd: 0,
      _pendingCount: 0,
      _resolveEmpty: [Function: empty],
      _resolveIdle: [Function: empty],
      _carryoverConcurrencyCount: false,
      _isIntervalIgnored: true,
      _intervalCap: Infinity,
      _interval: 0,
      _queue: [PriorityQueue],
      _queueClass: [class PriorityQueue],
      _concurrency: 5,
      _intervalId: undefined,
      _timeout: undefined,
      _throwOnTimeout: false,
      _isPaused: false
    }
  }
}
[llm/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:OpenAI] [607ms] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": " John suhysuhysuhy was an American pioneer.",
        "generationInfo": {
          "finishReason": "stop",
          "logprobs": null
        }
      }
    ]
  ],
  "llmOutput": {
    "tokenUsage": {
      "completionTokens": 13,
      "promptTokens": 74,
      "totalTokens": 87
    }
  }
}
[chain/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] [610ms] Exiting Chain run with output: {
  "text": " John suhysuhysuhy was an American pioneer."
}
[chain/end] [1:chain:VectorDBQAChain > 2:chain:StuffDocumentsChain] [611ms] Exiting Chain run with output: {
  "text": " John suhysuhysuhy was an American pioneer."
}
[chain/end] [1:chain:VectorDBQAChain] [1.61s] Exiting Chain run with output: {
  "text": " John suhysuhysuhy was an American pioneer.",
  "sourceDocuments": [
    {
      "pageContent": "John suhysuhysuhy was an american pioneer",
      "metadata": {
        "loc.lines.from": 1,
        "loc.lines.to": 1,
        "loc.pageNumber": 1,
        "pdf.info.Author": "scott suhy",
        "pdf.info.CreationDate": "D:20231218135105-05'00'",
        "pdf.info.Creator": "Microsoft® Word for Microsoft 365",
        "pdf.info.IsAcroFormPresent": false,
        "pdf.info.IsXFAPresent": false,
        "pdf.info.ModDate": "D:20231218135105-05'00'",
        "pdf.info.PDFFormatVersion": "1.7",
        "pdf.info.Producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.dc:creator": "scott suhy",
        "pdf.metadata._metadata.pdf:producer": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:createdate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmp:creatortool": "Microsoft® Word for Microsoft 365",
        "pdf.metadata._metadata.xmp:modifydate": "2023-12-18T13:51:05-05:00",
        "pdf.metadata._metadata.xmpmm:documentid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.metadata._metadata.xmpmm:instanceid": "uuid:3E7CF679-6F2D-44F9-A06F-13CF24654B4C",
        "pdf.totalPages": 1,
        "pdf.version": "1.10.100",
        "source": "C:\\Apache24\\htdocs\\nextchapter\\pdfs\\John suhysuhysuhy was an american pioneer.pdf"
      }
    }
  ]
}
{
  text: ' John suhysuhysuhy was an American pioneer.',
  sourceDocuments: [
    Document {
      pageContent: 'John suhysuhysuhy was an american pioneer',
      metadata: [Object]
    }
  ]
}
dosubot[bot] commented 3 months ago

Hi, @scottsuhy

I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, you reported an issue with querying a Pinecone vector store using different versions of LangChain. The problem seems to be related to version 0.0.209, where querying the vector store fails, while it works with version 0.0.175. There have been discussions about potential changes in the default model and suggestions to try different models. However, the issue remains unresolved.

Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

jacoblee93 commented 3 months ago

Please reopen if this is still occurring in latest @langchain/pinecone.