langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

PineClient: Error calling upsert #160

Status: Closed (mayooear closed this issue 1 year ago)

mayooear commented 1 year ago

As per the docs and the latest Pinecone library, the code below should work. However, PineconeStore.fromDocuments throws the error shown below. It appears there is an issue passing the vectors to Pinecone.

code:

const pinecone = new PineconeClient();
await pinecone.init({
  environment: `${process.env.PINECONE_ENVIRONMENT}`,
  apiKey: `${process.env.PINECONE_API_KEY}`,
});
const index = pinecone.Index("langchainjsfundamentals");

// this is the cause of the error
const vectorStore = await PineconeStore.fromDocuments(
  index,
  docs,
  new OpenAIEmbeddings()
);

Error log: error PineconeClient: Error calling upsert: PineconeClient: Error calling upsertRaw: RequiredError: Required parameter requestParameters.upsertRequest was null or undefined when calling upsert.

hwchase17 commented 1 year ago

also getting this, looking into it

hwchase17 commented 1 year ago

actually I lied, I had a different error. The errors are pretty opaque, so it took me a while to debug, but I never ran into this issue. Could you try with a dummy docs value like [new Document({ pageContent: "foo" })]?

mayooear commented 1 year ago

Hmm, my "docs" are derived from textSplitter.createDocuments([text]), so when I console.log them they are already in the required format shown above.

On further investigation, the error partially matches the upsertRaw function in Pinecone's TypeScript client:

async upsertRaw(requestParameters: UpsertOperationRequest, initOverrides?: RequestInit | runtime.InitOverrideFunction): Promise<runtime.ApiResponse<UpsertResponse>> {
  if (requestParameters.upsertRequest === null || requestParameters.upsertRequest === undefined) {
    throw new runtime.RequiredError('upsertRequest','Required parameter requestParameters.upsertRequest was null or undefined when calling upsert.');
  }
  // ... (rest of the method)
}

It appears that LangChain failed to pass requestParameters (which contains the vectors and the namespace) to the function.
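To make the failure mode concrete, here is a minimal sketch of the guard quoted above. The types are simplified, hypothetical stand-ins for the generated client (not its real definitions). The "unwrapped" call simulates, via a cast, a caller that sends the vectors without the upsertRequest wrapper; that is one way to end up with this exact error, assumed here rather than confirmed by the thread.

```typescript
// Simplified, hypothetical stand-ins for the generated client's request types.
interface Vector {
  id: string;
  values: number[];
  metadata?: Record<string, unknown>;
}

interface UpsertRequest {
  vectors: Vector[];
  namespace?: string;
}

interface UpsertOperationRequest {
  upsertRequest?: UpsertRequest | null;
}

// Same shape of check as upsertRaw: reject a missing wrapper before any network call.
function checkUpsertParams(requestParameters: UpsertOperationRequest): UpsertRequest {
  if (requestParameters.upsertRequest === null || requestParameters.upsertRequest === undefined) {
    const error = new Error(
      "Required parameter requestParameters.upsertRequest was null or undefined when calling upsert."
    );
    error.name = "RequiredError";
    throw error;
  }
  return requestParameters.upsertRequest;
}

// Returns a short status string instead of throwing, so both cases can be compared.
function tryUpsert(params: UpsertOperationRequest): string {
  try {
    const request = checkUpsertParams(params);
    return `ok: ${request.vectors.length} vector(s)`;
  } catch (err) {
    return err instanceof Error ? `${err.name}: ${err.message}` : String(err);
  }
}

// A correctly wrapped payload passes the guard.
console.log(tryUpsert({ upsertRequest: { vectors: [{ id: "a", values: [0.1, 0.2] }] } }));

// Vectors sent without the upsertRequest wrapper (simulated with a cast)
// hit the same RequiredError as in the issue.
const unwrapped = { vectors: [{ id: "a", values: [0.1, 0.2] }] } as unknown as UpsertOperationRequest;
console.log(tryUpsert(unwrapped));
```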

Debugging further, I noticed there may be an error when the UUID is generated in the addVectors function of langchain's pinecone.ts:

  async addVectors(
    vectors: number[][],
    documents: Document[],
    ids?: string[]
  ): Promise<void> {
    const documentIds = ids == null ? documents.map(() => uuidv4()) : ids;

    await this.pineconeClient.upsert({
      upsertRequest: {
        vectors: vectors.map((values, idx) => ({
          id: documentIds[idx],
          metadata: {
            ...documents[idx].metadata,
            [this.textKey]: documents[idx].pageContent,
          },
          values,
        })),
        namespace: this.namespace,
      },
    });
  }
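The payload that addVectors builds can be pulled out into a pure function, which makes it easy to log or unit-test independently of the Pinecone client. This is a sketch under assumptions: buildUpsertRequest, the Doc type and the length check are hypothetical additions (the quoted addVectors has no such check), and Node's built-in crypto.randomUUID() stands in for uuidv4().

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical minimal document shape, modelled on the code above.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface Vector {
  id: string;
  values: number[];
  metadata: Record<string, unknown>;
}

interface UpsertRequest {
  vectors: Vector[];
  namespace?: string;
}

// Pure version of the payload built inside addVectors: one vector per document,
// with the document text stored under textKey in the metadata.
function buildUpsertRequest(
  vectors: number[][],
  documents: Doc[],
  textKey = "text",
  namespace?: string,
  ids?: string[]
): UpsertRequest {
  // Extra safety check added by this sketch.
  if (vectors.length !== documents.length) {
    throw new Error(`got ${vectors.length} vectors for ${documents.length} documents`);
  }
  // Same rule as `ids == null ? documents.map(() => uuidv4()) : ids`.
  const documentIds = ids ?? documents.map(() => randomUUID());
  return {
    vectors: vectors.map((values, idx) => ({
      id: documentIds[idx],
      metadata: { ...documents[idx].metadata, [textKey]: documents[idx].pageContent },
      values,
    })),
    namespace,
  };
}

const request = buildUpsertRequest(
  [[0.1, 0.2], [0.3, 0.4]],
  [
    { pageContent: "foo", metadata: { source: "a.txt" } },
    { pageContent: "bar", metadata: {} },
  ],
  "text",
  "test",
  ["doc-1", "doc-2"]
);
console.log(JSON.stringify(request, null, 2));
```

Passing explicit ids keeps the output deterministic; omitting them generates random UUIDs, matching the ids == null branch above.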

Error:

Exception has occurred: TypeError: Cannot assign to read only property 'name' of function 'function generateUUID(value, namespace, buf, offset) {
    var _namespace;
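A TypeError with this wording is what strict-mode JavaScript throws when code assigns to a function's name property, which is non-writable. "Exception has occurred" is the message VS Code's debugger shows when it pauses on an exception. One possible reading (an assumption, not confirmed in this thread) is that the uuid package attempts such an assignment inside a try/catch, so the exception is handled and only surfaces because the debugger breaks on caught exceptions. The sketch below shows both halves with hypothetical generateId and tryRename helpers.

```typescript
// Hypothetical stand-in for a library function such as uuid's generateUUID.
function generateId(): string {
  return "00000000-0000-0000-0000-000000000000";
}

// A function's own `name` property is non-writable (though configurable).
const descriptor = Object.getOwnPropertyDescriptor(generateId, "name");
console.log("name writable:", descriptor?.writable);

// Tries to rename a function by plain assignment, the way some libraries do.
function tryRename(fn: object, newName: string): string {
  "use strict"; // assignment to a read-only property only throws in strict mode
  try {
    // TypeScript declares Function#name readonly, so cast to get past the compiler.
    (fn as { name: string }).name = newName;
    return "renamed";
  } catch (err) {
    // Handled here, yet a debugger that breaks on caught exceptions still pauses on it.
    return err instanceof TypeError ? `caught: ${err.message}` : "caught: other error";
  }
}

console.log(tryRename(generateId, "v5"));
console.log("name is still:", generateId.name);
```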

I tested index.upsert separately without langchain and it works.

What errors do you see on your end?

nfcampos commented 1 year ago

@mayooear could you confirm which version of @pinecone-database/pinecone you are using in your project? We only support the most recent version, 0.0.9, released 2 days ago.

nfcampos commented 1 year ago

I have added some tests in #166 but couldn't reproduce your issue. Do you want to have a look at the tests and let me know if you can spot what you're doing differently?

mayooear commented 1 year ago

Yes, I'm on 0.0.9 for pinecone and 0.0.11 for langchain. I also got a similar error when trying to run a VectorDBQA call.

Looking at your tests, the syntax differs from the one in the langchainjs docs:

const pinecone = new PineconeClient();
await pinecone.init({
  environment: "us-west1-gcp",
  apiKey: "apiKey",
});
const index = pinecone.Index("my-index");

// this is the cause of the error
const vectorStore = await PineconeStore.fromDocuments(
  index,
  docs,
  new OpenAIEmbeddings()
);

Whereas your tests:

nfcampos commented 1 year ago

Ah, I think the issue is the version of langchain. Version 0.0.11 was using a different library for the pinecone client, and we recently changed to the official Pinecone client. If you update langchain the issue should go away. If not, let me know.

(FYI the syntax in the tests is equivalent to yours)

mayooear commented 1 year ago

Yeah, I upgraded and it crashed my test app. I spent a couple of hours debugging; here's what I've found so far:

Perhaps you can advise on the appropriate tsconfig settings, but I tried to use exactly the ones from the repo's examples section.

nfcampos commented 1 year ago

The most recent version is now ESM-only. To work with it, your project needs to have "type": "module" in its package.json.

I'd also recommend changing your tsconfig to have

```
"target": "ES2020",
"module": "nodenext",
```

like here: https://github.com/hwchase17/langchainjs/blob/main/examples/tsconfig.json

Other than that, if you're using Node 18 or 19 it should work without additional changes. If you're using Node 16, check the instructions here: https://hwchase17.github.io/langchainjs/docs/getting-started/#installation

Let me know if that works
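The requirements in this comment (an ESM package, the recommended tsconfig settings, and Node 18 or 19) can be expressed as a small pre-flight check. This is a hypothetical helper, checkEsmSetup, that takes already-parsed package.json and tsconfig.json objects; it is not part of langchain, and it flags anything other than the exact recommended values, even though other settings may also work.

```typescript
// Hypothetical minimal shapes for the parsed config files.
interface PackageJson {
  type?: string;
}

interface TsConfig {
  compilerOptions?: { target?: string; module?: string };
}

// Returns human-readable problems; an empty array means the setup matches the advice above.
function checkEsmSetup(pkg: PackageJson, tsconfig: TsConfig, nodeVersion: string): string[] {
  const problems: string[] = [];
  if (pkg.type !== "module") {
    problems.push('package.json: add "type": "module" (langchain is now ESM-only)');
  }
  const options: NonNullable<TsConfig["compilerOptions"]> = tsconfig.compilerOptions ?? {};
  if (options.target?.toLowerCase() !== "es2020") {
    problems.push('tsconfig.json: recommended "target" is "ES2020"');
  }
  if (options.module?.toLowerCase() !== "nodenext") {
    problems.push('tsconfig.json: recommended "module" is "nodenext"');
  }
  // Versions look like "v18.14.0"; only the major number matters here.
  const major = Number.parseInt(nodeVersion.replace(/^v/, ""), 10);
  if (Number.isNaN(major) || major < 18) {
    problems.push(`Node ${nodeVersion}: Node 18/19 work as-is; Node 16 needs the extra installation steps`);
  }
  return problems;
}

// Usage with inline objects; in a real project these would come from parsing the files.
console.log(
  checkEsmSetup({ type: "module" }, { compilerOptions: { target: "ES2020", module: "nodenext" } }, "v18.14.0")
);
console.log(checkEsmSetup({}, {}, "v16.19.0"));
```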
mayooear commented 1 year ago

Thanks. I rebuilt the repo from scratch using your specs and the latest version of langchain (with pinecone 0.0.8). I installed all packages using yarn.

await PineconeStore.fromDocuments works as expected now.

I then tried the VectorDBQA chain, which failed. As shown below, it threw an error about res.metadata when the similarity search function ran:

error:

error TypeError: Cannot destructure 'res.metadata' as it is undefined.
    at PineconeStore.similaritySearchVectorWithScore

code:

const model = new OpenAI({});
/* Initialize Pinecone client */
const pinecone = new PineconeClient();
// authenticate the client against the Pinecone project
await pinecone.init({
  environment: `${process.env.PINECONE_ENVIRONMENT}`,
  apiKey: `${process.env.PINECONE_API_KEY}`,
});

// retrieve API operations for index created in pinecone dashboard
const index = pinecone.Index("index");
console.log("index", index);

try {
  /* Create the vectorstore from the existing index */
  const vectorStore = await PineconeStore.fromExistingIndex(
    index,
    new OpenAIEmbeddings(),
    "text", // metadata key that holds the document text
    "test" // namespace
  );
  console.log("vectorstore", vectorStore);

  // error is thrown here
  const resultOne = await vectorStore.similaritySearch("president", 3);

  console.log("resultsOne", resultOne);
} catch (error) {
  console.log("error", error);
}

I have tested separately that Pinecone's query function works as expected and returns the metadata text. However, resultOne throws the error as soon as I use the similaritySearch function, which wraps index.query.

mayooear commented 1 year ago

While debugging, I found the value of the result inside the similaritySearch function. This explains the undefined metadata error, but the cause is unknown for now.

code:

[
  {
    id: "id",
    score: 0,
    values: [
    ],
    metadata: undefined,
  },
]

Regardless, this shows that Pinecone can return matches with undefined metadata, which breaks the function.

nfcampos commented 1 year ago

@mayooear ah interesting, thanks a lot for debugging! We should definitely at least be a bit more defensive when handling the response from Pinecone.
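One way to be more defensive is to never destructure metadata directly and to decide explicitly what to do with matches that lack it. The sketch below is not langchain's implementation; QueryMatch, Doc and matchesToDocs are hypothetical names, modelled on the match with metadata: undefined printed above. Skipping such matches is a design choice; throwing a descriptive error would be a reasonable alternative.

```typescript
// Hypothetical shapes, modelled on the match printed above.
interface QueryMatch {
  id: string;
  score?: number;
  values?: number[];
  metadata?: Record<string, unknown>;
}

interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Converts raw matches into [document, score] pairs without destructuring
// undefined metadata. Matches with no string under textKey are skipped.
function matchesToDocs(matches: QueryMatch[] | undefined, textKey = "text"): [Doc, number][] {
  const results: [Doc, number][] = [];
  for (const match of matches ?? []) {
    const metadata: Record<string, unknown> = { ...(match.metadata ?? {}) };
    const text = metadata[textKey];
    if (typeof text !== "string") {
      continue; // e.g. metadata: undefined, as in the debug output above
    }
    delete metadata[textKey];
    results.push([{ pageContent: text, metadata }, match.score ?? 0]);
  }
  return results;
}

const docs = matchesToDocs([
  { id: "id", score: 0, values: [], metadata: undefined },
  { id: "b", score: 0.9, metadata: { text: "president", source: "speech.txt" } },
]);
console.log(docs);
```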

mayooear commented 1 year ago

I found and fixed the problem: the namespace property is missing from the pineconeClient.query call. Pinecone vectors are stored in namespaces, and the query only returned their metadata once it targeted the namespace the vectors were upserted into.
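A minimal sketch of that kind of fix: a hypothetical buildQueryRequest helper that forwards the store's namespace into the query payload. The field names (vector, topK, includeMetadata, namespace) follow Pinecone's query API; Pinecone only returns metadata when includeMetadata is true. The helper itself is illustrative, not the code from the PR.

```typescript
// Query payload fields as accepted by Pinecone's query operation.
interface QueryRequest {
  vector: number[];
  topK: number;
  includeMetadata: boolean;
  namespace?: string;
}

// Hypothetical builder: forwards the store's namespace so the query searches
// the same namespace the vectors were upserted into.
function buildQueryRequest(vector: number[], topK: number, namespace?: string): QueryRequest {
  const request: QueryRequest = { vector, topK, includeMetadata: true };
  if (namespace !== undefined) {
    request.namespace = namespace;
  }
  return request;
}

console.log(buildQueryRequest([0.1, 0.2], 3, "test"));
console.log(buildQueryRequest([0.1, 0.2], 3));
```

Leaving namespace off entirely (rather than sending undefined) keeps the payload minimal when the store has no namespace configured.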

Shall I go ahead and make a pull request for the "defensive handling" and this bug fix?

nfcampos commented 1 year ago

@mayooear yes thank you!

mayooear commented 1 year ago

@nfcampos after further investigation and many tests, I discovered there may be a core issue with the Pinecone API docs and client types.

Essentially, when a new vector is created without a namespace, it doesn't seem possible to query or fetch it. Their API docs say "The Query operation searches a namespace, using a query vector. It retrieves the ids of the most similar items in a namespace, along with their similarity scores."

And yet, the namespace field is optional. It appears that it should be required.

But this would also mean that users should be required to create namespaces for new vectors.

I can make another pull request to make namespaces required via Pinecone.ts, but I just wanted to get your thoughts/feedback first.

nfcampos commented 1 year ago

@mayooear thanks for looking into this more. From reading the Pinecone docs, I think namespace is optional; see "When you don't specify a namespace name for an operation, Pinecone uses the default namespace name of "" (the empty string)." in https://docs.pinecone.io/docs/namespaces. I'm going to merge your PR now.
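The documented rule can be illustrated with a toy in-memory index: an omitted namespace resolves to "", so a vector upserted without a namespace is visible to un-namespaced operations but not to operations against a named namespace such as "test", and vice versa. ToyIndex is a hypothetical illustration of that rule, not Pinecone's client.

```typescript
// Toy in-memory index (hypothetical): an omitted namespace means the default "".
class ToyIndex {
  private readonly spaces = new Map<string, Map<string, number[]>>();

  upsert(id: string, values: number[], namespace?: string): void {
    const key = namespace ?? "";
    let space = this.spaces.get(key);
    if (!space) {
      space = new Map();
      this.spaces.set(key, space);
    }
    space.set(id, values);
  }

  // Reads never create namespaces; unknown namespaces simply return undefined.
  fetch(id: string, namespace?: string): number[] | undefined {
    return this.spaces.get(namespace ?? "")?.get(id);
  }
}

const index = new ToyIndex();
index.upsert("a", [1, 2]); // stored in the default namespace ""
index.upsert("b", [3, 4], "test"); // stored in namespace "test"

console.log(index.fetch("a")); // found: no namespace means ""
console.log(index.fetch("a", "")); // found: explicit default namespace
console.log(index.fetch("b")); // undefined: "b" lives in "test"
console.log(index.fetch("b", "test")); // found
```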

mayooear commented 1 year ago

Thanks for clarifying!