pinecone-io / pinecone-ts-client

The official TypeScript/Node client for the Pinecone vector database
https://www.pinecone.io
Apache License 2.0

[Bug] sparse vectors and values don't work in upserting or querying, but Python and REST work #56

Closed simwijs-evolante closed 1 year ago

simwijs-evolante commented 1 year ago

Is this a new bug?

Current Behavior

When upserting with `sparseValues` or querying with `sparseVector` as described in the docs, neither endpoint takes the sparse values into account.

I noticed that the tests only check that the request succeeds, not that the sparse values actually change the results. Asserting a difference in results would be a better way to verify that the sparse values have an effect (which they do). With the Python client both upserting and querying work, but neither request works with this JavaScript client. I looked through the docs and your source code for a reason, to no avail.

Output using the Python client (the top 5 vectors were upserted with sparse values):

{
  "matches": [
    {
      "id": "93690",
      "score": 8.97476578,
      "values": []
    },
    {
      "id": "42817",
      "score": 2.9199748,
      "values": []
    },
    {
      "id": "66166",
      "score": 1.56526351,
      "values": []
    },
    {
      "id": "54510",
      "score": 1.12963557,
      "values": []
    },
    {
      "id": "93695",
      "score": 0.960861206,
      "values": []
    },
    {
      "id": "77403",
      "score": 0.819498241,
      "values": []
    },
    {
      "id": "77019",
      "score": 0.790361524,
      "values": []
    },
    {
      "id": "93621",
      "score": 0.775855958,
      "values": []
    },
    {
      "id": "93692",
      "score": 0.766911685,
      "values": []
    },
    {
      "id": "93687",
      "score": 0.755503654,
      "values": []
    }
  ]
}

Running the same query through the Node client gives a different result, because the sparse values in the query request are not respected. Compare especially the first entry with the Python output above.

[
  {
    "id": "93690",
    "score": 0.772066057,
    "values": []
  },
  {
    "id": "93621",
    "score": 0.767984629,
    "values": []
  },
  {
    "id": "93692",
    "score": 0.75923115,
    "values": []
  },
  {
    "id": "54510",
    "score": 0.758593678,
    "values": []
  },
  {
    "id": "93687",
    "score": 0.747873187,
    "values": []
  },
  {
    "id": "93686",
    "score": 0.746405125,
    "values": []
  },
  {
    "id": "93680",
    "score": 0.745493054,
    "values": []
  },
  {
    "id": "93678",
    "score": 0.744989276,
    "values": []
  },
  {
    "id": "93679",
    "score": 0.743263364,
    "values": []
  },
  {
    "id": "93683",
    "score": 0.738592684,
    "values": []
  }
]
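One way to read the gap between the two outputs: sparse-dense search in Pinecone uses the `dotproduct` metric, and, assuming the scoring Pinecone's documentation describes for sparse-dense indexes, a record's score is the dense dot product plus the dot product of the sparse parts over shared indices. If the sparse part is silently dropped, only the dense term remains, which is consistent with the Node scores clustering around 0.75. The sketch below shows that arithmetic only; `hybridScore` and all numbers are illustrative, not taken from the client.

```typescript
// Sparse vector as parallel arrays: indices[i] carries values[i].
interface SparseValues {
  indices: number[];
  values: number[];
}

// Plain dot product of two equal-length dense vectors.
function denseDot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dense dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Sparse dot product: only indices present in both vectors contribute.
function sparseDot(a: SparseValues, b: SparseValues): number {
  const lookup = new Map<number, number>();
  a.indices.forEach((idx, i) => lookup.set(idx, a.values[i]));
  let sum = 0;
  b.indices.forEach((idx, i) => {
    const other = lookup.get(idx);
    if (other !== undefined) sum += other * b.values[i];
  });
  return sum;
}

// Assumed hybrid score for a dotproduct index: dense term plus sparse term.
// When no sparse query is sent, only the dense term is left.
function hybridScore(
  qDense: number[],
  qSparse: SparseValues | undefined,
  rDense: number[],
  rSparse: SparseValues | undefined,
): number {
  const sparsePart = qSparse && rSparse ? sparseDot(qSparse, rSparse) : 0;
  return denseDot(qDense, rDense) + sparsePart;
}

// Illustrative numbers only (chosen to be exact in binary floating point).
const qDense = [0.5, 0.5];
const rDense = [0.5, 0.25];
const qSparse: SparseValues = { indices: [3, 17], values: [2.0, 1.5] };
const rSparse: SparseValues = { indices: [17, 42], values: [4.0, 1.0] };

console.log(hybridScore(qDense, qSparse, rDense, rSparse)); // 6.375 (0.375 dense + 6 sparse)
console.log(hybridScore(qDense, undefined, rDense, rSparse)); // 0.375 (dense only)
```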

The query request in this JavaScript client is:

const queryRequest = {
  vector: dense,
  sparseVector: sparse,
  topK: params.limit,
  filter: filters,
}

The upsert request is as follows (sent unmodified, it works fine against the REST API):

const upsertRequest = {
  id: my_id,
  sparseValues,
  values: vectors,
  metadata,
}
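The issue doesn't show how `sparse` and `sparseValues` are built. Pinecone represents a sparse vector as two parallel arrays, `indices` and `values`. As a hedged sketch, here is that shape plus a sanity check one might run before upserting; `validateSparse` is hypothetical, and the rules it enforces (equal lengths, non-negative integer indices, no duplicates, finite values) are assumptions about what the service accepts rather than documented client behavior.

```typescript
// Assumed shape of a Pinecone sparse vector: parallel arrays of equal length.
interface SparseValues {
  indices: number[];
  values: number[];
}

// Hypothetical pre-upsert check; returns a list of problems (empty if none).
function validateSparse(sparse: SparseValues): string[] {
  const problems: string[] = [];
  if (sparse.indices.length !== sparse.values.length) {
    problems.push(
      `indices (${sparse.indices.length}) and values (${sparse.values.length}) differ in length`,
    );
  }
  const seen = new Set<number>();
  for (const idx of sparse.indices) {
    if (!Number.isInteger(idx) || idx < 0) problems.push(`invalid index ${idx}`);
    if (seen.has(idx)) problems.push(`duplicate index ${idx}`);
    seen.add(idx);
  }
  for (const v of sparse.values) {
    if (!Number.isFinite(v)) problems.push(`non-finite value ${v}`);
  }
  return problems;
}

// A record shaped like the issue's upsertRequest, with illustrative data.
const sparseValues: SparseValues = { indices: [10, 45, 16], values: [0.5, 0.5, 0.2] };
const upsertRequest = {
  id: "example-id", // hypothetical id
  sparseValues,
  values: [0.1, 0.2, 0.3],
  metadata: { source: "example" },
};

console.log(validateSparse(upsertRequest.sparseValues)); // []
console.log(validateSparse({ indices: [1, 1], values: [0.3] }));
```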

I also tried alternative field names such as `sparse_vector` and `sparse_values`, without success.

For now I've worked around this by calling the REST API directly with Axios, which works great. I send the same request object I use with this JavaScript client, and the REST API behaves as expected, just like the Python client.
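The Axios workaround itself isn't included in the issue. As a rough sketch of the same idea, the request objects can be POSTed as JSON straight to the index's REST endpoints. This assumes the per-index host shown in the Pinecone console, the `/query` and `/vectors/upsert` paths, and an `Api-Key` header; `buildQuery`, `buildUpsert`, and the host and key values are illustrative placeholders. The block only builds the requests, so it runs without network access; sending is left to `fetch` or Axios.

```typescript
// Assumed REST payload shapes (camelCase field names, as in the issue).
interface SparseValues {
  indices: number[];
  values: number[];
}

interface QueryBody {
  vector: number[];
  sparseVector?: SparseValues;
  topK: number;
  filter?: Record<string, unknown>;
  namespace?: string;
  includeMetadata?: boolean;
}

interface UpsertRecord {
  id: string;
  values: number[];
  sparseValues?: SparseValues;
  metadata?: Record<string, unknown>;
}

// Minimal shape that fetch(url, init) accepts.
interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Shared helper: JSON POST carrying the API key header (assumed header name).
function post(indexHost: string, path: string, apiKey: string, payload: unknown): HttpRequest {
  return {
    url: `https://${indexHost}${path}`,
    init: {
      method: "POST",
      headers: { "Api-Key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Query with dense and sparse parts; sparseVector is passed through untouched.
function buildQuery(indexHost: string, apiKey: string, body: QueryBody): HttpRequest {
  return post(indexHost, "/query", apiKey, body);
}

// Upsert wraps the records in { vectors, namespace }.
function buildUpsert(
  indexHost: string,
  apiKey: string,
  vectors: UpsertRecord[],
  namespace?: string,
): HttpRequest {
  return post(indexHost, "/vectors/upsert", apiKey, { vectors, namespace });
}

// Usage with a hypothetical host and key.
const host = "my-index-abc123.svc.us-west1-gcp.pinecone.io";
const req = buildQuery(host, "YOUR_API_KEY", {
  vector: [0.1, 0.2],
  sparseVector: { indices: [3, 17], values: [2.0, 1.5] },
  topK: 10,
});
console.log(req.url); // https://my-index-abc123.svc.us-west1-gcp.pinecone.io/query
// To send (Node 18+): const res = await fetch(req.url, req.init); const json = await res.json();
```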

Expected Behavior

The client should respect sparse values in both the upsert and query endpoints.

I suggest adding a test that checks that the returned scores differ when a query includes `sparseVector` versus when it does not.
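A sketch of the suggested check, written as a pure comparison so it doesn't depend on a particular client version: run the same query with and without `sparseVector`, then assert that the ranked ids or scores differ. The `Match` type, the `resultsDiffer` helper, and the `1e-6` tolerance are illustrative assumptions; the sample data is the first two entries of each output in this issue.

```typescript
interface Match {
  id: string;
  score: number;
}

// True if the two result lists differ in length, order/ids, or any score beyond tol.
function resultsDiffer(a: Match[], b: Match[], tol = 1e-6): boolean {
  if (a.length !== b.length) return true;
  return a.some((m, i) => m.id !== b[i].id || Math.abs(m.score - b[i].score) > tol);
}

// First two matches from the Python output (sparse honored) ...
const withSparse: Match[] = [
  { id: "93690", score: 8.97476578 },
  { id: "42817", score: 2.9199748 },
];
// ... and from the Node output (sparse ignored).
const denseOnly: Match[] = [
  { id: "93690", score: 0.772066057 },
  { id: "93621", score: 0.767984629 },
];

// In a Jest test these lists would come from two query calls, one with
// sparseVector and one without, e.g. expect(resultsDiffer(h, d)).toBe(true).
console.log(resultsDiffer(withSparse, denseOnly)); // true
console.log(resultsDiffer(denseOnly, denseOnly)); // false
```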

Steps To Reproduce

  1. Upsert a vector with this client, including `sparseValues`
  2. Query the vector with `sparseVector`
  3. Upsert the vector with the Python client
  4. Query the vector with the Python client
  5. Query the vector with this JavaScript client

Steps 3 and 4 work. Steps 2 and 5 don't give the same output as the Python client.

Relevant log output

No response

Environment

- **OS**: Dockerfile using `FROM node:16`
- **Language version**:
- **Pinecone client version**: 0.0.14 (tried 0.0.12 as well)

Additional Context

Awesome library otherwise! It seems to have got quite the traction, so I hope this issue helps. Let me know if you need more context that can help you debug!

stephenkalnoske-sans commented 1 year ago

Is this still the case? I'm about to re-upsert our vectors to include sparse-dense vectors now, and would love to know if I need to implement @simwijs-evolante's workaround.

simwijs-evolante commented 1 year ago

This issue is still open, so I'm not sure whether the maintainers have seen, acknowledged, or fixed it. There have been 3 new versions since I reported it, so it could be fixed, but I haven't tested it. I'll comment again if I do, but the workaround works fine, so I'm not inclined to jump in right away.


simwijs-evolante commented 1 year ago

> Is this still the case? I'm about to re-upsert our vectors to include sparse-dense vectors now, and would love to know if I need to implement @simwijs-evolante's workaround.

I looked through the code changes in the releases and can't find a test that covers this edge case. The sparse-values upsert test only checks that the query returns matches, not that the result actually contains sparse values (which would prove that they were upserted). So this case is still not covered by a test, as I suggested in the issue.

The test I am referring to:

    it('should be able to upsert a vector with sparse values', async () => {
      const index = client.Index(indexName)
      const upsertRequest: UpsertRequest = {
        vectors: vectorsWithSparseValues,
        namespace
      }
      await index.upsert({ upsertRequest }) // <--- this works even though the vectors are not accepting the sparse values
      const randomVector = getRandomVector(vectors)
      const queryRequest: QueryRequest = {
        topK: 1,
        vector: randomVector.values,
        sparseVector: randomVector.sparseValues,
        namespace
      }

      const queryResponse = await index.query({ queryRequest }) // <--- this also works even though the vectors are not accepting the sparse values
      expect(queryResponse?.matches?.length).toBeGreaterThan(0) // <------ here you see it only tests for matches, which goes through, even if the sparse values are not there
    })
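The gap described above could be closed by reading the vector back (as the next comment does with Postman) and comparing its stored sparse values with what was sent. Because the fetch call differs between client versions, this sketch only shows the comparison; `sparseValuesEqual`, its order-insensitive matching, and the tolerance (to absorb assumed float32 rounding on the server) are assumptions, not part of the client's test suite.

```typescript
interface SparseValues {
  indices: number[];
  values: number[];
}

// Compare sparse vectors as index -> value maps, ignoring order, within tol.
function sparseValuesEqual(
  stored: SparseValues | undefined,
  expected: SparseValues,
  tol = 1e-6,
): boolean {
  // A missing sparse part means the upsert silently dropped it.
  if (!stored || stored.indices.length !== expected.indices.length) return false;
  const storedMap = new Map<number, number>();
  for (let i = 0; i < stored.indices.length; i++) {
    storedMap.set(stored.indices[i], stored.values[i]);
  }
  return expected.indices.every((idx, i) => {
    const v = storedMap.get(idx);
    return v !== undefined && Math.abs(v - expected.values[i]) <= tol;
  });
}

const sent: SparseValues = { indices: [10, 45], values: [0.5, 0.25] };
// Hypothetical stand-in for the sparse values read back from the index.
const fetched: SparseValues = { indices: [45, 10], values: [0.25, 0.5] };

console.log(sparseValuesEqual(fetched, sent)); // true
console.log(sparseValuesEqual(undefined, sent)); // false: sparse values were dropped
```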
stephenkalnoske-sans commented 1 year ago

I just tried v0.1.6, and it seems to work as advertised for me now.

  1. Uploaded sparse and dense vectors using the Node library
  2. Fetched a vector using Postman to confirm the sparse values were present
  3. Ran a query using the Node library with sparse values and without. Confirmed search results were different and scored differently.

So it seems it's been fixed in one of the versions now. Sweet!

jhamon commented 1 year ago

Seems like a previous release resolved this problem so I will mark this issue closed.

You might be interested in checking out the new v1 client for more improvements. Check out the release notes and v1 migration guide to get started using the new client.