pinecone-io / pinecone-datasets

An open-source dataset library for pre-embedded datasets: create your own data catalog, or use Pinecone's public datasets.
https://pinecone-io.github.io/pinecone-datasets/

to_index() should always use gRPC for bulk upserts #24

Open igiloh-pinecone opened 1 year ago

igiloh-pinecone commented 1 year ago

@miararoy my bad, I missed this in #22. This code should never have been merged: it breaks one of the key principles behind pinecone-datasets.

Problem

One of the design principles of pinecone-datasets from day one was providing fast bulk upserts via gRPC, which isn't optional. The only change from version 0.5 to 0.6 should have been the underlying client: from the still-beta Client 3.0 to Client 2.2[grpc].
pinecone-datasets is not meant to support REST-based upserts, which can be achieved through the client directly.

Solution

Make pinecone-client[grpc] a mandatory requirement, and use GRPCIndex as the only supported index type.
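
To illustrate the idea, here is a minimal sketch of a batched bulk upsert written against a single index type. It is not the actual to_index() implementation: `bulk_upsert`, `batched`, and `SupportsUpsert` are hypothetical names, and the sketch only assumes that the index object exposes an `upsert(vectors=...)` method, as `GRPCIndex` in pinecone-client[grpc] 2.2 does.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Protocol, Tuple

# (id, values) pairs; a simplified vector shape assumed for this sketch.
Vector = Tuple[str, List[float]]


class SupportsUpsert(Protocol):
    """Anything with an upsert(vectors=...) method, e.g. a GRPCIndex (assumed)."""

    def upsert(self, vectors: List[Vector], **kwargs: Any) -> Any: ...


def batched(items: Iterable[Vector], size: int) -> Iterator[List[Vector]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_upsert(
    index: SupportsUpsert, vectors: Iterable[Vector], batch_size: int = 100
) -> int:
    """Upsert all vectors in fixed-size batches and return how many were sent."""
    total = 0
    for batch in batched(vectors, batch_size):
        # One request per batch; with a gRPC index this goes over gRPC, not REST.
        index.upsert(vectors=batch)
        total += len(batch)
    return total
```

Because the helper accepts only one kind of index object, there is no REST code path to select by accident; callers who want REST upserts would use the client directly, as noted above.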

Type of Change

Test Plan

Full coverage in current unit tests