pinecone-io / pinecone-python-client

The Pinecone Python client
https://www.pinecone.io/docs
Apache License 2.0

Add bulk import #386

Closed: jhamon closed this 2 months ago

jhamon commented 3 months ago

Problem

Implement the following new methods:

Solution

Code generation changes

Since these features are in prerelease, they exist only in the spec for the upcoming 2024-10 API version. This required modifying the codegen script, which is now run as:

./codegen/build-oas.sh 2024-07 false && ./codegen/build-oas.sh 2024-10 true

The second argument is a boolean that tells the codegen script whether to store the generated code in a new pinecone/core_ea subpackage. In the future we should probably do more to hide this complexity from the developer, but for now it is good enough.

Code organization

The bespoke parts of the implementation that wrap the generated code live in a new class, ImportFeatureMixin, that the Index class inherits from. These functions could all have been implemented directly in the Index class, but I thought it tidier to keep them in a separate spot rather than dump everything into one giant file.
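As a rough illustration of the mixin arrangement described above, here is a minimal sketch. Only the class names ImportFeatureMixin and Index come from this PR; the submit_import method, the _submitted attribute, and the constructor arguments are hypothetical stand-ins and do not reflect the real client code.

```python
class ImportFeatureMixin:
    """Illustrative stand-in, not the real class from this PR."""

    def __init__(self):
        # Hypothetical state; the real mixin presumably wraps generated API code.
        self._submitted = []

    def submit_import(self, uri: str) -> int:
        """Hypothetical method: record a request and return its position."""
        self._submitted.append(uri)
        return len(self._submitted) - 1


class Index(ImportFeatureMixin):
    """Illustrative stand-in for the client's Index class."""

    def __init__(self, name: str):
        # Let the mixin set up its own state before Index adds its own.
        super().__init__()
        self.name = name


index = Index("example-index")
job_number = index.submit_import("s3://example-bucket/data")
```

The feature-specific method is callable on Index even though it is defined in a separate class, which is the tidiness benefit the description is after.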

Overridden repr on generated objects

The default print output of the generated classes comes from pprint, and it looks quite poor for large objects. So I installed overrides that dump the objects as formatted JSON instead. I had previously done something similar for describe_index and related methods, so for this PR it was just a matter of cleaning up that logic a bit and moving it somewhere it could be reused.

So far, I haven't applied this approach to the generated classes across the board because it doesn't work well for long arrays of vector values.
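To make the repr override idea concrete, here is a minimal sketch of one way such an override could be installed. The helper install_json_repr and the ImportRecord class are hypothetical names invented for illustration, and the sketch assumes each object exposes a to_dict() method; the actual override code in this PR may look different.

```python
import json


def install_json_repr(cls):
    """Replace cls.__repr__ with one that returns indented JSON.

    Assumes instances expose a to_dict() method (an assumption of this sketch).
    """
    def json_repr(self):
        # default=str keeps non-JSON types such as datetimes printable.
        return json.dumps(self.to_dict(), indent=4, default=str)

    cls.__repr__ = json_repr
    return cls


class ImportRecord:
    """Hypothetical stand-in for a generated model class."""

    def __init__(self, id, status):
        self.id = id
        self.status = status

    def to_dict(self):
        return {"id": self.id, "status": self.status}


install_json_repr(ImportRecord)
record = ImportRecord("1", "Pending")
text = repr(record)
```

Because the override is installed per class, it can be applied selectively, which fits the concern about classes that hold long arrays of vector values.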

Type of Change

Test Plan

Manual testing with a dev release is in this demo notebook

aulorbe commented 2 months ago

Total nit that obviously you can ignore if you want, but maybe we should change the title of this PR from "Early access bulk import" to "Add early access bulk import", just so users later on know that this PR added the functionality (instead of iterating on it, removing it, etc.)