pinecone-io / pinecone-python-client

The Pinecone Python client
https://www.pinecone.io/docs
Apache License 2.0

[Bug] Protobuf TypeError in Pinecone Python client v4.0.0 #336

Closed umnooob closed 4 months ago

umnooob commented 4 months ago

Is this a new bug in the Pinecone Python client?

Current Behavior

When I use the newest Pinecone Python client (4.0.0), I get the following error:

TypeError                                 Traceback (most recent call last)
[<ipython-input-5-167212c9168a>](https://localhost:8080/#) in <cell line: 6>()
      4 from llama_index.embeddings.openai import OpenAIEmbedding
      5 from llama_index.core.ingestion import IngestionPipeline
----> 6 from pinecone.grpc import PineconeGRPC
      7 from pinecone import ServerlessSpec
      8 

5 frames
[/usr/local/lib/python3.10/dist-packages/pinecone/grpc/__init__.py](https://localhost:8080/#) in <module>
     45 """
     46 
---> 47 from .index_grpc import GRPCIndex
     48 from .pinecone import PineconeGRPC
     49 from .config import GRPCClientConfig

[/usr/local/lib/python3.10/dist-packages/pinecone/grpc/index_grpc.py](https://localhost:8080/#) in <module>
      7 
      8 from .utils import dict_to_proto_struct, parse_fetch_response, parse_query_response, parse_stats_response
----> 9 from .vector_factory_grpc import VectorFactoryGRPC
     10 
     11 from pinecone.core.client.models import (

[/usr/local/lib/python3.10/dist-packages/pinecone/grpc/vector_factory_grpc.py](https://localhost:8080/#) in <module>
     15     MetadataDictionaryExpectedError
     16 )
---> 17 from .sparse_values_factory import SparseValuesFactory
     18 
     19 from pinecone.core.grpc.protos.vector_service_pb2 import (

[/usr/local/lib/python3.10/dist-packages/pinecone/grpc/sparse_values_factory.py](https://localhost:8080/#) in <module>
     12 ) 
     13 
---> 14 from pinecone.core.grpc.protos.vector_service_pb2 import (
     15     SparseValues as GRPCSparseValues,
     16 )

[/usr/local/lib/python3.10/dist-packages/pinecone/core/grpc/protos/vector_service_pb2.py](https://localhost:8080/#) in <module>
     39   create_key=_descriptor._internal_create_key,
     40   fields=[
---> 41     _descriptor.FieldDescriptor(
     42       name='indices', full_name='SparseValues.indices', index=0,
     43       number=1, type=13, cpp_type=3, label=3,

[/usr/local/lib/python3.10/dist-packages/google/protobuf/descriptor.py](https://localhost:8080/#) in __new__(cls, name, full_name, index, number, type, cpp_type, label, default_value, message_type, enum_type, containing_type, is_extension, extension_scope, options, serialized_options, has_default_value, containing_oneof, json_name, file, create_key)
    551                 has_default_value=True, containing_oneof=None, json_name=None,
    552                 file=None, create_key=None):  # pylint: disable=redefined-builtin
--> 553       _message.Message._CheckCalledFromGeneratedFile()
    554       if is_extension:
    555         return _message.default_pool.FindExtensionByName(full_name)

TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
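
For anyone else hitting this: the error message above lists two workarounds. As a stopgap (just a sketch based on option 2 from that message, not an official fix), forcing the pure-Python protobuf implementation before anything imports pinecone.grpc should avoid the crash:

import os

# Must run before google.protobuf (and therefore pinecone.grpc) is imported,
# otherwise the faster C++ implementation has already been selected.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

from pinecone.grpc import PineconeGRPC  # import only after setting the variable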

Expected Behavior

When I use Pinecone Python client == 3.2.2, everything works fine for me. I believe PR https://github.com/pinecone-io/pinecone-python-client/pull/334 may have introduced this bug.

Steps To Reproduce

pip install "pinecone-client[grpc]==4.0.0"

import os

from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.ingestion import IngestionPipeline
from pinecone.grpc import PineconeGRPC
from pinecone import ServerlessSpec

from llama_index.vector_stores.pinecone import PineconeVectorStore

def emb_and_upsert(documents, openai_key, pinecone_api_key=None, index_name=None, namespace=None):
  vector_store = None
  # Initialize connection to Pinecone
  if pinecone_api_key:
    pc = PineconeGRPC(api_key=pinecone_api_key)

    # Initialize your index
    pinecone_index = pc.Index(index_name)

    # Initialize VectorStore
    vector_store = PineconeVectorStore(pinecone_index=pinecone_index, namespace=namespace)

  # This will be the model we use both for Node parsing and for vectorization
  embed_model = OpenAIEmbedding(api_key=openai_key)

  # Define the pipeline
  pipeline = IngestionPipeline(
      transformations=[
          SemanticSplitterNodeParser(
              buffer_size=1,
              breakpoint_percentile_threshold=95,
              embed_model=embed_model,
          ),
          embed_model,
      ],
      vector_store=vector_store,
  )
  # process_raw() is defined elsewhere in my code; pipeline.run() upserts once
  nodes = pipeline.run(documents=process_raw(documents))
  return nodes

nodes = emb_and_upsert(documents, openai_key, pinecone_api_key=pinecone_api_key, index_name="xxx", namespace="xxx")

Relevant log output

No response

Environment

- Python: 3.10.12
- pinecone: 4.0.0

Additional Context

No response

daverigby commented 4 months ago

Thanks @umnooob for raising this issue.

Looking at your backtrace, I see the following code fragment from vector_service_pb2.py:

[/usr/local/lib/python3.10/dist-packages/pinecone/core/grpc/protos/vector_service_pb2.py](https://localhost:8080/#) in <module>
     39   create_key=_descriptor._internal_create_key,
     40   fields=[
---> 41     _descriptor.FieldDescriptor(
     42       name='indices', full_name='SparseValues.indices', index=0,
     43       number=1, type=13, cpp_type=3, label=3,

However, we can see that lines 39-43 do not look like this in v4.0.0 of pinecone-client: https://github.com/pinecone-io/pinecone-python-client/blob/06c69fbbe5c3fa57717ba71596d94f03ee50aaa3/pinecone/core/grpc/protos/vector_service_pb2.py#L39-L43

Indeed, the lines in your backtrace match the v3.2.2 code: https://github.com/pinecone-io/pinecone-python-client/blob/c5ba9ce0adb31cde7a7b779d0bdfaaf186f596ec/pinecone/core/grpc/protos/vector_service_pb2.py#L39-L43

I believe there is an issue with your Python environment and that v4.0.0 has not been installed correctly. I suggest you start with a clean environment (for example, using venv) and reinstall your packages.
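
To double-check which installation is actually being picked up, a quick diagnostic sketch (the module path is taken from your backtrace) would be:

import importlib.metadata
import importlib.util

# Which pinecone-client distribution does the environment report?
print(importlib.metadata.version("pinecone-client"))

# Which vector_service_pb2.py file would actually be imported?
spec = importlib.util.find_spec("pinecone.core.grpc.protos.vector_service_pb2")
print(spec.origin if spec else "module not found")

If the reported version is 4.0.0 but that file still contains the old FieldDescriptor-style code from your backtrace, that points to a stale or mixed installation.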

Indeed, when I attempt to reproduce the issue you are seeing, I can't even get pip to install pinecone-client v4.0.0 together with the necessary LlamaIndex packages, as they do not support v4.0.0 yet:

❯ python3.10 -m venv .venv
❯ source .venv/bin/activate
❯ pip install "pinecone-client[grpc]==4.0.0"  llama-index-vector-stores-pinecone
Collecting pinecone-client==4.0.0 (from pinecone-client[grpc]==4.0.0)
  Using cached pinecone_client-4.0.0-py3-none-any.whl.metadata (16 kB)
Collecting llama-index-vector-stores-pinecone
  Using cached llama_index_vector_stores_pinecone-0.1.6-py3-none-any.whl.metadata (674 bytes)
...
INFO: pip is looking at multiple versions of llama-index-vector-stores-pinecone to determine which version is compatible with other requirements. This could take a while.
Collecting llama-index-vector-stores-pinecone
...
ERROR: Cannot install llama-index-vector-stores-pinecone==0.0.1, llama-index-vector-stores-pinecone==0.1.0, llama-index-vector-stores-pinecone==0.1.1, llama-index-vector-stores-pinecone==0.1.2, llama-index-vector-stores-pinecone==0.1.3, llama-index-vector-stores-pinecone==0.1.4, llama-index-vector-stores-pinecone==0.1.5, llama-index-vector-stores-pinecone==0.1.6, pinecone-client==4.0.0 and pinecone-client[grpc]==4.0.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested pinecone-client==4.0.0
    pinecone-client[grpc] 4.0.0 depends on pinecone-client 4.0.0 (from https://files.pythonhosted.org/packages/9c/64/081b55a33e492fc181524a955c2b65ba8a628dbc1bb897e65b723c7b7ffc/pinecone_client-4.0.0-py3-none-any.whl (from https://pypi.org/simple/pinecone-client/) (requires-python:<4.0,>=3.8))
    llama-index-vector-stores-pinecone 0.1.6 depends on pinecone-client<4.0.0 and >=3.0.2