pinecone-io / pinecone-python-client

The Pinecone Python client
https://www.pinecone.io/docs
Apache License 2.0

pinecone.core.exceptions.PineconeException with simple examples #154

Open sergerdn opened 1 year ago

sergerdn commented 1 year ago

I wrote a simple script based on the README to show that even the simplest example does not work on my end, for reasons I cannot determine. I have tested several examples with the same result. The JavaScript client works perfectly on my end, but I need to work with Python. I am using Python 3.10.10 with Poetry on a Windows machine. I have spent two days trying to figure out what is happening, with no luck.

import logging
import os

logging.basicConfig(level=logging.DEBUG)
from dotenv import load_dotenv

load_dotenv()

import pinecone

def main():

    pinecone.init(
        api_key=os.getenv("PINECONE_API_KEY"),
        environment=os.getenv("PINECONE_ENVIRONMENT")
    )

    index_name = "langchainjsfundamentals"
    print(pinecone.list_indexes())

    # ensure that index exists
    assert index_name in pinecone.list_indexes()

    index = pinecone.Index(index_name)  # or pinecone.GRPCIndex

    ########## ERROR IS HERE ########## 
    upsert_response = index.upsert(
        vectors=[
            ("vec1", [0.1, 0.2, 0.3, 0.4], {"genre": "drama"}),
            ("vec2", [0.2, 0.3, 0.4, 0.5], {"genre": "action"}),
        ],
        namespace="example-namespace"
    )
    ######################################

    print(upsert_response)

if __name__ == '__main__':
    main()

[tool.poetry.dependencies]
python = "^3.10"
click = "^8.1.3"
langchain = "^0.0.123"
python-dotenv = "^1.0.0"
pinecone-client = {extras = ["grpc"], version = "2.2.1"}
openai = "^0.27.2"
pypdf = "^3.7.0"
chromadb = "^0.3.13"
datasets = "^2.10.1"
....
    self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "\example.py", line 39, in <module>
    main()
  File "example.py", line 27, in main
    upsert_response = index.upsert(
  File "\site-packages\pinecone\core\utils\error_handling.py", line 25, in inner_func
    raise PineconeProtocolError(f'Failed to connect; did you specify the correct index name?') from e
pinecone.core.exceptions.PineconeProtocolError: Failed to connect; did you specify the correct index name?
....

rajat08 commented 1 year ago

I would double-check the environment and the API key. Any index you create follows the URL scheme https://{index-name}-{project-id}.svc.{environment}.pinecone.io. This error says that the client cannot connect to the index; that happens either because the URL does not exist (the index name or environment name is wrong) or because the API key used to authenticate the connection to this URL is incorrect.

The API keys you use must be associated with the project and environment the index belongs to. If you are confident those bits of information are correct, please let us know. We can try to debug this further.
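
As a rough illustration of the URL scheme described above, the host can be assembled from its three parts. This is only a sketch: build_index_host is a hypothetical helper, not a function in the Pinecone client, and the sample values are placeholders to be replaced with the ones shown in your console.

```python
def build_index_host(index_name: str, project_id: str, environment: str) -> str:
    """Assemble an index URL following the scheme quoted above.

    Hypothetical helper for illustration; not part of the Pinecone client.
    """
    return f"https://{index_name}-{project_id}.svc.{environment}.pinecone.io"


# Placeholder values; substitute the ones shown in your Pinecone console.
print(build_index_host("langchainjsfundamentals", "5d63542", "us-west4-gcp"))
```

If the URL printed here differs from the one the console shows for the index, one of the three inputs is likely wrong.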

sergerdn commented 1 year ago

You are absolutely right, but part of my script is working correctly, including:


index_name = "langchainjsfundamentals"
# printed ["langchainjsfundamentals"]
print(pinecone.list_indexes()) 

# ensure that index exists: works as expected, my index name is in the list
assert index_name in pinecone.list_indexes()

Therefore, I believe that I provided the correct API key and index name.

If we had some end-to-end functional tests to check the connection to the server, I would be able to determine the issue. However, currently, we don't have any. I have only found a few outdated and basic tests.

It can be difficult to determine what the client should send and how the server should reply, which is why having integration/functional tests is crucial for developers.

sergerdn commented 1 year ago

I think I have figured out why I got that error: the host was not generated properly. Instead of https://{index-name}-{project-id}.svc.{environment}.pinecone.io, the client generated https://{index-name}-{index-name}.svc.{environment}.pinecone.io.

Code:

index = pinecone.Index(index_name)
print(index.describe_index_stats())
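
A small check along these lines can confirm the symptom once the host has been copied out of the debug log. It is a sketch only: project_segment_looks_wrong is a hypothetical helper, and it assumes the host has the form described above.

```python
def project_segment_looks_wrong(host: str, index_name: str) -> bool:
    """Return True when the host repeats the index name where the project ID belongs.

    Hypothetical helper for illustration; not part of the Pinecone client.
    """
    # Keep only the first DNS label, e.g. "myindex-5d63542".
    label = host.removeprefix("https://").split(".", 1)[0]
    return label == f"{index_name}-{index_name}"


bad_host = "https://langchainjsfundamentals-langchainjsfundamentals.svc.us-west4-gcp.pinecone.io"
print(project_segment_looks_wrong(bad_host, "langchainjsfundamentals"))
```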

gdj0nes commented 1 year ago

We're happy you found a solution!

sergerdn commented 1 year ago

We're happy you found a solution!

I didn't claim to have found a solution. I only worked out why I got this error; I did not say that I knew how to fix it.

But now I have found a solution. Instead of using:

pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),
    environment=os.getenv("PINECONE_ENVIRONMENT"),
)

we should use:

pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),
    environment=os.getenv("PINECONE_ENVIRONMENT"),
    project_name="PROJECT_ID",  # should be the project ID
)

I did not expect all of the examples to misrepresent real usage. I believe this is a bug in the library code, not in the docs. It may come from https://controller.us-west4-gcp.pinecone.io/actions/whoami replying with {"project_name":"PROJECT_ID","user_label":"default","user_name":"USERNAME_ID"}, which is very confusing because index-name and project-id are handled inconsistently.

So, I believe we have a bug.

Please note that the JavaScript version of the library is working as expected and as described in the documentation.

sergerdn commented 1 year ago

@gdj0nes

I will submit a pull request that doesn't fix the bug but helps you understand what's going on better. This will also assist other people in comprehending the issue. I will do it as soon as possible.

rajat08 commented 1 year ago

@sergerdn, thanks for posting more info. We don't mention passing project id in the docs because the client is supposed to infer it from your API key and environment parameter. We make this call in config.py to get this. Can you see if the response you get from calling pinecone.whoami() (after you call pinecone.init()) has the right project id? You can confirm the project id by looking at the index URL in the console.

(I am sure you have tried this but writing out so that others can follow it in the future)
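
For readers who want to inspect that lookup outside the client, the request can be sketched with the standard library. The URL pattern comes from the controller address quoted earlier in this thread; the "Api-Key" header name and the GET method are assumptions here, and build_whoami_request is a hypothetical helper rather than client code. The sketch only builds the request and does not send it.

```python
import urllib.request


def build_whoami_request(environment: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a whoami request for the given environment."""
    # URL pattern taken from the controller address quoted earlier in this thread.
    url = f"https://controller.{environment}.pinecone.io/actions/whoami"
    # Assumption: the API key travels in an "Api-Key" header.
    return urllib.request.Request(url, headers={"Api-Key": api_key}, method="GET")


request = build_whoami_request("us-west4-gcp", "YOUR_API_KEY")
print(request.full_url)
```

Passing the request to urllib.request.urlopen would perform the call; the JSON body should then carry the project ID under the project_name key, as in the response quoted above.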

sergerdn commented 1 year ago

Can you see if the response you get from calling pinecone.whoami() (after you call pinecone.init()) has the right project id?

Yes, I confirm that I have seen it in both cases.

WhoAmIResponse(username='18bd562', user_label='default', projectname='5d63542')
WhoAmIResponse(username='18bd562', user_label='default', projectname='5d63542')

import logging
import os

logging.basicConfig(level=logging.DEBUG)
from dotenv import load_dotenv

load_dotenv()

import pinecone

def example_with_project_name():
    pinecone.init(
        api_key=os.getenv("PINECONE_API_KEY"),
        environment=os.getenv("PINECONE_ENVIRONMENT"),
        project_name="5d63542",  # should be the project ID
    )
    print(pinecone.whoami())

def example_not_with_project_name():
    pinecone.init(
        api_key=os.getenv("PINECONE_API_KEY"),
        environment=os.getenv("PINECONE_ENVIRONMENT"),
    )
    print(pinecone.whoami())

if __name__ == '__main__':
    example_with_project_name()
    example_not_with_project_name()

rajat08 commented 1 year ago

Thanks, the project_name and project_id mismatch is problematic, but we have a plan to phase it out. We have yet to do it because of some unfortunate naming mishaps (:)) in internal resources, but it'll be out soon.

As for your initial connection error, the URL should be generated correctly because whoami seems to give you the correct answer. The only remaining source of error would then be the index name. I'll try to find out if something else is up.

sergerdn commented 1 year ago

We don't mention passing project id in the docs because the client is supposed to infer it from your API key and environment parameter.

I agree with you that we don't need to change the documentation. However, I was specifically referring to the code, not the documentation. I believe that the naming used in the code is confusing, particularly because the project ID is exposed under the name projectname.

I understand that it happened because the API returned that name, but that's precisely why I have opened this issue. None of the examples worked as written without delving deep into the library, and the mix of naming conventions adds to the confusion.

I believe that, at the very least, the comments in the library code should describe what we are getting from the API. This would make it easier for other developers to understand the naming convention used and reduce confusion.
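
To make the suggestion concrete, here is a sketch of what a documented response type could look like. It is not the library's actual WhoAmIResponse: the WhoAmI class and parse_whoami function are hypothetical, and only the JSON keys are taken from the response quoted earlier in this thread.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WhoAmI:
    """Hypothetical, documented view of the whoami response."""

    username: str
    user_label: str
    # The API returns this value under the key "project_name",
    # but it is the project ID that appears in the index URL.
    project_id: str


def parse_whoami(body: str) -> WhoAmI:
    """Map the raw JSON keys onto the clearer field names above."""
    data = json.loads(body)
    return WhoAmI(
        username=data["user_name"],
        user_label=data["user_label"],
        project_id=data["project_name"],
    )


raw = '{"project_name":"PROJECT_ID","user_label":"default","user_name":"USERNAME_ID"}'
print(parse_whoami(raw))
```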

rajat08 commented 1 year ago

Noted, we'll update it. Appreciate your help :pray:

sergerdn commented 1 year ago

@rajat08

Using pytest-vcr (https://pytest-vcr.readthedocs.io/en/latest/) with pytest (https://docs.pytest.org/en/7.2.x/) for writing functional tests can help prevent bugs efficiently. pytest-vcr lets you record any API response on the fly and replay it later, while pytest provides an excellent framework for writing and executing tests in Python.

Together, they can be very helpful in testing different scenarios without making actual API requests, saving both time and resources.
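
The idea of replaying a canned response instead of calling the API can be sketched without any extra dependencies. Note that this example swaps in unittest.mock from the standard library as a stand-in; it is not pytest-vcr, which would record real responses to cassette files. fetch_project_id is a hypothetical helper, and the "Api-Key" header name is an assumption.

```python
import io
import json
import urllib.request
from unittest import mock


def fetch_project_id(environment: str, api_key: str) -> str:
    """Ask the controller's whoami endpoint for the project ID (hypothetical helper)."""
    url = f"https://controller.{environment}.pinecone.io/actions/whoami"
    # Assumption: the API key travels in an "Api-Key" header.
    request = urllib.request.Request(url, headers={"Api-Key": api_key})
    with urllib.request.urlopen(request) as response:
        # The project ID is returned under the key "project_name".
        return json.load(response)["project_name"]


def test_fetch_project_id_without_network():
    # Canned body shaped like the whoami response quoted in this thread.
    canned = io.BytesIO(
        b'{"project_name": "5d63542", "user_label": "default", "user_name": "18bd562"}'
    )
    with mock.patch("urllib.request.urlopen") as fake_urlopen:
        # urlopen is used as a context manager, so stub what __enter__ returns.
        fake_urlopen.return_value.__enter__.return_value = canned
        assert fetch_project_id("us-west4-gcp", "dummy-key") == "5d63542"
    fake_urlopen.assert_called_once()
```

With pytest-vcr, the test would carry the @pytest.mark.vcr marker and replay a recorded cassette rather than a hand-written body.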

I believe that writing tests should be easy and very fun. 😄

rajat08 commented 1 year ago

Thanks for the suggestion! Most of our tests for the client run in a separate private repo that orchestrates code generation, but we'll add more tests around this.

sergerdn commented 1 year ago

Most of our tests for the client run in a separate private repo

If a user is struggling to comprehend how something functions, they may require access to tests in order to gain understanding. However, if these tests are not readily available in the main repository, the only recourse may be to explore the library with a debugger for troubleshooting. Therefore, I believe it is very important to have tests in the main repository rather than keeping them private.