langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Error in langchain_community.vectorstores.Pinecone: Sending request to /vectors/upsert triggers NameResolutionError #17474

Closed jkndrkn closed 5 months ago

jkndrkn commented 5 months ago

Example Code

The following code is copied from a small repository I created that helps reproduce the issue.

from os import environ

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Pinecone

INDEX_NAME = environ["PINECONE_INDEX"]
TRIALS = 50
TEXT_PATH = "my_text.txt"

loader = TextLoader(TEXT_PATH)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)

print("docs length:", len(docs))

embedder = HuggingFaceEmbeddings(model_name="all-mpnet-base-v2")

for i in range(0, TRIALS):
    print("trial: ", i, flush=True)
    Pinecone.from_documents(docs, embedder, index_name=INDEX_NAME)

Please see the project README.md for instructions for how to configure and run this code.

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7f7d20580430>: Failed to resolve 'answers-dev-jde-test-REDACTED.svc.us-east-1-aws.pinecone.io' ([Errno 8] nodename nor servname provided, or not known)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jeriksen/bamboohr/ai-labs/pinecone-upsert-error/load_document.py", line 28, in <module>
    Pinecone.from_documents(docs, embedder, index_name=INDEX_NAME)
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_core/vectorstores.py", line 508, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_pinecone/vectorstores.py", line 434, in from_texts
    pinecone.add_texts(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_pinecone/vectorstores.py", line 166, in add_texts
    [res.get() for res in async_res]
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_pinecone/vectorstores.py", line 166, in <listcomp>
    [res.get() for res in async_res]
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/api_client.py", line 195, in __call_api
    response_data = self.request(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/api_client.py", line 454, in request
    return self.rest_client.POST(url,
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/rest.py", line 301, in POST
    return self.request("POST", url,
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/rest.py", line 178, in request
    r = self.pool_manager.request(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/_request_methods.py", line 144, in request
    return self.request_encode_body(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/_request_methods.py", line 279, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/poolmanager.py", line 444, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 877, in urlopen
    return self.urlopen(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 877, in urlopen
    return self.urlopen(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 877, in urlopen
    return self.urlopen(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='answers-dev-jde-test-REDACTED.svc.us-east-1-aws.pinecone.io', port=443): Max retries exceeded with url: /vectors/upsert (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f7d20580430>: Failed to resolve 'answers-dev-jde-test-REDACTED.svc.us-east-1-aws.pinecone.io' ([Errno 8] nodename nor servname provided, or not known)"))

Description

I am trying to use langchain_community.vectorstores.Pinecone to upsert embeddings to a Pinecone index using Pinecone.from_documents(). When I run a script that calls from_documents() repeatedly, the first few calls succeed, but a later call fails with this error:

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='answers-dev-jde-test-REDACTED.svc.us-east-1-aws.pinecone.io', port=443): Max retries exceeded with url: /vectors/upsert (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f7d20580430>: Failed to resolve 'answers-dev-jde-test-REDACTED.svc.us-east-1-aws.pinecone.io' ([Errno 8] nodename nor servname provided, or not known)"))

This is an unexpected error. The index does exist. I have tried this with many different Pinecone indexes of both 1024 dimensions with Cohere embeddings and 768 dimensions with HuggingFace embeddings. I have also tried various document sizes.

System Info

LangChain dependencies

langchain==0.1.6
langchain-community==0.0.19
langchain-core==0.1.22

Here is my environment.yml

name: pinecone-upsert-error
dependencies:
  - pip=23.1.2
  - python=3.10.13
  - pip:
    - cohere==4.46
    - langchain==0.1.6
    - pinecone-client[grpc]==3.0.2

Here is the entire output of pip freeze:

aiohttp==3.9.3
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.2.0
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cohere==4.46
dataclasses-json==0.6.4
exceptiongroup==1.2.0
fastavro==1.9.3
filelock==3.13.1
frozenlist==1.4.1
fsspec==2024.2.0
googleapis-common-protos==1.62.0
greenlet==3.0.3
grpc-gateway-protoc-gen-openapiv2==0.1.0
grpcio==1.60.1
huggingface-hub==0.20.3
idna==3.6
importlib-metadata==6.11.0
Jinja2==3.1.3
joblib==1.3.2
jsonpatch==1.33
jsonpointer==2.4
langchain==0.1.6
langchain-community==0.0.19
langchain-core==0.1.22
langsmith==0.0.87
lz4==4.3.3
MarkupSafe==2.1.5
marshmallow==3.20.2
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
networkx==3.2.1
nltk==3.8.1
numpy==1.26.4
packaging==23.2
pillow==10.2.0
pinecone-client==3.0.2
protobuf==3.20.3
pydantic==2.6.1
pydantic_core==2.16.2
PyYAML==6.0.1
regex==2023.12.25
requests==2.31.0
safetensors==0.4.2
scikit-learn==1.4.0
scipy==1.12.0
sentence-transformers==2.3.1
sentencepiece==0.1.99
sniffio==1.3.0
SQLAlchemy==2.0.27
sympy==1.12
tenacity==8.2.3
threadpoolctl==3.2.0
tokenizers==0.15.2
torch==2.2.0
tqdm==4.66.2
transformers==4.37.2
typing-inspect==0.9.0
typing_extensions==4.9.0
urllib3==2.2.0
yarl==1.9.4
zipp==3.17.0

dosubot[bot] commented 5 months ago

🤖

Hey there @jkndrkn! 👋 Great to see you diving into the depths of LangChain again. Looks like you've hit a snag, but don't worry, we'll get to the bottom of it together!

The error you're encountering, NameResolutionError, is typically related to DNS resolution issues. This error suggests that the Python interpreter is unable to resolve the hostname answers-dev-jde-test-REDACTED.svc.us-east-1-aws.pinecone.io to an IP address.

Here are a few things you can try to resolve this issue:

  1. Check your internet connection: Ensure that your internet connection is stable and you're able to access other websites or services.

  2. Check DNS settings: If you're on a corporate network, there might be specific DNS settings or proxies that you need to use. You might need to consult with your network administrator for this.

  3. Try a different network: If possible, try to run your code on a different network. This can help determine if the issue is related to your current network's configuration.

  4. Check the Pinecone service status: It's also possible that the Pinecone service is temporarily unavailable. You can check the status of Pinecone's services or reach out to their support for more information.

  5. Update your dependencies: Although you're using the latest version of LangChain, you might want to check if there are updates available for other dependencies, especially those related to network connections and HTTP requests (like requests or urllib3).
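The DNS checks suggested above can be approximated from Python with the same resolver call that appears at the bottom of the traceback, socket.getaddrinfo. This is only a minimal sketch: the helper name can_resolve is hypothetical, and the host passed in the example is a placeholder to be replaced with the index host from the error message. Running it once in a fresh interpreter and once inside the failing process may help tell a network-level DNS problem apart from something specific to the process.

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the OS resolver maps host to at least one address."""
    try:
        results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Same exception type as the root error in the traceback.
        return False
    return len(results) > 0


if __name__ == "__main__":
    # Placeholder host; substitute the Pinecone index host from the error.
    print(can_resolve("127.0.0.1"))
```

If the host resolves from a fresh interpreter but not from the long-running script, the network itself is probably not the cause.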

Unfortunately, I can't access the repository you've linked to, so I can't review your code or the full error traceback. If none of the above suggestions work, could you please provide more details about your environment? Specifically, information about your operating system, Python version, and network configuration could be helpful.

Also, as mentioned in the context, the langchain_community.vectorstores.Pinecone class is deprecated. It's recommended to use langchain_pinecone.Pinecone instead. If possible, try updating your code to use the new class and see if the issue persists.

For more information about the langchain_pinecone.Pinecone class, you can refer to the LangChain documentation. If you need help updating your code, feel free to ask.

I hope this helps! Let me know if you have any other questions or if there's anything else I can do to assist you.

#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.

jkndrkn commented 5 months ago

Thank you for your reply, @dosubot

The NameResolutionError arises after as many as 28 successful upsert operations. I can verify on my Pinecone console that vectors are being upserted correctly. However, after ~28 upsert operations the NameResolutionError is thrown. This indicates to me that there are likely no network issues.

I have updated the sample code in the pinecone-upsert-error repository to use langchain_pinecone.Pinecone and I see the same error. Please see the below stack trace.

Additional environment information:

Traceback (most recent call last):
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/socket.py", line 955, in getaddrinfo
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7f7c928f6530>: Failed to resolve 'answers-dev-jde-test-768-REDACTED.svc.us-east-1-aws.pinecone.io' ([Errno 8] nodename nor servname provided, or not known)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jeriksen/bamboohr/ai-labs/pinecone-upsert-error/load_document.py", line 23, in <module>
    PineconeStore.from_documents(docs, embedder, index_name=INDEX_NAME)
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_core/vectorstores.py", line 508, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_pinecone/vectorstores.py", line 434, in from_texts
    pinecone.add_texts(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_pinecone/vectorstores.py", line 166, in add_texts
    [res.get() for res in async_res]
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_pinecone/vectorstores.py", line 166, in <listcomp>
    [res.get() for res in async_res]
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/api_client.py", line 195, in __call_api
    response_data = self.request(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/api_client.py", line 454, in request
    return self.rest_client.POST(url,
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/rest.py", line 301, in POST
    return self.request("POST", url,
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/rest.py", line 178, in request
    r = self.pool_manager.request(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/_request_methods.py", line 144, in request
    return self.request_encode_body(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/_request_methods.py", line 279, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/poolmanager.py", line 444, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 877, in urlopen
    return self.urlopen(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 877, in urlopen
    return self.urlopen(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 877, in urlopen
    return self.urlopen(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='answers-dev-jde-test-768-REDACTED.svc.us-east-1-aws.pinecone.io', port=443): Max retries exceeded with url: /vectors/upsert (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f7c928f6530>: Failed to resolve 'answers-dev-jde-test-768-REDACTED.svc.us-east-1-aws.pinecone.io' ([Errno 8] nodename nor servname provided, or not known)"))

dosubot[bot] commented 5 months ago

🤖

Based on the information you've provided, it seems like the issue you're experiencing is related to network connectivity rather than a limitation of the langchain_community.vectorstores.Pinecone class. The NameResolutionError you're seeing is typically associated with DNS resolution issues, which could be due to a variety of factors such as network conditions, DNS server issues, or issues with the Pinecone service itself.

The langchain_community.vectorstores.Pinecone class does not explicitly limit the number of upsert operations it can handle in a single session. The add_texts method, which is responsible for upserting data into the Pinecone vector store, uses a batch approach to manage memory usage and optimize performance when using HTTP-based embeddings. This method divides the input data into chunks of size embedding_chunk_size (default value is 1000) and then further divides these chunks into batches of size batch_size for upserting. The async_req=True parameter is used to perform the upsert operations asynchronously, which can help with performance.
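The two-level batching described above can be sketched in plain Python. This is an illustration of the chunk-then-batch idea only, not the library's code: the helper names batched and plan_upserts are hypothetical, and the default batch_size of 32 is an assumption rather than something stated in this thread.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items with at most size elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def plan_upserts(
    texts: Sequence[T],
    embedding_chunk_size: int = 1000,  # default quoted in the comment above
    batch_size: int = 32,              # assumed default, for illustration
) -> List[List[List[T]]]:
    """Group texts into embedding chunks, then split each chunk into upsert batches."""
    plan = []
    for chunk in batched(texts, embedding_chunk_size):
        # Each inner list stands for one upsert request.
        plan.append(list(batched(chunk, batch_size)))
    return plan


if __name__ == "__main__":
    plan = plan_upserts(list(range(2500)), batch_size=400)
    print([[len(batch) for batch in chunk] for chunk in plan])
```

With a single 500-character document per call, as in the reproduction script, this collapses to one chunk containing one batch, so batching alone is unlikely to explain a failure that appears only after dozens of calls.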

However, the actual number of upsert operations that can be handled in a single session may be influenced by factors such as the available memory, the size of the data being upserted, the performance of the Pinecone service, and network conditions.

Given this, I would recommend the following steps to troubleshoot the issue:

  1. Check your network conditions: Since the error is related to DNS resolution, it might be worth checking if there are any issues with your network. You could try using a different network or resetting your router to see if that resolves the issue.

  2. Check the Pinecone service status: It's also possible that there might be issues with the Pinecone service itself. You could check the status of the Pinecone service to see if there are any ongoing issues.

  3. Contact Pinecone support: If the issue persists, it might be worth reaching out to Pinecone support. They might be able to provide more insight into the issue and help you resolve it.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Regarding the full error stack trace and system information you mentioned in the pinecone-upsert-error repository, I'm sorry but I'm unable to view URLs or images. If you could provide the relevant information here, it would be very helpful in further diagnosing the issue.

For more information, you can refer to the source code of the langchain_community.vectorstores.Pinecone class here.

-Dosu

jkndrkn commented 5 months ago

Thank you. I tried changing the DNS servers on my system to use the Google DNS servers for both IPv4 and IPv6:

8.8.8.8 8.8.4.4 2001:4860:4860::8888 2001:4860:4860::8844

I also tried resetting my router. Neither approach helped.

As of 4:57PM EST the Pinecone status page says "All Systems Operational".

I continue to experience the same error.

I have reached out to Pinecone support but they have not been able to find a solution so far.

dosubot[bot] commented 5 months ago

🤖

It seems like this issue requires a deeper look, @baskaryan could you please lend your expertise?

efriis commented 5 months ago

If you use the new langchain-pinecone package instead, do you see the same error?

pip install langchain-pinecone

from langchain_pinecone import Pinecone

Based on the error being a retry error, I would guess you're either getting rate limited by pinecone or overloading the instance you have with them.

https://docs.pinecone.io/v1/docs/limits#upserts
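One way to test the rate-limiting guess is to look at what the retry error wraps. The tracebacks above are joined by "direct cause" links, which Python stores on each exception's __cause__ attribute. The sketch below walks that chain using stand-in exception types from the standard library (ConnectionError and RuntimeError play the roles of urllib3's NameResolutionError and MaxRetryError); the helper names cause_chain and demo are hypothetical.

```python
import socket
from typing import List


def cause_chain(exc: BaseException) -> List[BaseException]:
    """Follow __cause__ links (set by `raise ... from ...`) from outermost to root."""
    chain = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        chain.append(current)
        current = current.__cause__
    return chain


def demo() -> List[str]:
    """Rebuild a three-level chain shaped like the one in the traceback."""
    try:
        try:
            raise socket.gaierror(8, "nodename nor servname provided, or not known")
        except socket.gaierror as dns_error:
            # Stand-in for urllib3's NameResolutionError.
            raise ConnectionError("Failed to resolve host") from dns_error
    except ConnectionError as conn_error:
        try:
            # Stand-in for urllib3's MaxRetryError.
            raise RuntimeError("Max retries exceeded") from conn_error
        except RuntimeError as outer:
            return [type(e).__name__ for e in cause_chain(outer)]


if __name__ == "__main__":
    print(demo())
```

In the reported traceback the root of the chain is socket.gaierror, raised before any request reached the server, which suggests the retries were exhausted on local name resolution rather than on a server-side response.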

jkndrkn commented 5 months ago

Hi @efriis, thank you so much for your help.

I did already try installing langchain-pinecone. I currently have version 0.0.2 installed.

I updated the pinecone-upsert-error repository to show that I am using that dependency instead. With langchain-pinecone I continue to see the error.

> Based on the error being a retry error, I would guess you're either getting rate limited by pinecone or overloading the instance you have with them.

If I reduce each upsert call from one hundred documents to a single document and call time.sleep(5) before every request, as in the code below, I continue to see the error.

Here is the modified source code:

from os import environ
from time import sleep

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_pinecone import Pinecone as PineconeStore

INDEX_NAME = environ["PINECONE_INDEX"]
TRIALS = 50
TEXT_PATH = "my_text.txt"

loader = TextLoader(TEXT_PATH)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)

print("docs length:", len(docs))

embedder = HuggingFaceEmbeddings(model_name="all-mpnet-base-v2")

for i in range(0, TRIALS):
    print("trial: ", i, flush=True)
    sleep(5)
    PineconeStore.from_documents(docs[:1], embedder, index_name=INDEX_NAME)

This code throws the NameResolutionError after 47 trials. It sends only a single vector per request every five seconds. It seems to me like rate limiting shouldn't happen with such a light load. My colleague on a different machine and network is able to run the same code with no issues.

Is it possible that the Pinecone connection somehow becomes stale on my machine and needs to be re-established periodically?

jkndrkn commented 5 months ago

I believe that I resolved the issue.

I was running the above sample code on a different machine and was seeing NameResolutionError. While experimenting with ExpressVPN to make my traffic to Pinecone appear to be coming from different networks I ran into this error:

Traceback (most recent call last):
  File "/Users/jderiksen/src/pinecone-upsert-error/load_document.py", line 23, in <module>
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 508, in from_documents
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/langchain_pinecone/vectorstores.py", line 434, in from_texts
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/langchain_pinecone/vectorstores.py", line 155, in add_texts
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/langchain_pinecone/vectorstores.py", line 156, in <listcomp>
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/utils/error_handling.py", line 10, in inner_func
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/data/index.py", line 171, in upsert
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/data/index.py", line 192, in _upsert_batch
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/core/client/api_client.py", line 771, in __call__
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/core/client/api/vector_operations_api.py", line 951, in __upsert
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/core/client/api_client.py", line 833, in call_with_http_info
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/core/client/api_client.py", line 416, in call_api
  File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/core/client/api_client.py", line 102, in pool
  File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/pool.py", line 930, in __init__
  File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/pool.py", line 196, in __init__
  File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/context.py", line 113, in SimpleQueue
  File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/queues.py", line 346, in __init__
  File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/context.py", line 68, in Lock
  File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/synchronize.py", line 167, in __init__
  File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/synchronize.py", line 57, in __init__
OSError: [Errno 24] Too many open files

I checked how many file descriptors were available on my machine with ulimit -n and the result was reported as 256.
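The same check can be made from inside the Python process, together with a count of the descriptors currently in use. This is a minimal sketch using only the standard library; the function names fd_limits and open_fd_count are hypothetical, and the directory-listing approach assumes Linux or macOS.

```python
import os
import resource


def fd_limits() -> tuple:
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count() -> int:
    """Count descriptors currently open in this process (Linux and macOS)."""
    for path in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(path):
            # The listing briefly opens one descriptor of its own.
            return len(os.listdir(path))
    raise OSError("no per-process fd directory on this platform")


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("open descriptors:", open_fd_count())
```

Printing open_fd_count() once per trial in the reproduction loop would show whether the count climbs toward the soft limit as calls accumulate.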

After increasing the limit using ulimit -n 2048 both the sample code and my actual production code stopped failing with NameResolutionError.
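For scripts that cannot rely on the shell's ulimit setting, the soft limit can also be raised from within Python using the standard resource module (Unix only). This is a sketch under assumptions: the function name raise_nofile_soft_limit is hypothetical, and 2048 simply mirrors the value used above.

```python
import resource


def raise_nofile_soft_limit(target: int = 2048) -> int:
    """Raise the soft RLIMIT_NOFILE toward target, capped by the hard limit.

    Never lowers the limit. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        soft = new_soft
    return soft


if __name__ == "__main__":
    print("soft limit now:", raise_nofile_soft_limit())
```

Raising the limit addresses the symptom. The last traceback shows a new multiprocessing pool being created inside the Pinecone client during an upsert, so it seems plausible that each from_documents() call allocates descriptors that are not released promptly. That is an inference from the traceback, not something confirmed in this thread.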

Thanks so much for being willing to help with this!