Closed jkndrkn closed 5 months ago
🤖
Hey there @jkndrkn! Great to see you diving into the depths of LangChain again. Looks like you've hit a snag, but don't worry, we'll get to the bottom of it together!
The error you're encountering, NameResolutionError, is typically related to DNS resolution issues. This error suggests that the Python interpreter is unable to resolve the hostname answers-dev-jde-test-REDACTED.svc.us-east-1-aws.pinecone.io to an IP address.
Here are a few things you can try to resolve this issue:
Check your internet connection: Ensure that your internet connection is stable and you're able to access other websites or services.
Check DNS settings: If you're on a corporate network, there might be specific DNS settings or proxies that you need to use. You might need to consult with your network administrator for this.
Try a different network: If possible, try to run your code on a different network. This can help determine if the issue is related to your current network's configuration.
Check the Pinecone service status: It's also possible that the Pinecone service is temporarily unavailable. You can check the status of Pinecone's services or reach out to their support for more information.
Update your dependencies: Although you're using the latest version of LangChain, you might want to check if there are updates available for other dependencies, especially those related to network connections and HTTP requests (like requests or urllib3).
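Before changing network settings, it can help to confirm whether hostname resolution really fails, and whether it fails every time or only intermittently. The following is a minimal sketch using only the standard library; the helper names can_resolve and probe are made up for this example, and the host passed in is a placeholder to be replaced with the index hostname from your own error message.

```python
import socket
import time


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the OS resolver can turn `host` into at least one address."""
    try:
        # socket.getaddrinfo is the stdlib call that appears in the urllib3 traceback.
        return len(socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)) > 0
    except socket.gaierror:
        return False


def probe(host: str, attempts: int = 5, delay: float = 1.0) -> list[bool]:
    """Resolve `host` several times so intermittent failures become visible."""
    results = []
    for _ in range(attempts):
        results.append(can_resolve(host))
        time.sleep(delay)
    return results


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the hostname from your error.
    print(probe("localhost", attempts=3, delay=0.1))
```

If every attempt succeeds from a fresh process while the long-running script still fails, the cause is more likely inside that process than in the network.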
Unfortunately, I can't access the repository you've linked to, so I can't review your code or the full error traceback. If none of the above suggestions work, could you please provide more details about your environment? Specifically, information about your operating system, Python version, and network configuration could be helpful.
Also, as mentioned in the context, the langchain_community.vectorstores.Pinecone class is deprecated. It's recommended to use langchain_pinecone.Pinecone instead. If possible, try updating your code to use the new class and see if the issue persists.
For more information about the langchain_pinecone.Pinecone class, you can refer to the LangChain documentation. If you need help updating your code, feel free to ask.
I hope this helps! Let me know if you have any other questions or if there's anything else I can do to assist you.
Thank you for your reply, @dosubot.
The NameResolutionError arises after as many as 28 successful upsert operations. I can verify on my Pinecone console that vectors are being upserted correctly. However, after ~28 upsert operations the NameResolutionError is thrown. This indicates to me that there are likely no network issues.
I have updated the sample code in the pinecone-upsert-error repository to use langchain_pinecone.Pinecone and I see the same error. Please see the stack trace below.
Additional environment information:
Traceback (most recent call last):
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/socket.py", line 955, in getaddrinfo
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 491, in _make_request
raise new_e
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
conn.connect()
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connection.py", line 616, in connect
self.sock = sock = self._new_conn()
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connection.py", line 205, in _new_conn
raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7f7c928f6530>: Failed to resolve 'answers-dev-jde-test-768-REDACTED.svc.us-east-1-aws.pinecone.io' ([Errno 8] nodename nor servname provided, or not known)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/jeriksen/bamboohr/ai-labs/pinecone-upsert-error/load_document.py", line 23, in <module>
PineconeStore.from_documents(docs, embedder, index_name=INDEX_NAME)
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_core/vectorstores.py", line 508, in from_documents
return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_pinecone/vectorstores.py", line 434, in from_texts
pinecone.add_texts(
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_pinecone/vectorstores.py", line 166, in add_texts
[res.get() for res in async_res]
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/langchain_pinecone/vectorstores.py", line 166, in <listcomp>
[res.get() for res in async_res]
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/multiprocessing/pool.py", line 774, in get
raise self._value
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/api_client.py", line 195, in __call_api
response_data = self.request(
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/api_client.py", line 454, in request
return self.rest_client.POST(url,
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/rest.py", line 301, in POST
return self.request("POST", url,
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/pinecone/core/client/rest.py", line 178, in request
r = self.pool_manager.request(
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/_request_methods.py", line 144, in request
return self.request_encode_body(
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/_request_methods.py", line 279, in request_encode_body
return self.urlopen(method, url, **extra_kw)
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/poolmanager.py", line 444, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 877, in urlopen
return self.urlopen(
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 877, in urlopen
return self.urlopen(
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 877, in urlopen
return self.urlopen(
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
retries = retries.increment(
File "/Users/jeriksen/opt/anaconda3/envs/pinecone-upsert-error/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='answers-dev-jde-test-768-REDACTED.svc.us-east-1-aws.pinecone.io', port=443): Max retries exceeded with url: /vectors/upsert (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f7c928f6530>: Failed to resolve 'answers-dev-jde-test-768-REDACTED.svc.us-east-1-aws.pinecone.io' ([Errno 8] nodename nor servname provided, or not known)"))
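The trace above is three chained exceptions: socket.gaierror caused NameResolutionError, which in turn caused MaxRetryError. When handling such an error in code, it can be useful to walk the chain and look at the innermost cause instead of the outermost wrapper. Below is a minimal sketch; exception_chain is a hypothetical helper, and the demo uses built-in exceptions as stand-ins for the urllib3 classes.

```python
def exception_chain(exc: BaseException) -> list[BaseException]:
    """Return [exc, its cause, that cause's cause, ...], outermost first."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        chain.append(exc)
        seen.add(id(exc))
        # `raise X from Y` stores Y in X.__cause__.
        exc = exc.__cause__
    return chain


if __name__ == "__main__":
    # Stand-in exceptions that mimic the shape of the trace; not urllib3's classes.
    try:
        try:
            try:
                raise OSError(8, "nodename nor servname provided, or not known")
            except OSError as e:
                raise LookupError("failed to resolve host") from e
        except LookupError as e:
            raise RuntimeError("max retries exceeded") from e
    except RuntimeError as outer:
        for link in exception_chain(outer):
            print(type(link).__name__, "->", link)
```

The last element of the returned list is the root cause, which here corresponds to the Errno 8 raised by getaddrinfo.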
🤖
Based on the information you've provided, it seems like the issue you're experiencing is related to network connectivity rather than a limitation of the langchain_community.vectorstores.Pinecone class. The NameResolutionError you're seeing is typically associated with DNS resolution issues, which could be due to a variety of factors such as network conditions, DNS server issues, or issues with the Pinecone service itself.
The langchain_community.vectorstores.Pinecone class does not explicitly limit the number of upsert operations it can handle in a single session. The add_texts method, which is responsible for upserting data into the Pinecone vector store, uses a batch approach to manage memory usage and optimize performance when using HTTP-based embeddings. This method divides the input data into chunks of size embedding_chunk_size (default value is 1000) and then further divides these chunks into batches of size batch_size for upserting. The async_req=True parameter is used to perform the upsert operations asynchronously, which can help with performance.
However, the actual number of upsert operations that can be handled in a single session may be influenced by factors such as the available memory, the size of the data being upserted, the performance of the Pinecone service, and network conditions.
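The two-level batching described above can be illustrated with plain index arithmetic. This is only a sketch of the idea, not the library's implementation: plan_upserts is a made-up name, and the batch_size default of 32 is a placeholder chosen for the example (only the embedding_chunk_size default of 1000 comes from the description above).

```python
def plan_upserts(n_texts, embedding_chunk_size=1000, batch_size=32):
    """Yield (chunk_start, batch_start, batch_end) index triples.

    batch_size=32 is a placeholder value chosen for this example.
    """
    for chunk_start in range(0, n_texts, embedding_chunk_size):
        chunk_end = min(chunk_start + embedding_chunk_size, n_texts)
        # Each chunk is embedded once, then upserted in smaller batches.
        for batch_start in range(chunk_start, chunk_end, batch_size):
            batch_end = min(batch_start + batch_size, chunk_end)
            yield chunk_start, batch_start, batch_end


if __name__ == "__main__":
    batches = list(plan_upserts(2500, embedding_chunk_size=1000, batch_size=400))
    print(len(batches), "upsert requests")
    print(batches[:4])
```

Each yielded triple corresponds to one upsert request, so the number of triples gives a rough count of how many requests a single add_texts call would issue under these assumptions.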
Given this, I would recommend the following steps to troubleshoot the issue:
Check your network conditions: Since the error is related to DNS resolution, it might be worth checking if there are any issues with your network. You could try using a different network or resetting your router to see if that resolves the issue.
Check the Pinecone service status: It's also possible that there might be issues with the Pinecone service itself. You could check the status of the Pinecone service to see if there are any ongoing issues.
Contact Pinecone support: If the issue persists, it might be worth reaching out to Pinecone support. They might be able to provide more insight into the issue and help you resolve it.
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
Regarding the full error stack trace and system information you mentioned in the pinecone-upsert-error repository, I'm sorry but I'm unable to view URLs or images. If you could provide the relevant information here, it would be very helpful in further diagnosing the issue.
For more information, you can refer to the source code of the langchain_community.vectorstores.Pinecone class here.
-Dosu
Thank you. I tried changing the DNS servers on my system to use the Google DNS servers for both IPv4 and IPv6:
8.8.8.8 8.8.4.4 2001:4860:4860::8888 2001:4860:4860::8844
I also tried resetting my router. Neither approach helped.
As of 4:57PM EST the Pinecone status page says "All Systems Operational".
I continue to experience the same error.
I have reached out to Pinecone support but they have not been able to find a solution so far.
🤖
It seems like this issue requires a deeper look, @baskaryan could you please lend your expertise?
If you use the new langchain-pinecone package instead, do you see the same error?
pip install langchain-pinecone
from langchain_pinecone import Pinecone
Based on the error being a retry error, I would guess you're either getting rate limited by Pinecone or overloading the instance you have with them.
Hi @efriis, thank you so much for your help.
I did already try installing langchain-pinecone. I currently have version 0.0.2 installed.
I updated the pinecone-upsert-error repository to show that I am using that dependency instead. With langchain-pinecone I continue to see the error.
> Based on the error being a retry error, I would guess you're either getting rate limited by pinecone or overloading the instance you have with them.
If I reduce the size of each upsert call from one hundred documents to a single document and call time.sleep(5) before every request, as in the code below, I continue to see the error.
Here is the modified source code:
from os import environ
from time import sleep

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_pinecone import Pinecone as PineconeStore

INDEX_NAME = environ["PINECONE_INDEX"]
TRIALS = 50
TEXT_PATH = "my_text.txt"

loader = TextLoader(TEXT_PATH)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
print("docs length:", len(docs))

embedder = HuggingFaceEmbeddings(model_name="all-mpnet-base-v2")

# Upsert a single document per trial, pausing five seconds before each request.
for i in range(0, TRIALS):
    print("trial: ", i, flush=True)
    sleep(5)
    PineconeStore.from_documents(docs[:1], embedder, index_name=INDEX_NAME)
This code throws the NameResolutionError after 47 trials. It sends only a single vector per request every five seconds. It seems to me like rate limiting shouldn't happen with such a light load. My colleague on a different machine and network is able to run the same code with no issues.
Is it possible that the Pinecone connection somehow becomes stale on my machine and needs to be re-established periodically?
I believe that I resolved the issue.
I was running the above sample code on a different machine and was seeing NameResolutionError. While experimenting with ExpressVPN to make my traffic to Pinecone appear to come from different networks, I ran into this error:
Traceback (most recent call last):
File "/Users/jderiksen/src/pinecone-upsert-error/load_document.py", line 23, in <module>
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 508, in from_documents
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/langchain_pinecone/vectorstores.py", line 434, in from_texts
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/langchain_pinecone/vectorstores.py", line 155, in add_texts
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/langchain_pinecone/vectorstores.py", line 156, in <listcomp>
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/utils/error_handling.py", line 10, in inner_func
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/data/index.py", line 171, in upsert
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/data/index.py", line 192, in _upsert_batch
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/core/client/api_client.py", line 771, in __call__
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/core/client/api/vector_operations_api.py", line 951, in __upsert
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/core/client/api_client.py", line 833, in call_with_http_info
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/core/client/api_client.py", line 416, in call_api
File "/Users/jderiksen/miniconda3/lib/python3.11/site-packages/pinecone/core/client/api_client.py", line 102, in pool
File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/pool.py", line 930, in __init__
File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/pool.py", line 196, in __init__
File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/context.py", line 113, in SimpleQueue
File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/queues.py", line 346, in __init__
File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/context.py", line 68, in Lock
File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/synchronize.py", line 167, in __init__
File "/Users/jderiksen/miniconda3/lib/python3.11/multiprocessing/synchronize.py", line 57, in __init__
OSError: [Errno 24] Too many open files
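The Errno 24 in the trace above means the process ran out of file descriptors while creating a multiprocessing pool. One way to check whether descriptors accumulate across iterations is to count them from inside the process. The sketch below is a rough diagnostic for macOS and Linux, assuming /dev/fd is available; open_fd_count is a hypothetical helper name.

```python
import os


def open_fd_count() -> int:
    """Count file descriptors currently open in this process (macOS/Linux)."""
    # /dev/fd lists one entry per open descriptor; the listing itself
    # briefly opens one extra descriptor, so treat the number as approximate.
    return len(os.listdir("/dev/fd"))


if __name__ == "__main__":
    before = open_fd_count()
    handles = [open(os.devnull) for _ in range(10)]
    print("descriptors before:", before, "after opening 10 files:", open_fd_count())
    for h in handles:
        h.close()
    print("descriptors after closing them:", open_fd_count())
```

Printing open_fd_count() once per trial in the reproduction loop would show whether the count climbs steadily toward the limit before the failure.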
I checked how many file descriptors were available on my machine with ulimit -n and the result was reported as 256.
After increasing the limit using ulimit -n 2048 both the sample code and my actual production code stopped failing with NameResolutionError.
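The same limit that ulimit -n reports can be inspected and raised from inside a Python script with the standard resource module (Unix only). This is a minimal sketch, not a recommended permanent fix; raise_fd_limit is a hypothetical helper, and 2048 simply mirrors the value used above. A plausible reason the failure surfaced as NameResolutionError is that resolving a hostname also needs a free descriptor, but that is an assumption, not something the traces confirm.

```python
import resource


def raise_fd_limit(target: int) -> int:
    """Raise the soft RLIMIT_NOFILE towards `target`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < target:
        # An unprivileged process may raise its soft limit only up to the hard limit.
        new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some systems may reject the value; in that case keep the old limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft limit before:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("soft limit after: ", raise_fd_limit(2048))
```

Raising the limit only buys headroom. If descriptors keep accumulating with every from_documents() call, a long enough run could still exhaust the larger limit.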
Thanks so much for being willing to help with this!
Example Code
The following code is copied from a small repository I created that helps reproduce the issue.
Please see the project README.md for instructions for how to configure and run this code.
Error Message and Stack Trace (if applicable)
Description
I am trying to use langchain_community.vectorstores.Pinecone to upsert embeddings to a Pinecone index using Pinecone.from_documents(). When I start a script that calls from_documents() once for each document, it succeeds for the first few runs but then fails and returns this error:
This is an unexpected error. The index does exist. I have tried this with many different Pinecone indexes, both 1024-dimensional with Cohere embeddings and 768-dimensional with HuggingFace embeddings. I have also tried various document sizes.
System Info
LangChain dependencies
Here is my environment.yml
Here is the entire output of pip freeze: