Open RGalkin opened 4 months ago
Hi, @RGalkin! I'm here to help you with this problem 🤖
This connection error usually means that the llmsherpa API URL is incorrect or that the service is down. Check that llmsherpa_api_url is correct and that the service is accessible.
Here is the example usage provided in the documentation:
from llama_index.readers.smart_pdf_loader import SmartPDFLoader
llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "https://arxiv.org/pdf/1910.13461.pdf" # also allowed is a file path e.g. /home/downloads/xyz.pdf
pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
documents = pdf_loader.load_data(pdf_url)
Make sure that the llmsherpa_api_url is correct and that you can access it from your network. If the URL or the service is not reachable, you will encounter a connection error.
Additionally, ensure that the pdf_path_or_url parameter is correctly specified as either a URL or a file path. The SmartPDFLoader class uses the LayoutPDFReader from llmsherpa.readers to read and chunk the PDF, so any issues with the URL or file path can also lead to connection errors [1][2].
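One way to check reachability independently of the loader is a plain TCP connection test against the host in llmsherpa_api_url. The sketch below uses only the Python standard library; the helper name can_connect and the five-second default timeout are assumptions made for illustration and are not part of llama_index or llmsherpa.

```python
import socket
from urllib.parse import urlsplit


def can_connect(url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for diagnosis only; it does not send an HTTP request.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling can_connect(llmsherpa_api_url) from the same machine or notebook that runs the loader returns False when the host cannot be reached at the network level, which would point to the service or the network rather than to the loader code.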
To continue talking to Dosu, mention @dosu.
I thought something in my code might be interfering with the connection, so I ran the code in Colab, but I'm still getting the same error.
The connection error you're encountering is most likely caused by the llmsherpa API service itself or by a network problem. Here are a few steps you can take to troubleshoot and resolve the issue:
Check API Service Status: Ensure that the llmsherpa API service is up and running. The service might be temporarily down for maintenance or other reasons.
Network Configuration: Verify that no network issue or firewall setting is blocking the connection to https://readers.llmsherpa.com. You can try opening the URL directly in a web browser to see if it is reachable.
Rate Limiting: Be aware that the llmsherpa API has rate limiting in place. If you exceed the allowed number of requests per second, your IP might be temporarily banned. Ensure you are not making too many requests in a short period [1].
Proxy Settings: If you are behind a proxy, ensure that your proxy settings are correctly configured in your environment. You might need to set the proxy settings in your code as well.
Retry Mechanism: Implement a retry mechanism in your code to handle transient network issues. This can help in cases where the connection fails intermittently.
Here is an example of how you can implement a simple retry mechanism:
import time
from llama_index.readers.smart_pdf_loader import SmartPDFLoader
llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "https://arxiv.org/pdf/1910.13461.pdf" # also allowed is a file path e.g. /home/downloads/xyz.pdf
def load_pdf_with_retries(pdf_url, retries=3, delay=5):
    pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
    for attempt in range(retries):
        try:
            documents = pdf_loader.load_data(pdf_url)
            return documents
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < retries - 1:
                # Wait before the next attempt.
                time.sleep(delay)
            else:
                # Out of attempts: re-raise the last error.
                raise
documents = load_pdf_with_retries(pdf_url)
This code makes up to three attempts, waiting five seconds between them, and re-raises the last error if every attempt fails.
If the issue persists, it might be worth reaching out to the maintainers of the llmsherpa API to check if there are any known issues or additional configurations required.
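For the rate-limiting point above, a client-side throttle can keep successive requests a minimum interval apart. This is a minimal sketch: the class name MinIntervalThrottle is hypothetical, and the actual limit enforced by the llmsherpa API is not stated in this thread, so any interval you choose is an assumption.

```python
import time


class MinIntervalThrottle:
    """Block so that successive calls are at least `min_interval` seconds apart.

    Hypothetical helper; `clock` and `sleep` are injectable for testing.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # Time of the previous call; None before the first.

    def wait(self):
        """Sleep if needed and return the number of seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept
```

To use it, create one throttle (for example MinIntervalThrottle(1.0), an assumed value) and call its wait() method immediately before each pdf_loader.load_data(...) call.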
The listener/server seems to be down. MaxRetryError: HTTPSConnectionPool(host='readers.llmsherpa.com', port=443): Max retries exceeded with url: /api/document/developer/parseDocument?renderFormat=all (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7e835823e920>, 'Connection to readers.llmsherpa.com timed out. (connect timeout=None)'))
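A MaxRetryError like the one above wraps the lower-level failure (here a connect timeout). When an exception is raised with "raise ... from ...", Python stores the original error in the __cause__ attribute, so the innermost error can be recovered by walking that chain. The helper name root_cause below is hypothetical, and the sketch assumes the exception actually reaches your code rather than being caught and printed by the loader.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ / __context__ links to the innermost exception."""
    seen = set()
    current = exc
    # Track visited exceptions so a cyclic chain cannot loop forever.
    while id(current) not in seen:
        seen.add(id(current))
        nxt = current.__cause__ or current.__context__
        if nxt is None:
            break
        current = nxt
    return current
```

Inside an except block, type(root_cause(e)).__name__ gives the name of the underlying error, which helps separate a timeout from a DNS failure or a refused connection.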
Bug Description
I'm running the example from the website:
I'm getting the following error: `An exception occurred: HTTPSConnectionPool(host='readers.llmsherpa.com', port=443): Max retries exceeded with url: /api/document/developer/parseDocument?renderFormat=all (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002671A13C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000002671A13C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
The above exception was the direct cause of the following exception: raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type] urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='readers.llmsherpa.com', port=443): Max retries exceeded with url: /api/document/developer/parseDocument?renderFormat=all (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002671A13C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))`
Version
llama-index 0.10.54 llama-index-readers-smart-pdf-loader 0.1.4
Steps to Reproduce
Relevant Logs/Tracebacks
No response