Open Tizzzzy opened 3 months ago
Hi, I am trying to run this code:
from llmsherpa.readers import LayoutPDFReader

# llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
llmsherpa_api_url = "http://localhost:5001/api/parseDocument?renderFormat=all&useNewIndentParser=true"
pdf_url = "C:/Users/super/OneDrive/Desktop/....pdf"
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_url)
If I set pdf_url to a local path, I get this error:
---------------------------------------------------------------------------
LocationValueError                        Traceback (most recent call last)
Cell In[35], line 7
      5 pdf_url = "C:/Users/super/OneDrive/Desktop/vertisim_ai/BidSmart/code/context_aware_parse/nlm-ingestor/qt625910hd_noSplash_7a7f8e7e4ab806cd0a32fe4adde0cf28.pdf" # also allowed is a file path e.g. /home/downloads/xyz.pdf
      6 pdf_reader = LayoutPDFReader(llmsherpa_api_url)
----> 7 doc = pdf_reader.read_pdf(pdf_url)

File c:\Users\super\anaconda3\envs\nlm-ingestor\lib\site-packages\llmsherpa\readers\file_reader.py:65, in LayoutPDFReader.read_pdf(self, path_or_url, contents)
     63 is_url = urlparse(path_or_url).scheme != ""
     64 if is_url:
---> 65     pdf_file = self._download_pdf(path_or_url)
     66 else:
     67     file_name = os.path.basename(path_or_url)

File c:\Users\super\anaconda3\envs\nlm-ingestor\lib\site-packages\llmsherpa\readers\file_reader.py:36, in LayoutPDFReader._download_pdf(self, pdf_url)
     34 # add authorization headers if using external API (see upload_pdf for an example)
     35 download_headers = {"User-Agent": user_agent}
---> 36 download_response = self.download_connection.request("GET", pdf_url, headers=download_headers)
     37 file_name = os.path.basename(urlparse(pdf_url).path)
     38 # note you can change the file name here if you'd like to something else

File c:\Users\super\anaconda3\envs\nlm-ingestor\lib\site-packages\urllib3\request.py:74, in RequestMethods.request(self, method, url, fields, headers, **urlopen_kw)
     71 urlopen_kw["request_url"] = url
     73 if method in self._encode_url_methods:
---> 74     return self.request_encode_url(
     75         method, url, fields=fields, headers=headers, **urlopen_kw
     76     )
     77 else:
     78     return self.request_encode_body(
     79         method, url, fields=fields, headers=headers, **urlopen_kw
     80     )

File c:\Users\super\anaconda3\envs\nlm-ingestor\lib\site-packages\urllib3\request.py:96, in RequestMethods.request_encode_url(self, method, url, fields, headers, **urlopen_kw)
     93 if fields:
     94     url += "?" + urlencode(fields)
---> 96 return self.urlopen(method, url, **extra_kw)

File c:\Users\super\anaconda3\envs\nlm-ingestor\lib\site-packages\urllib3\poolmanager.py:364, in PoolManager.urlopen(self, method, url, redirect, **kw)
    361 u = parse_url(url)
    362 self._validate_proxy_scheme_url_selection(u.scheme)
--> 364 conn = self.connection_from_host(u.host, port=u.port, scheme=u.scheme)
    366 kw["assert_same_host"] = False
    367 kw["redirect"] = False

File c:\Users\super\anaconda3\envs\nlm-ingestor\lib\site-packages\urllib3\poolmanager.py:236, in PoolManager.connection_from_host(self, host, port, scheme, pool_kwargs)
    225 """
    226 Get a :class:`urllib3.connectionpool.ConnectionPool` based on the host, port, and scheme.
    227
   (...)
    232 needed.
    233 """
    235 if not host:
--> 236     raise LocationValueError("No host specified.")
    238 request_context = self._merge_pool_kwargs(pool_kwargs)
    239 request_context["scheme"] = scheme or "http"

LocationValueError: No host specified.
Having the same issue
Same issue! Does anyone have a solution?
Use a relative path to the PDF.
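For anyone wondering why a relative path helps: the traceback shows that read_pdf decides whether its argument is a URL with `urlparse(path_or_url).scheme != ""`. With an absolute Windows path, the drive letter before the colon appears to be parsed as a URL scheme, so the path is sent down the download branch and urllib3 fails because there is no host. A minimal sketch of that check, using only the standard library (the `treated_as_url` helper and both example paths are hypothetical, made up for illustration):

```python
from urllib.parse import urlparse


def treated_as_url(path_or_url: str) -> bool:
    """Hypothetical helper mirroring the check shown in the traceback
    (file_reader.py, line 63): a non-empty scheme means "this is a URL"."""
    return urlparse(path_or_url).scheme != ""


# Hypothetical example paths, not the ones from the original report.
absolute_windows_path = "C:/Users/super/Desktop/example.pdf"
relative_path = "docs/example.pdf"

# The drive letter "C:" is parsed as a (lowercased) scheme.
print(urlparse(absolute_windows_path).scheme)  # c
print(treated_as_url(absolute_windows_path))   # True  -> download branch
print(treated_as_url(relative_path))           # False -> local file branch
```

If you only have an absolute path, `os.path.relpath(path)` can turn it into one relative to the current working directory, although on Windows it raises ValueError when the file is on a different drive. The read_pdf signature in the traceback also lists a `contents` parameter; passing the file's bytes there might be another way around the check, but that is an assumption and not something confirmed in this thread.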