aman-vink opened 3 months ago
@aman-vink, please pull from the main branch and let me know if you still see the issue.
Same issue for me on a long PDF (>200 pages).
Hello @kiran-nlmatics, I am facing the same issue. I just ran pip install nlm-ingestor and set up the LLM Sherpa Docker server.
EDIT: Here is the complete error:
KeyError                                  Traceback (most recent call last)
Cell In[21], line 6
      4 llmsherpa_api_url = llmsherpa_api_url + "&applyOcr=yes"
      5 pdf_reader = LayoutPDFReader(llmsherpa_api_url)
----> 6 doc = pdf_reader.read_pdf(pdf_url)

File ~/miniconda3/envs/mariscal-env-310/lib/python3.10/site-packages/llmsherpa/readers/file_reader.py:73, in LayoutPDFReader.read_pdf(self, path_or_url, contents)
     71 parser_response = self._parse_pdf(pdf_file)
     72 response_json = json.loads(parser_response.data.decode("utf-8"))
---> 73 blocks = response_json['return_dict']['result']['blocks']
     74 return Document(blocks)
KeyError: 'return_dict'
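The traceback shows where the KeyError comes from: read_pdf decodes the server's reply and indexes response_json['return_dict'] without first checking whether the parse succeeded. The sketch below reproduces only the two lines at file_reader.py:72-73 from the traceback, inside a hypothetical stand-in function (parse_blocks_like_reader, not part of llmsherpa), with no server involved. The success payload shape is inferred from the indexing alone and its block contents are placeholders; the failure payload is the one reported later in this thread.

```python
import json


def parse_blocks_like_reader(raw: bytes):
    """Hypothetical stand-in mimicking file_reader.py:72-73 from the traceback."""
    response_json = json.loads(raw.decode("utf-8"))
    # Indexed unconditionally, exactly as in the traceback.
    return response_json["return_dict"]["result"]["blocks"]


# Assumed success shape, inferred only from the indexing above.
ok = json.dumps({"return_dict": {"result": {"blocks": [{"tag": "para"}]}}}).encode("utf-8")
print(parse_blocks_like_reader(ok))  # [{'tag': 'para'}]

# Failure payload as reported in this thread: no 'return_dict' key at all.
fail = json.dumps({"reason": "'style'", "status": "fail"}).encode("utf-8")
try:
    parse_blocks_like_reader(fail)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'return_dict'
```

So any failure reply from the server surfaces on the client as KeyError: 'return_dict', which hides the server's own reason.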
File "/Users/tpmpraka/miniconda3/envs/grm/lib/python3.11/site-packages/llmsherpa/readers/file_reader.py", line 74, in read_pdf
    blocks = response_json['return_dict']['result']['blocks']
KeyError: 'return_dict'
The response JSON was {'reason': "'style'", 'status': 'fail'}.
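A response like this has no 'return_dict' key, so the client-side indexing fails before the 'reason' field is ever shown. A minimal sketch of a more informative check, assuming you have the raw response bytes in hand: inspect 'status' and 'return_dict' before indexing and raise with the server's reason. ParserError and extract_blocks are hypothetical names, not llmsherpa's API; the check relies only on the 'status', 'reason', and 'return_dict' keys seen in this thread.

```python
import json
from typing import Any


class ParserError(RuntimeError):
    """Hypothetical error type for a parse response that reports failure."""


def extract_blocks(raw: bytes) -> list[Any]:
    """Hypothetical helper: decode a parser response and return its blocks.

    Raises ParserError carrying the server's 'reason' (if any) instead of a
    bare KeyError when the response has no 'return_dict'.
    """
    response_json = json.loads(raw.decode("utf-8"))
    # The failure reported in this thread looked like {'reason': ..., 'status': 'fail'}.
    if response_json.get("status") == "fail" or "return_dict" not in response_json:
        reason = response_json.get("reason", "<no reason given>")
        raise ParserError(f"PDF parse failed on the server: reason={reason!r}")
    return response_json["return_dict"]["result"]["blocks"]


# Usage with the failure payload reported above.
failed = json.dumps({"reason": "'style'", "status": "fail"}).encode("utf-8")
try:
    extract_blocks(failed)
except ParserError as exc:
    print(exc)  # PDF parse failed on the server: reason="'style'"
```

This does not fix the underlying server-side failure; it only makes the server's reason visible instead of an opaque KeyError.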
KeyError                                  Traceback (most recent call last)
in <cell line: 15>()
13 llmsherpa_api_url = llmsherpa_api_url + "&applyOcr=yes"
14 pdf_reader = LayoutPDFReader(llmsherpa_api_url)
---> 15 doc = pdf_reader.read_pdf(pdf_url)
/usr/local/lib/python3.10/dist-packages/llmsherpa/readers/file_reader.py in read_pdf(self, path_or_url, contents)
     71 parser_response = self._parse_pdf(pdf_file)
     72 response_json = json.loads(parser_response.data.decode("utf-8"))
---> 73 blocks = response_json['return_dict']['result']['blocks']
     74 return Document(blocks)
KeyError: 'return_dict'
For this URL: https://s201.q4cdn.com/262069030/files/doc_financials/2023/ar/Walmart-10K-Reports-Optimized.pdf