nlmatics / llmsherpa

Developer APIs to Accelerate LLM Projects
https://www.nlmatics.com
MIT License

KeyError: 'result' #7

Closed: jalkestrup closed this issue 8 months ago

jalkestrup commented 8 months ago

Running the test script in Colab:

from llmsherpa.readers import LayoutPDFReader

llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "dagpenge_LH_merged.pdf" # also allowed is a file path e.g. /home/downloads/xyz.pdf
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_url)

This returns:

KeyError                                  Traceback (most recent call last)
in <cell line: 6>()
      4 pdf_url = "dagpenge_LH_merged.pdf" # also allowed is a file path e.g. /home/downloads/xyz.pdf
      5 pdf_reader = LayoutPDFReader(llmsherpa_api_url)
----> 6 doc = pdf_reader.read_pdf(pdf_url)

/usr/local/lib/python3.10/dist-packages/llmsherpa/readers/file_reader.py in read_pdf(self, path_or_url)
     39 parser_response = self._parse_pdf(pdf_file)
     40 response_json = json.loads(parser_response.data.decode("utf-8"))
---> 41 blocks = response_json['return_dict']['result']['blocks']
     42 return Document(blocks)
     43 # def read_file(file_path):

KeyError: 'result'

I often get this error when trying to run the demo script. It also occurred yesterday, but running the cell a few times "solved" the issue. This time it does not.
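The traceback shows that `read_pdf` indexes `response_json['return_dict']['result']['blocks']` directly, so any parser response without a `result` key surfaces as a bare `KeyError`. Below is a minimal sketch of a more descriptive check, assuming the decoded response is a plain dict. The names `extract_blocks` and `ParseFailedError` are hypothetical and not part of llmsherpa, and since this thread does not document what a failed response contains, the sketch only reports whichever keys the server did send.

```python
from typing import Any


class ParseFailedError(RuntimeError):
    """Hypothetical error type: the parser response had no usable 'result'."""


def extract_blocks(response_json: dict[str, Any]) -> list[Any]:
    """Return response_json['return_dict']['result']['blocks'] or raise a clear error.

    Mirrors the lookup on line 41 of file_reader.py in the traceback, but reports
    the keys that were actually present instead of a bare KeyError.
    """
    return_dict = response_json.get("return_dict")
    if not isinstance(return_dict, dict):
        raise ParseFailedError(
            f"no 'return_dict' in response; top-level keys: {sorted(response_json)}"
        )
    result = return_dict.get("result")
    if not isinstance(result, dict) or "blocks" not in result:
        raise ParseFailedError(
            f"parser returned no 'result'; return_dict keys: {sorted(return_dict)}"
        )
    return result["blocks"]
```

Printing the reported keys when the error occurs would show whether the server returned an error payload or something else entirely.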


ansukla commented 8 months ago

Hi @jalkestrup,

Is it possible to share the document link?

jalkestrup commented 8 months ago

Yes, I can upload it here. When I re-ran the same cell about 10 times, it finally succeeded. I had the same experience with the PDF link I sent yesterday; this one, however, is a longer merged PDF with many pages. You can download the PDF here: https://docdro.id/7Plq8CY

ansukla commented 8 months ago

This appears to be due to high server load. We are working on increasing the capacity and also offering a private hosted option soon. Please stay tuned.
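Because the failures are attributed to server load and re-running the cell eventually succeeded, one client-side workaround is to retry with a growing delay. This is a sketch under that assumption: `retry_with_backoff` is a hypothetical helper, not an llmsherpa API, and the attempt count and delays are arbitrary choices.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (KeyError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() until it succeeds, sleeping base_delay * 2**i after failure i.

    Only exceptions listed in retry_on are retried; the last one is re-raised
    if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise
            sleep(base_delay * 2 ** i)
    raise AssertionError("unreachable")


# Example use (not executed here; requires network access):
# doc = retry_with_backoff(lambda: pdf_reader.read_pdf(pdf_url))
```

Retrying on `KeyError` only helps with transient failures. If a particular document fails every time, the helper simply re-raises the final `KeyError` after the last attempt.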

PranavGupta98 commented 6 months ago

Hi, I am facing the same error. I have 5 PDF files, and the code works for 4 of them but not the fifth. Like @jalkestrup, I tried running it several times, but it still throws KeyError: 'result'.

I would really appreciate any support from authors/community on this!
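When only one file in a batch fails, it can help to parse each file independently and record which ones failed instead of stopping at the first error. A minimal sketch, assuming a `parse` callable such as `pdf_reader.read_pdf`; `parse_all` is a hypothetical helper, not part of llmsherpa.

```python
from typing import Any, Callable, Iterable


def parse_all(
    paths: Iterable[str], parse: Callable[[str], Any]
) -> tuple[dict[str, Any], dict[str, str]]:
    """Parse each path independently and return (successes, failures).

    failures maps each failing path to "ExceptionName: message", so one bad
    document does not abort the rest of the batch.
    """
    docs: dict[str, Any] = {}
    failures: dict[str, str] = {}
    for path in paths:
        try:
            docs[path] = parse(path)
        except Exception as exc:  # broad on purpose: record the error and move on
            failures[path] = f"{type(exc).__name__}: {exc}"
    return docs, failures


# Example use (not executed here; requires network access):
# docs, failures = parse_all(["a.pdf", "b.pdf"], pdf_reader.read_pdf)
```

The `failures` dict makes it easy to see whether the same file fails every run, which points to a document-specific problem rather than server load.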