run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License

Error Parsing Normal PDF File #314

Open marco-bertelli opened 4 months ago

marco-bertelli commented 4 months ago

Describe the bug
I am trying to parse a normal PDF (no tables or anything unusual), but for this one document I get an uninformative error:

Started parsing the file under job_id d50bdaa7-a442-48c4-a710-41f2cbf8d50b
Error while parsing the file './pdfs/59e90ce61d2d521ffc7c1fb2-application-d789fd00-2b4a-4db6-906a-2a62e8d22911.pdf': Failed to parse the file: [d50bdaa7-a442-48c4-a710-41f2cbf8d50b], status: ERROR.

I have checked that the PDF is not corrupted or otherwise damaged. Here is the LlamaParse configuration I am using:

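from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",  # return the parsed document as Markdown
    verbose=True,            # print job progress and errors
    invalidate_cache=True,   # ignore any cached result for this file
    do_not_cache=True,       # do not store this result in the cache
)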

Files
I cannot share the file for privacy reasons.

Job ID d50bdaa7-a442-48c4-a710-41f2cbf8d50b

Client: Please remove untested options:

Additional context
Maybe it is something related to that file, but from the logs I am unable to understand why.

Thanks in advance for the help.

hexapode commented 4 months ago

It seems LlamaParse has an issue with your document and returns a "no data in file" error. You can check this using the endpoint:

https://api.cloud.llamaindex.ai/api/parsing/job/d50bdaa7-a442-48c4-a710-41f2cbf8d50b/details

with your API key as a Bearer token.

What happens when you try to copy and paste content from your document?
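For readers who want to script that check, here is a minimal sketch of the request described above, using only the Python standard library. The URL pattern and the Bearer-token header come from this comment; the helper names, the `LLAMA_CLOUD_API_KEY` environment variable in the usage comment, and the assumption that the endpoint responds with JSON are illustrative and not confirmed in this thread.

```python
import json
import urllib.request

# URL pattern taken from the comment above; {job_id} is filled in per job.
DETAILS_URL = "https://api.cloud.llamaindex.ai/api/parsing/job/{job_id}/details"


def build_details_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the job details, authenticated with a Bearer token."""
    return urllib.request.Request(
        DETAILS_URL.format(job_id=job_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_job_details(job_id: str, api_key: str) -> dict:
    """Send the request and decode the body (assumed here to be JSON)."""
    with urllib.request.urlopen(build_details_request(job_id, api_key)) as resp:
        return json.load(resp)


# Usage (performs a real network call, so it is left commented out):
# import os
# details = fetch_job_details(
#     "d50bdaa7-a442-48c4-a710-41f2cbf8d50b",
#     os.environ["LLAMA_CLOUD_API_KEY"],  # assumed variable name
# )
# print(json.dumps(details, indent=2))
```

Separating request construction from sending makes it possible to inspect the URL and headers without touching the network.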

marco-bertelli commented 3 months ago

Thanks @hexapode, I will provide the log tomorrow morning. Thanks for the help.

marco-bertelli commented 3 months ago

@hexapode All OK. After some tests, the error occurred only on that day (maybe a cache problem?). I don't know the cause, but it now works as expected. Thanks.
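Since the failure turned out to be transient, one way to guard against a similar one-off error is to retry the parse call a few times before giving up. The sketch below is a generic helper, not part of LlamaParse: `retry_until_nonempty` and its parameters are hypothetical names, and it assumes only that a failed attempt either raises or returns an empty result.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_nonempty(parse: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Call parse() until it returns a non-empty result or attempts run out.

    Hypothetical helper: a failed attempt is one that raises or returns
    an empty (falsy) result.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            result = parse()
        except Exception as exc:  # remember the failure and try again
            last_error = exc
        else:
            if result:
                return result
            last_error = None  # empty result, nothing was raised
        if attempt < attempts:
            time.sleep(delay_s)  # wait before the next attempt
    raise RuntimeError(f"parsing failed after {attempts} attempts") from last_error
```

With the parser configured earlier in the thread, this could be called as `retry_until_nonempty(lambda: parser.load_data("./pdfs/file.pdf"), attempts=3, delay_s=5.0)`, where the path is a placeholder.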