run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License

Wait long time but no result return #83

Open jzwilliams07 opened 6 months ago

jzwilliams07 commented 6 months ago

Hi, I am running the demo_api ipynb code in Colab. I tried to use LlamaParse to extract text from a PDF with a lot of unstructured content, such as figures and an irregular text layout, and got no response for a long time. The file is only 1.7 MB, so it is not large at all. Is it that LlamaParse cannot handle this kind of irregular layout, or is there some other problem?

jzwilliams07 commented 6 months ago

This is the relevant PDF file: test.pdf

hexapode commented 6 months ago

Hi!

The provided PDF is copy-protected (try copying and pasting text from it). That is likely the cause of the issue.

Could you remove the protection and try again?
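If copy protection is the culprit, one way to make an unprotected copy before re-uploading is qpdf. The following is a minimal POSIX shell sketch. It assumes qpdf is installed and that the PDF has only an owner password (copy/print restrictions), so no password is needed to open it. The `unlock_pdf` helper and the file names are hypothetical and are not part of LlamaParse.

```shell
# Hypothetical helper: write an unprotected copy of a PDF using qpdf.
# qpdf --is-encrypted exits 0 when the file is encrypted, 2 when it is not.
unlock_pdf() {
  src="$1"
  dst="$2"
  if qpdf --is-encrypted "$src"; then
    # --decrypt strips encryption; without a password this only works when
    # the file has an empty user password (i.e. just usage restrictions).
    qpdf --decrypt "$src" "$dst"
  else
    # Not encrypted: nothing to strip, so copy the file unchanged.
    cp "$src" "$dst"
  fi
}
```

Usage: `unlock_pdf test.pdf test-unlocked.pdf`, then upload `test-unlocked.pdf` instead of the original.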

jzwilliams07 commented 6 months ago

Sorry for the late reply. I will give it a try.

charliem17 commented 6 months ago

I'd like to +1 this issue. My team and I are trying to upload and process documents, but their status stays "PENDING" for what seems like forever. I even tried uploading a PDF with only a few lines of text, and it has been stuck in PENDING for several hours now. I wish I could provide more helpful info than this.

elmstedt commented 6 months ago

@hexapode It's not a copy-protection issue. The whole thing is bugged.

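A simple PDF generated by

```shell
# printf avoids shells (e.g. dash, zsh) whose echo would expand "\b" in "\begin"
printf '%s\n' '\documentclass{article}\begin{document}a\end{document}' | pdflatex -jobname=a
```

is stuck in PENDING status forever.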

Edit to add:

I have tried both the Python library and the raw API via cURL, and both give the same result. Every request to check a job's status returns PENDING.

PDF file: a.pdf
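Whatever the server-side cause, a client can at least avoid waiting forever by capping how long it polls. Below is a minimal POSIX shell sketch. `poll_job_status`, `llamaparse_status`, `MAX_ATTEMPTS`, and `POLL_INTERVAL` are hypothetical names. The status URL, the `LLAMA_CLOUD_API_KEY` variable, and the `"status"` JSON field are assumptions about the LlamaParse REST API that are not confirmed by this thread and should be checked against its documentation.

```shell
# Hypothetical helper: poll a job until it leaves PENDING, but give up after
# MAX_ATTEMPTS checks spaced POLL_INTERVAL seconds apart instead of hanging.
# Usage: poll_job_status <command that prints the job status> [args...]
poll_job_status() {
  max_attempts="${MAX_ATTEMPTS:-60}"
  interval="${POLL_INTERVAL:-5}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the status command; a failing command aborts polling.
    job_status=$("$@") || return 1
    if [ -z "$job_status" ]; then
      echo "empty status response" >&2
      return 1
    fi
    if [ "$job_status" != "PENDING" ]; then
      # Print whatever non-PENDING value came back and stop.
      printf '%s\n' "$job_status"
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "job still PENDING after $max_attempts checks" >&2
  return 2
}

# ASSUMPTION: endpoint, auth header, and "status" field are not confirmed
# by this thread; verify them against the LlamaParse API docs.
llamaparse_status() {
  curl -fsS -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
    "https://api.cloud.llamaindex.ai/api/parsing/job/$1" |
    sed -n 's/.*"status" *: *"\([A-Z_]*\)".*/\1/p'
}
```

For example, `MAX_ATTEMPTS=120 POLL_INTERVAL=10 poll_job_status llamaparse_status "$JOB_ID"` gives up after roughly 20 minutes with exit code 2 instead of hanging. This does not fix a job that never leaves PENDING, but it makes the failure visible. Passing the status check in as a command also lets the same loop wrap a cURL call or a small Python script.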