Open YoshikiTakashima opened 1 month ago
Have you tried using the OCR script? It's obviously more expensive, as it requires LLM API access, but you should get much better results. I can take a look at this later, but I had difficulty getting this type of nesting relationship to show up in the final txt output.
I took a look, and unfortunately it seems to be a problem with the underlying parser I am using: it doesn't detect any text in the PDF at all. I added an error message for that case. Did you have any luck with the OCR version?
OCR works.
Hi Shmuel.
I'm Yoshiki, Scott's postdoc.
Is it possible for your tool to handle nested lists? Here's an example: example.pdf
I can't find the example file that worked, but even when the tool works, it still drops the nesting relationships. Those need to be preserved for our use case.
Thanks ~Yoshiki