nlmatics / llmsherpa

Developer APIs to Accelerate LLM Projects
https://www.nlmatics.com
MIT License
1.15k stars 113 forks

Two column PDFs #78

Open diego898 opened 2 months ago

diego898 commented 2 months ago

As a PoC I tried using this to parse a PDF legal document: https://www.govinfo.gov/content/pkg/CFR-2023-title21-vol7/pdf/CFR-2023-title21-vol7-chapI-subchapG.pdf

It does not seem to handle the two-column layout well at all. Are two-column PDFs supported?

thomastiotto commented 2 months ago

I found the same behaviour. Multiple columns really mess up the block tree.
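
For anyone wondering why multiple columns can scramble a block tree, here is a minimal sketch of the general reading-order problem. It is not llmsherpa's actual algorithm, which I have not inspected; the `Block` class, the coordinates, and the half-page column split are all hypothetical and chosen only for illustration. It assumes each text block comes with the position of its top-left corner.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x: float  # left edge of the block, in points
    y: float  # top edge of the block, 0 = top of page


def naive_order(blocks):
    # Read strictly top to bottom, then left to right.
    # On a two-column page this interleaves the columns.
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_aware_order(blocks, page_width):
    # Hypothetical fix: split the page at its midpoint and read the
    # whole left column before the right one. Real layouts would need
    # actual column detection rather than a fixed midpoint.
    mid = page_width / 2
    left = sorted((b for b in blocks if b.x < mid), key=lambda b: b.y)
    right = sorted((b for b in blocks if b.x >= mid), key=lambda b: b.y)
    return [b.text for b in left + right]


# Four made-up blocks on a 612 pt wide page: two per column.
blocks = [
    Block("L1", 50, 100), Block("R1", 320, 100),
    Block("L2", 50, 200), Block("R2", 320, 200),
]

print(naive_order(blocks))              # ['L1', 'R1', 'L2', 'R2']
print(column_aware_order(blocks, 612))  # ['L1', 'L2', 'R1', 'R2']
```

If a parser builds its section hierarchy from something like the first ordering, a paragraph from the right column can end up nested under a heading from the left column, which would match the kind of output described here.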

konstantinosKokos commented 1 week ago

Virtually useless for two-column PDFs.