nlmatics / llmsherpa

Developer APIs to Accelerate LLM Projects
https://www.nlmatics.com
MIT License

Missing lines #71

Open mshibashish opened 5 months ago

mshibashish commented 5 months ago

For some documents, the first and last lines of a page are not being read. Passing each page to the reader individually seems to work around the issue. I have attached a sample: when the whole document is parsed, the last two lines of the first page are not detected. However, if I remove the second page and parse only the first one, those lines seem to be read correctly.

out2.pdf
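
A minimal sketch of the page-by-page workaround described above, in case it helps with reproducing the issue. It assumes the third-party `pypdf` package is used to split the file (that choice is mine, not something llmsherpa requires), and the helper names `split_into_single_pages`, `read_pages_individually`, and `read_with_llmsherpa` are hypothetical. The `api_url` argument stands for whatever llmsherpa parser endpoint you already use.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def split_into_single_pages(pdf_path: str, out_dir: str) -> List[str]:
    """Write each page of pdf_path to its own one-page PDF and return the paths."""
    # Assumption: pypdf is installed (pip install pypdf). Imported lazily so the
    # rest of this module can be used without it.
    from pypdf import PdfReader, PdfWriter

    source = PdfReader(pdf_path)
    page_paths = []
    for index, page in enumerate(source.pages):
        writer = PdfWriter()
        writer.add_page(page)
        page_path = Path(out_dir) / f"page_{index + 1:04d}.pdf"
        with open(page_path, "wb") as handle:
            writer.write(handle)
        page_paths.append(str(page_path))
    return page_paths


def read_pages_individually(
    pdf_path: str,
    read_one: Callable[[str], Any],
    split: Callable[[str, str], List[str]] = split_into_single_pages,
) -> List[Any]:
    """Call read_one on each single-page PDF and return the results in page order."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # The temporary one-page files are deleted once all pages are parsed.
        return [read_one(page_path) for page_path in split(pdf_path, tmp_dir)]


def read_with_llmsherpa(pdf_path: str, api_url: str) -> List[Any]:
    """Parse pdf_path one page at a time with llmsherpa's LayoutPDFReader."""
    from llmsherpa.readers import LayoutPDFReader

    reader = LayoutPDFReader(api_url)
    return read_pages_individually(pdf_path, reader.read_pdf)
```

Calling `read_with_llmsherpa("out2.pdf", api_url)` returns one parsed document per page rather than a single document, so anything that spans a page boundary (a paragraph or table continued on the next page) would have to be stitched together afterwards.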