nlmatics / llmsherpa

Developer APIs to Accelerate LLM Projects
https://www.nlmatics.com
MIT License

llmsherpa is Missing Information #91

Open · hasandot opened this issue 3 days ago

hasandot commented 3 days ago

I used llmsherpa to process this PDF, which is a network protocol specification document.

I used the demo you provide in Colab.

No error is raised, but when I convert the document to text, only a portion of the PDF is converted and a lot of information is missing. I tried both a PDF URL and a local PDF file path.

  1. I printed all the section titles, and the output does not match the PDF. The output is provided here.
  2. I also converted the PDF to text, and the result is significantly smaller than the original. The converted text file is here.
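The steps above can be sketched as follows, as a minimal reproduction under stated assumptions. It uses llmsherpa's `LayoutPDFReader`, whose `read_pdf` accepts either a URL or a local file path, and the returned document's `sections()` and `to_text()`. The parser endpoint is assumed to be the same one the Colab demo passes to `LayoutPDFReader`. The helper names (`normalize_title`, `missing_titles`, `extract_with_llmsherpa`) are hypothetical, and the expected titles would have to be copied by hand from the PDF's table of contents.

```python
import re
from typing import Iterable, List, Tuple


def normalize_title(title: str) -> str:
    """Lower-case a section title and collapse whitespace so that minor
    formatting differences are not counted as a mismatch."""
    return re.sub(r"\s+", " ", title).strip().lower()


def missing_titles(expected: Iterable[str], extracted: Iterable[str]) -> List[str]:
    """Return the expected titles (e.g. copied by hand from the PDF's table of
    contents) that do not appear among the extracted titles, keeping the
    expected order."""
    found = {normalize_title(t) for t in extracted}
    return [t for t in expected if normalize_title(t) not in found]


def extract_with_llmsherpa(pdf_path_or_url: str, parser_api_url: str) -> Tuple[List[str], str]:
    """Parse a PDF with llmsherpa and return (section titles, full text).

    Assumption: parser_api_url is the parser endpoint used by the Colab demo.
    Requires `pip install llmsherpa`; the import is done inside the function
    so the helpers above can be used without the library installed."""
    from llmsherpa.readers import LayoutPDFReader

    reader = LayoutPDFReader(parser_api_url)
    doc = reader.read_pdf(pdf_path_or_url)  # a URL or a local file path
    titles = [section.title for section in doc.sections()]
    return titles, doc.to_text()
```

For example, `titles, text = extract_with_llmsherpa("spec.pdf", api_url)` followed by `missing_titles(toc_titles, titles)` would list the sections that did not come through, and `len(text)` gives a rough size to compare against the original (here `"spec.pdf"`, `api_url`, and `toc_titles` are placeholders).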

My main question: is there any particular reason why llmsherpa might not work for network protocol specification PDFs?