samgriek opened 4 months ago
I'm also seeing this. When using `llmsherpa.readers.LayoutPDFReader` with the `read_pdf` method, the returned output is missing the title line of my PDF, which happens to be one of the first lines.
At least it's not just me! I would be open to fixing it, but I'm guessing it's related to the NLP model?
Same here!
I'm running the server in Docker:
`image: ghcr.io/nlmatics/nlm-ingestor:latest`
I've only tested with one 300-page PDF, and it seems to skip the first couple of lines. By itself that doesn't seem to be a big problem, but it makes me wonder whether anything else is being skipped. The result is the same whether I convert to text, use sections, or convert to HTML.
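One way to check whether anything besides the first lines is being dropped is to compare the parsed output against a reference list of lines you expect to see, for example lines copied from the PDF or pulled out with a different extractor. The sketch below is a hypothetical, stdlib-only helper and is not part of llmsherpa; `find_missing_lines` and the sample strings are made up for illustration. It assumes plain substring matching after collapsing whitespace and lowercasing is good enough, so it can flag false positives when the parser merges, splits, or rewrites a line.

```python
import re
from typing import Iterable, List


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and lowercase, so differences in line
    # wrapping or spacing between extractors are not reported as "missing".
    return re.sub(r"\s+", " ", text).strip().lower()


def find_missing_lines(reference_lines: Iterable[str], parsed_text: str) -> List[str]:
    """Return the non-empty reference lines that never appear in parsed_text.

    Hypothetical diagnostic helper: reference_lines is what you expect the
    parser to keep; parsed_text is the text the parser actually returned.
    """
    haystack = _normalize(parsed_text)
    missing = []
    for line in reference_lines:
        needle = _normalize(line)
        # Skip blank reference lines; keep the original (un-normalized) line
        # in the report so it is easy to find in the source PDF.
        if needle and needle not in haystack:
            missing.append(line)
    return missing


if __name__ == "__main__":
    # Made-up example: the title line is absent from the parsed output.
    reference = ["Quarterly Report 2023", "Prepared by the Finance Team", "1. Introduction"]
    parsed = "Prepared by the Finance Team\n\n1. Introduction\nThis report..."
    print(find_missing_lines(reference, parsed))  # → ['Quarterly Report 2023']
```

For `parsed_text` you could pass whichever text conversion you are already using (e.g. the document's `to_text()` output, assuming that is the call in your setup) and run the check over all pages to see whether the gaps are limited to the start of the document.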
What might be the cause?