nlmatics / nlm-ingestor

This repo provides the server-side code that the llmsherpa API connects to. It includes parsers for various file formats.
https://www.nlmatics.com
Apache License 2.0
1.05k stars, 152 forks

PDF extraction #32

Open Amy-raj opened 6 months ago

Amy-raj commented 6 months ago

I created a PDF from its DOCX version, in which the sections and subsections were defined with built-in heading styles rather than numbering. The extraction does not recognise a few of the subsections inside the sections.

Amy-raj commented 6 months ago

Could you please advise which formatting style should be applied in the PDF so that all sections and subsections are extracted?