Open jpbalarini opened 6 months ago
This depends on the type of document you have. If it is a well-structured legal or financial document, the new indent parser may give you a more consistent structure. If it is a PowerPoint file converted to PDF with an inconsistent header structure, neither parser will be fully accurate. The best approach is to try both indenting schemes and see which one produces better results for you.
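To try both indenting schemes side by side, one option is to build two parser URLs that differ only in the `useNewIndentParser` query parameter. The sketch below is an illustration, not the project's documented procedure: the local endpoint `http://localhost:5010/api/parseDocument`, the helper name `build_parser_url`, and the flag value `"yes"` are all assumptions, so check them against your own nlm-ingestor deployment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_parser_url(base_url: str, use_new_indent_parser: bool,
                     flag_value: str = "yes") -> str:
    """Return base_url with the useNewIndentParser query parameter set or removed.

    flag_value is an assumption; confirm which value your server expects.
    """
    parts = urlsplit(base_url)
    # Keep every existing query parameter except a previous copy of the flag.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "useNewIndentParser"]
    if use_new_indent_parser:
        query.append(("useNewIndentParser", flag_value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical local nlm-ingestor endpoint; replace with your own.
BASE_URL = "http://localhost:5010/api/parseDocument?renderFormat=all"

old_parser_url = build_parser_url(BASE_URL, use_new_indent_parser=False)
new_parser_url = build_parser_url(BASE_URL, use_new_indent_parser=True)
```

Each URL can then be passed to llmsherpa's `LayoutPDFReader` and the same PDF read through both, so you can compare the resulting section hierarchy on documents that are representative of your own data.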
Hi! I'm looking to use the nlm-ingestor + llmsherpa to ingest PDFs. I saw that there is an option to use a different algorithm with the `useNewIndentParser` flag. What is the difference with the old parser? Is it recommended for use in a production app? Is it still experimental or a WIP? Thanks!