Hierarchical Reading Order

PaperMage currently extracts section headings, but it does not extract the text that belongs to each section, even though it already produces sentences and paragraphs that could be associated with those headings.

Find a way to render a PDF in a natural, "hierarchical" reading order that allows us to annotate per-section metadata.

This could be done either with PaperMage plus heuristics, or with a completely separate tool such as watr-works or grobid.
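As a starting point for the heuristics option, here is a minimal sketch of one possible approach: sort blocks into reading order, then attach each paragraph to the most recent heading, nesting headings by their numbering depth. Everything here is hypothetical and is not PaperMage's API: `Block`, `Section`, `heading_level`, `build_outline`, and `render` are illustrative names, and the sketch assumes a single-column layout (sorting by page, then vertical position) and numbered headings such as "2.1 Data". Headings and paragraphs would have to be converted from the parser's output into `Block` objects first.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical, tool-agnostic view of one extracted layout element."""
    kind: str    # "heading" or "paragraph"
    text: str
    page: int    # page number
    top: float   # distance from the top of the page (smaller = higher)


@dataclass
class Section:
    title: str
    level: int
    paragraphs: list = field(default_factory=list)
    children: list = field(default_factory=list)


def heading_level(title):
    """Guess nesting depth from numbering: '2' -> 1, '2.1' -> 2, '2.1.3' -> 3.

    Unnumbered headings (e.g. 'Abstract') are treated as top level.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)*)", title)
    return match.group(1).count(".") + 1 if match else 1


def build_outline(blocks):
    """Group paragraphs under the nearest preceding heading."""
    # Text that appears before the first heading stays on the root node.
    root = Section(title="(front matter)", level=0)
    stack = [root]
    # Assumption: single-column pages, so (page, top) gives reading order.
    for block in sorted(blocks, key=lambda b: (b.page, b.top)):
        if block.kind == "heading":
            level = heading_level(block.text)
            # Close any open sections at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            section = Section(title=block.text, level=level)
            stack[-1].children.append(section)
            stack.append(section)
        else:
            stack[-1].paragraphs.append(block.text)
    return root


def render(section, indent=0):
    """Return the outline as indented lines, one section or paragraph each."""
    lines = ["  " * indent + section.title]
    lines += ["  " * (indent + 1) + p for p in section.paragraphs]
    for child in section.children:
        lines += render(child, indent + 1)
    return lines
```

Calling `build_outline(blocks)` returns a tree whose nodes could carry per-section metadata, and `"\n".join(render(root))` gives a quick textual view for spot-checking. Two-column papers, floats, and unnumbered subsections would need additional rules beyond this sketch.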

Tasks:

[x] Evaluate watr-works
[x] Evaluate grobid
[ ] Write Heuristics

gsireesh / ht-max

Hierarchical Reading Order #5