Headings (Table of Content, TOC)

jennis0 / burdoc

Advanced PDF parsing for Python

MIT License


Ah, I think this one might be challenging: it's a false positive for one of the rules used to identify headings (a short piece of bold text that directly precedes a standard paragraph and is visually spaced from any prior text). Arguably it is a heading, albeit not one that would be presented in a standard ToC.
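To make the rule concrete, here is a minimal Python sketch of that kind of heuristic. The TextBlock class, the looks_like_heading function and its thresholds (max_words, min_gap) are hypothetical illustrations, not Burdoc's actual implementation. The example shows how a bold table caption followed by an ordinary paragraph can satisfy every condition and so be picked up as a heading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextBlock:
    """Hypothetical representation of a block of text on a PDF page."""
    text: str
    bold: bool
    top: float     # y-coordinate of the top edge (increases down the page)
    bottom: float  # y-coordinate of the bottom edge


def looks_like_heading(
    block: TextBlock,
    prev: Optional[TextBlock],
    nxt: Optional[TextBlock],
    max_words: int = 12,   # illustrative threshold, not Burdoc's value
    min_gap: float = 6.0,  # illustrative vertical gap, not Burdoc's value
) -> bool:
    """Sketch of the 'short bold text before a standard paragraph' rule."""
    short = len(block.text.split()) <= max_words
    # The block directly after it must be ordinary (non-bold, longer) body text.
    followed_by_paragraph = (
        nxt is not None and not nxt.bold and len(nxt.text.split()) > max_words
    )
    # Visually spaced from any prior text (or the first block on the page).
    spaced_from_prior = prev is None or (block.top - prev.bottom) >= min_gap
    return block.bold and short and followed_by_paragraph and spaced_from_prior


# A bold caption above a paragraph meets every condition, so it is flagged.
prior = TextBlock("Some earlier text on the page.", bold=False, top=60.0, bottom=72.0)
caption = TextBlock("Table 2: Quarterly figures", bold=True, top=90.0, bottom=102.0)
body = TextBlock(
    "The figures in this table are drawn from the annual report "
    "and are given in thousands of euros.",
    bold=False, top=104.0, bottom=150.0,
)

print(looks_like_heading(caption, prior, body))  # True
```

Nothing in a purely layout-based rule like this distinguishes a caption from a section heading, which is why this case is hard to fix without dedicated caption detection.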

I wouldn't expect --no-ml-tables to change this. Turning off table-finding only means we don't try to identify tables in the text; the text they contain still goes through the main text-parsing pipeline. (Burdoc also doesn't yet identify captions associated with tables, so it wouldn't make a difference even if the table had been found.)
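The point about table-finding can be sketched as a hypothetical routing step. Block and route_blocks are illustrative names, not part of Burdoc, and table regions are simplified to vertical spans for brevity. Blocks inside a detected table region are diverted away from text parsing; everything else, including a caption sitting just outside the table, goes through the text pipeline whether or not table detection is enabled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical text block with its vertical extent on the page."""
    text: str
    top: float
    bottom: float


def route_blocks(
    blocks: List[Block],
    table_regions: List[Tuple[float, float]],  # (top, bottom) of each detected table
    detect_tables: bool,
) -> Tuple[List[Block], List[Block]]:
    """Split blocks into (table content, text-pipeline input)."""
    if not detect_tables:
        # No table detection: every block, table text included, is parsed as text.
        return [], list(blocks)
    in_table: List[Block] = []
    text: List[Block] = []
    for b in blocks:
        inside = any(t_top <= b.top and b.bottom <= t_bottom
                     for t_top, t_bottom in table_regions)
        (in_table if inside else text).append(b)
    return in_table, text


caption = Block("Table 2: Quarterly figures", 90.0, 102.0)
row = Block("Q1  1,200  1,350", 110.0, 122.0)
regions = [(108.0, 200.0)]  # the table starts just below the caption

print(route_blocks([caption, row], regions, detect_tables=True))
print(route_blocks([caption, row], regions, detect_tables=False))
```

In both calls the caption ends up in the text-pipeline list, so a heading rule that runs on that list sees it either way.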


Headings (Table of Content, TOC) #10