Open MrUnknown789556 opened 1 year ago
Ah, I think this one might be challenging as it's a false positive for one of the rules used to identify headings (a short bold piece of text directly preceding a standard paragraph and visually spaced from any prior text). Arguably it is a heading, albeit not one that'd be presented in a standard ToC.
I wouldn't expect --no-ml-tables to change this. Turning off table-finding only means we don't try to identify tables in the text; the text they contain still goes through the main text parsing pipeline. Burdoc also doesn't yet identify captions associated with tables, so it wouldn't make a difference even if the table had been found.
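To illustrate why a caption like "Table 4" can trip that rule, here is a minimal sketch of the kind of heuristic described above. It is not Burdoc's actual implementation: the `TextBlock` class, the `looks_like_heading` function, and the thresholds `max_words` and `min_gap` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextBlock:
    """Hypothetical stand-in for a parsed block of text (not a Burdoc type)."""
    text: str
    bold: bool
    gap_above: float  # vertical space to the previous block, in points


def looks_like_heading(block: TextBlock,
                       next_block: Optional[TextBlock],
                       max_words: int = 8,      # assumed threshold
                       min_gap: float = 6.0) -> bool:  # assumed threshold
    """Short bold text, spaced from prior text, directly before a normal paragraph."""
    if next_block is None:
        return False
    is_short = len(block.text.split()) <= max_words
    precedes_paragraph = not next_block.bold
    is_spaced = block.gap_above >= min_gap
    return block.bold and is_short and precedes_paragraph and is_spaced


# A bold table caption followed by body text satisfies every condition,
# so it is indistinguishable from a section heading under this rule.
caption = TextBlock("Table 4", bold=True, gap_above=12.0)
body = TextBlock("The results are summarised for each dataset.", bold=False, gap_above=2.0)
print(looks_like_heading(caption, body))
```

Under these assumptions, nothing in the layout separates a bold caption from a heading, so telling them apart would need caption detection or a text-based check.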
While trying to extract the table of contents ("Introduction", ..., "References"), I looked into the HTML file extracted by Burdoc. It distinguishes the headings from other items in the text fairly well. Burdoc extracted all the named outline entries correctly, but it also extracted one additional item that is not part of the TOC: "Table 4".
I use the string "" to search the generated HTML file for the TOC.
There seems to be no difference whether or not I run Burdoc with the "--no-ml-tables" parameter.
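As a possible workaround on the consumer side, caption-like entries could be dropped after the headings have been collected. The sketch below assumes the heading texts are already available as a list of strings, however they were pulled out of the HTML. The `drop_caption_headings` function and the regular expression are illustrative assumptions, not part of Burdoc.

```python
import re

# Assumed pattern: headings that start with "Table N", "Figure N" or "Fig. N"
# are treated as captions rather than TOC entries.
CAPTION_RE = re.compile(r"^\s*(table|figure|fig\.?)\s+\d+\b", re.IGNORECASE)


def drop_caption_headings(headings):
    """Return the headings that do not look like table or figure captions."""
    return [h for h in headings if not CAPTION_RE.match(h)]


extracted = ["Introduction", "Methods", "Table 4", "References"]
print(drop_caption_headings(extracted))
```

This only hides the symptom. A section that is genuinely titled something like "Table 4 revisited" would be dropped as well, so the pattern may need tightening for a given document.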
The.pdf