Open mikegerber opened 3 years ago
The header is in TextRegion r3, but the ReadingOrder only includes the main text in r1, so dinglehopper only extracts the main text. This means the file is buggy, not dinglehopper.
However, we can do better by warning whenever a region is not included in the extracted text.
For 00451941.gt.xml, dinglehopper-extract does not extract the header's text DE L'ESPRIT DE L'HOMME.
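
A minimal sketch of what such a warning could look like, assuming the check is done directly on the PAGE XML with the standard library. This is not dinglehopper's actual code: the function names `regions_missing_from_reading_order` and `warn_about_skipped_regions` are hypothetical, and the sample document is a made-up stand-in that mirrors the r1/r3 situation above, not the contents of 00451941.gt.xml. The sketch also assumes that a file without any ReadingOrder needs no warning, and it does not treat nested regions specially.

```python
import warnings
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}TextRegion' -> 'TextRegion'."""
    return tag.rsplit("}", 1)[-1]


def regions_missing_from_reading_order(page_xml):
    """Return IDs of TextRegions that the ReadingOrder does not reference.

    Hypothetical helper. Returns an empty list if there is no ReadingOrder
    at all (assumption: nothing is skipped in that case).
    """
    root = ET.fromstring(page_xml)
    region_ids = []
    referenced = set()
    has_reading_order = False
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "ReadingOrder":
            has_reading_order = True
        elif name == "TextRegion" and elem.get("id"):
            region_ids.append(elem.get("id"))
        elif name in ("RegionRefIndexed", "RegionRef"):
            # Both reference elements point at a region via 'regionRef'.
            referenced.add(elem.get("regionRef"))
    if not has_reading_order:
        return []
    return [rid for rid in region_ids if rid not in referenced]


def warn_about_skipped_regions(page_xml):
    """Emit one warning per TextRegion that the ReadingOrder leaves out."""
    missing = regions_missing_from_reading_order(page_xml)
    for rid in missing:
        warnings.warn(
            f"TextRegion {rid} is not in the ReadingOrder; "
            "its text will not be extracted"
        )
    return missing


# Made-up sample: r1 is in the ReadingOrder, r3 (the header) is not.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="0" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1"/>
    <TextRegion id="r3"/>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    warn_about_skipped_regions(SAMPLE)
```

Matching on local element names keeps the sketch independent of the PAGE schema version declared in the file, at the cost of being less strict than a namespace-aware lookup.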