Closed wrznr closed 2 years ago
Isn't it also debatable whether it is correct to simply join all TextLine/String elements like so?
(I would expect that the white-space joiner only be applied where there is an SP
interspersed. But there might be different conventions in the field, like having no SP
at all, i.e. implicit white-space, as in PAGE-XML.)
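
To make the two conventions concrete, here is a minimal sketch in Python (not the project's actual code). The function name `join_text_line` and the `implicit_space` switch are hypothetical, chosen only for illustration; the sample uses the ALTO v4 namespace.

```python
import xml.etree.ElementTree as ET


def local_name(elem):
    """Return the tag name without its XML namespace."""
    return elem.tag.rsplit("}", 1)[-1]


def join_text_line(text_line, implicit_space=False):
    """Join the String children of one TextLine.

    By default a space is emitted only where an SP element occurs.
    With implicit_space=True (hypothetical option), a space is also
    emitted between two directly adjacent String elements.
    """
    parts = []
    previous = None
    for child in text_line:
        name = local_name(child)
        if name == "SP":
            parts.append(" ")
        elif name == "String":
            if implicit_space and previous == "String":
                parts.append(" ")
            parts.append(child.get("CONTENT", ""))
        else:
            continue  # other children (e.g. HYP) are ignored in this sketch
        previous = name
    return "".join(parts)


SAMPLE = """<TextLine xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <String CONTENT="Hello"/><SP/><String CONTENT="world"/><String CONTENT="!"/>
</TextLine>"""

line = ET.fromstring(SAMPLE)
print(join_text_line(line))                       # Hello world!
print(join_text_line(line, implicit_space=True))  # Hello world !
```

The default keeps adjacent String elements glued together, which matters for punctuation; the implicit variant reproduces the "no SP at all" convention.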
Regarding HYP itself, I'm no longer sure whether printing @CONTENT verbatim is correct: Basisformat states that only hyphen-minus should be allowed.
Maybe make that a config parameter? (We could technically have lots of these; related to #26)
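
Such a parameter could look roughly like the following Python sketch. The name `render_hyp` and the `normalize_hyphen` flag are assumptions made up for this example, not existing options of the tool.

```python
import xml.etree.ElementTree as ET


def render_hyp(hyp, normalize_hyphen=True):
    """Return the text to emit for an ALTO HYP element.

    normalize_hyphen=True  -> always emit a plain hyphen-minus (U+002D)
    normalize_hyphen=False -> emit @CONTENT verbatim
    """
    if normalize_hyphen:
        return "-"
    return hyp.get("CONTENT", "")


# A HYP whose CONTENT is a double oblique hyphen (U+2E17)
hyp = ET.fromstring('<HYP CONTENT="\u2e17"/>')
print(render_hyp(hyp))                          # normalized hyphen-minus
print(render_hyp(hyp, normalize_hyphen=False))  # verbatim @CONTENT
```

Defaulting to normalization would follow the Basisformat reading above, while the verbatim mode stays available for users who want the source glyph preserved.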
Currently, only CONTENT attributes from String are evaluated and realized in the resulting TEI. But ALTO has some other elements which may carry this attribute, most notably HYP.
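
As a rough illustration of what evaluating both elements could mean, the Python sketch below collects CONTENT from String and HYP alike. The set `CONTENT_ELEMENTS` and the helper `iter_content` are hypothetical names, and the set would need extending if further CONTENT-carrying elements are to be covered.

```python
import xml.etree.ElementTree as ET

# Elements whose CONTENT attribute should be realized (assumption: String and HYP)
CONTENT_ELEMENTS = {"String", "HYP"}


def iter_content(text_line):
    """Yield the CONTENT of every String or HYP child, in document order."""
    for child in text_line:
        name = child.tag.rsplit("}", 1)[-1]  # drop the namespace
        if name in CONTENT_ELEMENTS:
            yield child.get("CONTENT", "")


line = ET.fromstring(
    '<TextLine xmlns="http://www.loc.gov/standards/alto/ns-v4#">'
    '<String CONTENT="Zei"/><HYP CONTENT="-"/>'
    "</TextLine>"
)
print(list(iter_content(line)))  # ['Zei', '-']
```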