slub / mets-mods2tei

Convert bibliographic meta data in MODS format to TEI headers
Apache License 2.0

Treat CONTENT attribute not only from String elements (but others as well) #52

Closed (wrznr closed 2 years ago)

wrznr commented 3 years ago

Currently, only the CONTENT attributes of String elements are evaluated and rendered in the resulting TEI. However, ALTO has other elements that may carry this attribute, most notably HYP.
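A minimal sketch of what this could look like, assuming a TextLine whose children are String, SP and HYP elements. The helper names `local_name` and `line_contents` and the sample line are hypothetical and not taken from `mets_mods2tei`:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample line; the namespace is the ALTO v2 one.
ALTO_SAMPLE = """<TextLine xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <String CONTENT="Bei"/>
  <SP/>
  <String CONTENT="spiel"/>
  <HYP CONTENT="-"/>
</TextLine>"""


def local_name(element):
    """Return the tag name without its '{namespace}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


def line_contents(text_line, elements=("String", "HYP")):
    """Collect CONTENT from every direct child whose tag is listed in `elements`."""
    return [
        child.get("CONTENT")
        for child in text_line
        if local_name(child) in elements and child.get("CONTENT") is not None
    ]


line = ET.fromstring(ALTO_SAMPLE)
print(line_contents(line))                       # ['Bei', 'spiel', '-']
print(line_contents(line, elements=("String",)))  # ['Bei', 'spiel']
```

Passing the element names as a parameter keeps the current String-only behaviour available while allowing HYP (or others) to be included.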

bertsky commented 2 years ago

Isn't it also debatable whether it is correct to just join all TextLine/String like so?

https://github.com/slub/mets-mods2tei/blob/fc7b0f7cfb8a58e483bd355a7ae2eaaa7aebc6fe/mets_mods2tei/api/alto.py#L95

(I would expect the whitespace joiner to be applied only where an SP element is interspersed. But there might be different conventions in the field, such as having no SP at all, i.e. implicit whitespace, as in PAGE-XML.)
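To illustrate the two conventions, here is a rough sketch, not the code at the linked line. The function `join_line` and its `implicit_space` flag are made up for this example. A space is emitted where an SP element occurs, and, only if requested, also between adjacent String elements that have no SP between them:

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Return the tag name without its '{namespace}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


def join_line(text_line, implicit_space=False):
    """Join the String contents of one TextLine.

    SP elements always produce a single space. With implicit_space=True,
    two String elements that directly follow each other are also
    separated by a space (the "no SP at all" convention).
    """
    parts = []
    previous_was_string = False
    for child in text_line:
        name = local_name(child)
        if name == "SP":
            parts.append(" ")
            previous_was_string = False
        elif name == "String":
            if implicit_space and previous_was_string:
                parts.append(" ")
            parts.append(child.get("CONTENT", ""))
            previous_was_string = True
        # Other elements (e.g. HYP) are ignored in this sketch.
    return "".join(parts)


with_sp = ET.fromstring(
    '<TextLine><String CONTENT="ein"/><SP/><String CONTENT="Wort"/></TextLine>'
)
without_sp = ET.fromstring(
    '<TextLine><String CONTENT="ein"/><String CONTENT="Wort"/></TextLine>'
)

print(join_line(with_sp))                          # ein Wort
print(join_line(without_sp))                       # einWort
print(join_line(without_sp, implicit_space=True))  # ein Wort
```

Whether the implicit convention should ever be the default is exactly the open question; the flag only makes the choice explicit.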

bertsky commented 2 years ago

Regarding HYP itself, I'm not sure anymore whether printing @CONTENT verbatim is correct: Basisformat states that only hyphen-minus should be allowed.

Maybe make that a config parameter? (We could technically have lots of these; related to #26)
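Such a parameter might look roughly like the following. This is only a sketch: the function `render_hyp` and the mode names `"normalize"` and `"verbatim"` are invented here and do not exist in the project.

```python
HYPHEN_MINUS = "-"  # U+002D


def render_hyp(content, mode="normalize"):
    """Return the text to emit for a HYP element.

    mode="normalize": always emit hyphen-minus, whatever @CONTENT holds.
    mode="verbatim":  emit @CONTENT unchanged.
    """
    if mode == "normalize":
        return HYPHEN_MINUS
    if mode == "verbatim":
        return content
    raise ValueError(f"unknown HYP mode: {mode!r}")


# U+2E17 (double oblique hyphen) is used here only as an example value.
print(render_hyp("\u2e17"))                              # -
print(render_hyp("\u2e17", mode="verbatim") == "\u2e17")  # True
```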