slub / mets-mods2tei

Convert bibliographic metadata in MODS format to TEI headers
Apache License 2.0

serialize PAGE text #55

Open bertsky opened 2 years ago

bertsky commented 2 years ago

In addition to ALTO (text/xml) files, we should support PAGE (application/vnd.prima.page+xml) files.

(One scenario could be OCR-D processed material.)
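To make the request more concrete, here is a minimal sketch of how line-level text could be read from a PAGE file. It is not part of mets-mods2tei: the function name `page_text_lines` and the helper `local` are hypothetical, it relies only on the Python standard library, and it assumes that text lives in `TextLine/TextEquiv/Unicode` and that document order is an acceptable stand-in for reading order.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def page_text_lines(source):
    """Return the text of each TextLine in a PAGE-XML file, in document order.

    Assumption: document order is used; an explicit ReadingOrder is ignored.
    """
    root = ET.parse(source).getroot()
    lines = []
    for elem in root.iter():
        if local(elem.tag) != "TextLine":
            continue
        # Look only at direct TextEquiv children of the line;
        # word-level TextEquiv elements are nested inside Word and skipped.
        for child in elem:
            if local(child.tag) == "TextEquiv":
                unicode_el = next(
                    (c for c in child if local(c.tag) == "Unicode"), None
                )
                if unicode_el is not None:
                    lines.append(unicode_el.text or "")
                break  # use the first TextEquiv of the line only
    return lines
```

Calling `page_text_lines("page.xml")` (the path is a placeholder) returns a list of strings, one per text line, which could then be serialized into the TEI body in the same way as ALTO text.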

bertsky commented 2 years ago

Workaround in the meantime: convert PAGE to ALTO with https://github.com/kba/page-to-alto, which is also available through ocrd-fileformat-transform as the "page alto" conversion. You may have to pass extra options to page-to-alto via script-args, e.g. --dummy-word --no-check-words --no-check-border.

bertsky commented 2 years ago

For inspiration: https://github.com/dariok/page2tei/blob/master/page2tei-0.xsl

EDIT: But we would have to align any such mapping with the DTA base format (DTABf): https://www.deutsches-textarchiv.de/doku/basisformat/