Open bertsky opened 2 years ago
Workaround in the meantime: apply https://github.com/kba/page-to-alto, as included in ocrd-fileformat-transform (page alto), but you may have to use script-args for page-to-alto, e.g. --dummy-word --no-check-words --no-check-border.
For inspiration: https://github.com/dariok/page2tei/blob/master/page2tei-0.xsl
EDIT: but we would have to coordinate that with https://www.deutsches-textarchiv.de/doku/basisformat/
In addition to ALTO (text/xml), we should support PAGE (application/vnd.prima.page+xml) files. (One scenario could be OCR-D processed material.)
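
To accept both formats, the input type has to be recognized first. The following is only a minimal sketch of one way to do that, not an existing implementation: it assumes that ALTO files have a root element named alto and PAGE files a root element named PcGts, and it ignores the namespace version. The function name detect_ocr_format is hypothetical.

```python
import xml.etree.ElementTree as ET


def detect_ocr_format(xml_text: str) -> str:
    """Return 'alto', 'page', or 'unknown' based on the root element name.

    Hypothetical helper: it only inspects the root element and does not
    validate the document against either schema.
    """
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as '{namespace}local'; keep the local part.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "alto":
        return "alto"
    if local_name == "PcGts":
        return "page"
    return "unknown"
```

A caller could then hand ALTO input to the existing code path and PAGE input to a converter (such as the page-to-alto workaround above) or to a dedicated PAGE reader.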