kba / transkribus-to-prima

Convert Transkribus PAGE-XML to standard PAGE-XML

Merge Transkribus PAGE with OCR-D-Page #16

Open M3ssman opened 2 years ago

M3ssman commented 2 years ago

Description

I'd like to have a mechanism that, besides transforming corrected Transkribus PAGE 2013, can also merge in information from the OCR-D PAGE 2019 during the transformation.

Motivation

Our Transkribus import actually does its nasty transformations, but kindly stores the original OCR-D PAGE in a subdirectory, because it thinks it's of PAGE 2010 origin (well, that is another story ...). But due to the XSLT, nearly all metadata is dropped, with only a little being kept.

To preserve the provenance data on processors and their parameters, it would be really helpful to re-integrate it at re-conversion time, if the data is available.
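
A minimal sketch of what such a re-integration could look like, not the converter's actual behaviour. It assumes the provenance lives in `MetadataItem` elements under `Metadata` in the original OCR-D PAGE, that the re-converted file uses a PAGE version that allows `MetadataItem` (2019), and that appending the items at the end of `Metadata` keeps the schema order valid. The function name `merge_metadata_items` is hypothetical, and the `{*}` namespace wildcard needs Python 3.8 or later.

```python
import copy
import xml.etree.ElementTree as ET


def namespace_of(element):
    """Return the namespace URI of an element's tag, or '' if it has none."""
    tag = element.tag
    return tag[1:tag.index("}")] if tag.startswith("{") else ""


def retag(element, namespace):
    """Move an element and all of its descendants into the given namespace."""
    for node in element.iter():
        local = node.tag.split("}", 1)[-1]
        node.tag = f"{{{namespace}}}{local}" if namespace else local


def merge_metadata_items(converted_xml, original_xml):
    """Copy MetadataItem elements from the original PAGE into the converted one.

    Hypothetical helper: both arguments are XML strings, the return value is
    the root element of the converted document with the items appended.
    """
    converted = ET.fromstring(converted_xml)
    original = ET.fromstring(original_xml)
    target_ns = namespace_of(converted)

    # "{*}" matches the element in any PAGE namespace version (or none).
    source_meta = original.find("{*}Metadata")
    target_meta = converted.find("{*}Metadata")
    if source_meta is None or target_meta is None:
        return converted  # nothing to merge

    for item in source_meta.findall("{*}MetadataItem"):
        clone = copy.deepcopy(item)
        retag(clone, target_ns)  # align with the converted file's namespace
        target_meta.append(clone)
    return converted
```

The result can be serialized with `ET.tostring(root, encoding="unicode")`. If the stored original is not available, the converted document is simply returned unchanged by skipping the call.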

M3ssman commented 2 years ago

Here are some test materials (to view the corresponding image, please go to urn:nbn:de:gbv:3:1-113523-p0442-2).

urn+nbn+de+gbv+3+1-113523-p0442-2_ger.zip

kba commented 2 years ago

If that's alright with you, let's discuss this in detail in the next open tech call, or have a call before then, depending on how pressing this is for you.