kba / transkribus-to-prima

Convert Transkribus PAGE-XML to standard PAGE-XML

translate TranskribusMetadata into v>=2018 MetadataItem #5

Open bertsky opened 2 years ago

bertsky commented 2 years ago

The valuable information in TranskribusMetadata does not have to be removed. Transforming not just its attributes, but also its recursive Property elements into MetadataItem/Labels/Label is worthwhile IMO.
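
A rough sketch of what that transformation could look like, using only the standard library. Several things here are assumptions rather than anything the converter currently does: the function name `to_metadata_item` is hypothetical, Property elements are assumed to carry `key` and `value` attributes, nested keys are flattened into dotted paths, and the choice of `name`/`value` on the MetadataItem is arbitrary. Namespaces on the output are left out for brevity.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def to_metadata_item(tm):
    """Build a MetadataItem from a TranskribusMetadata element (sketch).

    Every attribute and every (possibly nested) Property becomes one Label.
    """
    # MetadataItem requires @value; using docId here is an assumption.
    item = ET.Element(
        "MetadataItem",
        type="other",
        name="TranskribusMetadata",
        value=tm.get("docId", "unknown"),
    )
    labels = ET.SubElement(item, "Labels", comments="from TranskribusMetadata")

    # Plain attributes: Label/@type holds the attribute name.
    for key, val in tm.attrib.items():
        ET.SubElement(labels, "Label", type=key, value=val)

    # Property children, recursively; nested keys become dotted paths.
    def walk(parent, prefix):
        for child in parent:
            if local(child.tag) != "Property":
                continue
            key = prefix + child.get("key", "")
            ET.SubElement(labels, "Label", type=key, value=child.get("value", ""))
            walk(child, key + ".")

    walk(tm, "")
    return item
```

Whether Label/@type is the right place for the key (as opposed to Labels/@prefix, say) is open for discussion; the sketch only shows that no information has to be lost.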

bertsky commented 2 years ago

Also: for most segment types, we should convert Tag, Property and Link to something appropriate instead of removing them.

IIRC Transkribus uses these to label lines as "illegible" or "abbreviated" etc. Perhaps we should first make sure we understand the semantics and schema of allowed values before we map to Labels in PRImA.

bertsky commented 2 years ago

It would be really helpful to have an example page from Transkribus which heavily uses these features. (Generally, a regression test would be nice to have...)

bertsky commented 4 months ago

Perhaps we should also synchronise with DTABf concordance...

bertsky commented 4 months ago

In particular: setting a correct @type for each predefined @custom, e.g. paragraph for poem_lg or other for closer.
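
A minimal sketch of such a mapping, assuming the usual Transkribus encoding `custom="... structure {type:NAME;}"`. The names `region_type` and `TYPE_FOR_CUSTOM` are hypothetical, and the table contains only the two examples mentioned above; the real concordance would have to be filled in from the DTABf side.

```python
import re

# Matches the type inside a "structure {...}" group of a @custom value.
CUSTOM_STRUCTURE = re.compile(r"structure\s*\{[^}]*?type:\s*([^;}]+)")

# Only the two examples from this thread; everything else is still open.
TYPE_FOR_CUSTOM = {
    "poem_lg": "paragraph",
    "closer": "other",
}


def region_type(custom, default=None):
    """Return the PAGE @type for a Transkribus @custom string, or default."""
    match = CUSTOM_STRUCTURE.search(custom or "")
    if match is None:
        return default
    return TYPE_FOR_CUSTOM.get(match.group(1).strip(), default)
```

For example, `region_type("readingOrder {index:0;} structure {type:poem_lg;}")` yields `"paragraph"`, while a @custom without a structure group falls back to `default`, so existing @type values can be left untouched.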