kba / page-to-alto

Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)
Apache License 2.0

output invalid: @TYPE not allowed under TextBlock #11

Closed: bertsky closed this issue 3 years ago

bertsky commented 3 years ago

Perhaps inspired by our discussion in #1, the current transformation tries to map PAGE's TextRegion/@type to ALTO TextBlock/@TYPE, an attribute that does not exist. In ALTO, @TYPE is only allowed on Illustration and ComposedBlock.

The error is in LayoutTagManager.set_alto_tag_from_type.
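To spot the invalid output in converted files, a quick check can flag every TextBlock that carries a TYPE attribute. This is a minimal sketch, not part of page-to-alto: the function name find_invalid_textblock_types is hypothetical, it assumes ALTO 4.x output (namespace ns-v4#; swap in ns-v2# or ns-v3# for older versions), and the sample is a trimmed fragment rather than a schema-complete ALTO document.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO 4.x namespace; use ns-v2# / ns-v3# for older output versions.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def find_invalid_textblock_types(alto_xml: str) -> list:
    """Hypothetical helper: return the IDs of TextBlock elements that carry
    a TYPE attribute, which the ALTO schema does not allow on TextBlock."""
    root = ET.fromstring(alto_xml)
    return [
        block.get("ID", "")
        for block in root.iter(f"{{{ALTO_NS}}}TextBlock")
        if "TYPE" in block.attrib
    ]


# Trimmed (not schema-complete) fragment resembling the problematic output.
sample = f"""<alto xmlns="{ALTO_NS}">
  <Layout><Page ID="p1"><PrintSpace>
    <TextBlock ID="r1" TYPE="heading"/>
    <TextBlock ID="r2"/>
  </PrintSpace></Page></Layout>
</alto>"""

print(find_invalid_textblock_types(sample))  # ['r1']
```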

Preserving the type in LayoutTag/@TYPE (referenced from the TextBlock via @TAGREFS) would be correct, I think.
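The proposal could look roughly like the following sketch. It is not the actual page-to-alto code: tag_region_type, get_or_create_tags and the LAYOUT_TAG_n ID scheme are hypothetical, it assumes ALTO 4.x output, and it sets LABEL alongside TYPE on the LayoutTag on the assumption that the ALTO tag schema requires a LABEL. One LayoutTag is reused per distinct region type, and the TextBlock points to it through TAGREFS instead of carrying TYPE itself.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO 4.x namespace; use ns-v2# / ns-v3# for older output versions.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def q(tag: str) -> str:
    """Qualify a local ALTO element name with the ALTO namespace."""
    return f"{{{ALTO_NS}}}{tag}"


def get_or_create_tags(root: ET.Element) -> ET.Element:
    """Return the top-level <Tags> element, creating it before <Layout> if missing."""
    tags = root.find(q("Tags"))
    if tags is None:
        tags = ET.Element(q("Tags"))
        layout = root.find(q("Layout"))
        # In ALTO, Tags comes after Description/Styles and before Layout.
        index = list(root).index(layout) if layout is not None else len(root)
        root.insert(index, tags)
    return tags


def tag_region_type(root: ET.Element, block: ET.Element, region_type: str) -> str:
    """Hypothetical helper: store a PAGE TextRegion/@type as LayoutTag/@TYPE
    and reference it from the TextBlock via TAGREFS. Returns the tag ID."""
    tags = get_or_create_tags(root)
    layout_tags = tags.findall(q("LayoutTag"))
    # Reuse one LayoutTag per distinct region type.
    for tag in layout_tags:
        if tag.get("TYPE") == region_type:
            tag_id = tag.get("ID")
            break
    else:
        tag_id = f"LAYOUT_TAG_{len(layout_tags) + 1}"  # hypothetical ID scheme
        # LABEL is set too, assuming the schema requires it on tags.
        ET.SubElement(tags, q("LayoutTag"),
                      {"ID": tag_id, "TYPE": region_type, "LABEL": region_type})
    # TextBlock has no TYPE attribute in ALTO, so drop it if present.
    block.attrib.pop("TYPE", None)
    # TAGREFS is a whitespace-separated list of IDs; append without duplicates.
    refs = block.get("TAGREFS", "").split()
    if tag_id not in refs:
        refs.append(tag_id)
    block.set("TAGREFS", " ".join(refs))
    return tag_id


# Usage on a trimmed (not schema-complete) fragment.
alto = ET.fromstring(
    f'<alto xmlns="{ALTO_NS}"><Description/><Layout><Page ID="p1">'
    '<PrintSpace><TextBlock ID="r1" TYPE="heading"/></PrintSpace>'
    '</Page></Layout></alto>'
)
block = alto.find(f".//{q('TextBlock')}")
print(tag_region_type(alto, block, "heading"))  # LAYOUT_TAG_1
print(block.get("TAGREFS"), block.get("TYPE"))  # LAYOUT_TAG_1 None
```

Keeping one LayoutTag per region type (rather than one per block) keeps the Tags section small and lets consumers group blocks by type through their shared TAGREFS value.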