The function serialization.serialize_segmentation is giving me strange output. In the tag declaration at the top of the ALTO XML document, I get duplicated OtherTag elements with weird labels. Moreover, the TAGREFS attributes are missing from all the TextLine elements, even the non-default ones.
I tried different text_direction values but the issue remains. Using the same segmentation model, kraken 3 and eScriptorium give me a well-formatted Alto XML document.
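Both symptoms can be checked mechanically on the serialized file. The following is only a minimal sketch using the Python standard library, not part of kraken: the helper name check_alto_tags and the SAMPLE document are hypothetical, and the sample is a hand-written stand-in, not the actual kraken 4.1.2 output.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the XML namespace so the check works for any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def check_alto_tags(xml_text):
    """Report duplicated OtherTag labels and TextLines lacking TAGREFS.

    Hypothetical helper for illustration; it only inspects the XML and
    does not call any kraken API.
    """
    root = ET.fromstring(xml_text)
    other_tags = [el for el in root.iter() if local_name(el.tag) == "OtherTag"]
    lines = [el for el in root.iter() if local_name(el.tag) == "TextLine"]

    # Count how often each LABEL is declared in the tag section.
    label_counts = Counter(el.get("LABEL", "") for el in other_tags)
    duplicate_labels = sorted(l for l, n in label_counts.items() if n > 1)

    # Collect the IDs of lines that carry no TAGREFS attribute at all.
    lines_without_tagrefs = [
        el.get("ID", "") for el in lines if el.get("TAGREFS") is None
    ]
    return {
        "duplicate_labels": duplicate_labels,
        "lines_without_tagrefs": lines_without_tagrefs,
    }


# Hand-written sample (an assumption, not real kraken output).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Tags>
    <OtherTag ID="TYPE_1" LABEL="default"/>
    <OtherTag ID="TYPE_2" LABEL="default"/>
  </Tags>
  <Layout><Page><PrintSpace><TextBlock ID="b1">
    <TextLine ID="l1"/>
    <TextLine ID="l2" TAGREFS="TYPE_1"/>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

report = check_alto_tags(SAMPLE)
print(report)
```

Matching on the local element name rather than a fixed namespace is a deliberate choice here, so the same check can be run on files written by different kraken versions.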
That's already fixed in 4.2. I had added multi-tagging support in the pipeline, but the tests didn't catch those errors in the output serialization because the output is technically still valid ALTO.
This output was produced using kraken 4.1.2.