mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

Optionalize glyph/light mode for ALTO #418

Closed PonteIneptique closed 1 year ago

PonteIneptique commented 1 year ago

Hey @mittagessen :) I have been playing around with Kraken for processing massive sets, and the current <glyph> approach has made the XML files much, much larger, even when the glyph data is not needed. Would you be okay with introducing an alto-light export format that does not include glyphs, or would you consider that clutter?

mittagessen commented 1 year ago

Sure, that's exactly why the serializer uses templating. If you feel fancy you could add a --output-template option to the main kraken command that allows serialization with arbitrary templates, and then just strip the glyph parts out of the default ALTO one.

mittagessen commented 1 year ago

I just added support for external templates in the serialization API and CLI. You can now select an arbitrary template with kraken --template $PATH ... to get whatever output you want.
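For ALTO files that were already written with glyph data, the same size reduction can be approximated after the fact. The following is a minimal sketch, not part of kraken: it uses only Python's standard library, and the helper names `local_name` and `strip_glyphs` and the sample document are hypothetical. It assumes the glyph data is stored in elements whose local tag name is `Glyph`, as in the ALTO schema, and matches on the local name so that it does not depend on a particular ALTO namespace version.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit('}', 1)[-1]


def strip_glyphs(root):
    """Remove every Glyph element below root in place; return how many were removed."""
    removed = 0
    # Snapshot the elements first so the tree is not mutated while being traversed.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) == 'Glyph':
                parent.remove(child)
                removed += 1
    return removed


# Hypothetical, heavily truncated ALTO fragment used only for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="ab">
      <Glyph CONTENT="a"/>
      <Glyph CONTENT="b"/>
    </String>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print("glyph elements removed:", strip_glyphs(root))
```

For new output, the template route described above avoids writing the glyph elements in the first place, so a post-processing pass like this is mainly useful for existing files.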