mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

PageXML schema requires Created/LastChange date to be UTC, kraken sets local time #327

Closed: bencomp closed this issue 2 years ago

bencomp commented 2 years ago

This is a minor issue, but I do like XML output to be as good as it can get :)

The XML schema for the Created and LastChange elements in PAGE XML (PageXML) says: "The timestamp has to be in UTC (Coordinated Universal Time) and not local time." See https://ocr-d.de/en/gt-guidelines/pagexml/pagecontent_xsd_Complex_Type_pc_PcGtsType.html#PcGtsType_Metadata

The serialization function passes local time to the templates. The ALTO template does not emit any time information, and the ALTO schema requires neither a time zone nor UTC, so only the PageXML output is affected.
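For illustration, here is a minimal sketch of the kind of conversion discussed above, assuming the template only needs a preformatted xs:dateTime string. `pagexml_timestamp` is a hypothetical helper, not part of kraken's API, and kraken's actual serializer may pass timestamps differently. It converts a timezone-aware datetime, or a naive one assumed to be local time, to UTC.

```python
from datetime import datetime, timedelta, timezone


def pagexml_timestamp(dt=None):
    """Return a UTC xs:dateTime string for PageXML Created/LastChange.

    Hypothetical helper for illustration; not kraken's API.
    """
    if dt is None:
        # Current time, already in UTC.
        dt = datetime.now(timezone.utc)
    elif dt.tzinfo is None:
        # Assumption: naive datetimes represent the system's local time.
        dt = dt.astimezone()
    # Convert to UTC and drop microseconds for a compact value.
    return dt.astimezone(timezone.utc).replace(microsecond=0).isoformat()


# Usage: 13:00 at UTC+01:00 becomes 12:00 UTC.
cet = timezone(timedelta(hours=1))
print(pagexml_timestamp(datetime(2021, 3, 1, 13, 0, tzinfo=cet)))
# 2021-03-01T12:00:00+00:00
```

Both `+00:00` and `Z` are valid UTC designators in xs:dateTime; `isoformat()` produces the former.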

I can provide a PR for this.

mittagessen commented 2 years ago

Sure. If it isn't too much trouble, I'd prefer a pull request. Otherwise, I can do it myself when I add the infrastructure for explicit reading order.