ulb-sachsen-anhalt / ocrd-odem

OCR Workflows based on OCR-D
MIT License

numpy 2x corrupts data #22

Closed M3ssman closed 3 months ago

M3ssman commented 3 months ago

Description

Starting with version 2.0, NumPy changed how its scalar types are represented: values such as those returned by numpy.unique no longer print as plain numbers but with their NumPy data type, i.e. np.float64(4.8), np.int64(928) instead of 4.8, 928. Although reasonable, this breaks subsequent workflows when we write runtime information into our record lists, for example when we evaluate record entries to sum up how many pages were processed in which dimensions. Therefore we need to ensure that plain numbers are stored again.
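
One possible way to get plain numbers back is to convert the result of numpy.unique to built-in Python types before storing it. The following is only a minimal sketch, not the fix applied in this repository: the helper name `plain_unique` and the sample values are made up for illustration.

```python
import numpy as np

# Hypothetical sample data, not taken from the ODEM code base.
heights = np.array([4.8, 4.8, 6.1])
page_counts = np.array([928, 928, 12])


def plain_unique(values):
    """Return the sorted unique values as built-in Python numbers."""
    # ndarray.tolist() converts each element to the matching Python
    # type (float, int), so no NumPy scalar ends up in the result.
    return np.unique(values).tolist()


unique_heights = plain_unique(heights)
unique_counts = plain_unique(page_counts)

print(repr(unique_heights))  # [4.8, 6.1]
print(repr(unique_counts))   # [12, 928]
```

Because the conversion happens before anything is written to a record, the stored text should look the same under NumPy 1.x and 2.x. For a single value, `.item()` on a NumPy scalar does the same conversion.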