impresso / impresso-text-acquisition

🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
https://impresso.github.io/impresso-text-acquisition/
GNU Affero General Public License v3.0

Data preparation: partition rebuilt data with Run AI #125

Closed: e-maud closed this issue 4 months ago

e-maud commented 4 months ago

The need for this has evolved. On-the-fly partitioning is either no longer necessary, because processing is now done per newspaper, or it is handled differently depending on whether Dask, Ray, or another framework is used. This issue can therefore be closed.