Closed wanghaisheng closed 6 years ago
Parsing all ocr-outputfiles to an database
(This step only has to be done once)
Pre-process the gathered information
The results from the following processes can also
be stored directly to the database
Combine file information
The output can be stored in the database and/or as .txt or .hocr.
Evaluate the output against groundtruth files or each other and generate a accuracy report. Or compare the files visual via diff-tools.
https://github.com/KBNLresearch/ochre
Ochre
Ochre is a toolbox for OCR post-correction.
Overview of OCR post-correction data sets
Preprocess data sets
Train character-based language models/LSTMs for OCR post-correction
Do the post-correction
Assess the performance of OCR post-correction
Analyze OCR errors
Ochre contains ready-to-use data processing workflows (based on CWL). The software also allows you to create your own (OCR post-correction related) workflows. Examples of how to create these can be found in the notebooks directory (to be able to use those, make sure you have Jupyter Notebooks installed). This directory also contains notebooks that show how results can be analyzed and visualized.
Process, enhance and evaluate multiple ocr-ouput. https://github.com/UB-Mannheim/ocromore