internetarchive / archive-hocr-tools

Efficient hOCR tooling

archive-hocr-tools

This repository contains a Python package for efficient hOCR parsing, along with a set of tools for operating on and analysing hOCR files.

The Python library is called hocr.
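To make concrete what hOCR parsing involves, here is a minimal sketch that pulls words and their bounding boxes out of a small hOCR fragment. It uses only the Python standard library and is not the hocr package's API; the function name extract_words and the sample document are assumptions made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a tiny hOCR page with one line and two words.
SAMPLE_HOCR = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="ocr_page" title="bbox 0 0 1000 1500">
<span class="ocr_line" title="bbox 100 200 600 240">
<span class="ocrx_word" title="bbox 100 200 300 240; x_wconf 96">Hello</span>
<span class="ocrx_word" title="bbox 320 200 600 240; x_wconf 91">world</span>
</span>
</div>
</body>
</html>"""

# hOCR stores geometry in the title attribute as "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def extract_words(hocr_text):
    """Return a list of (text, (x0, y0, x1, y1)) for each ocrx_word element.

    Illustrative helper only; not part of the hocr package.
    """
    root = ET.fromstring(hocr_text)
    words = []
    for elem in root.iter():
        # Word-level elements carry class="ocrx_word".
        if elem.get("class") != "ocrx_word":
            continue
        match = BBOX_RE.search(elem.get("title", ""))
        if match is None:
            continue  # skip words without a usable bounding box
        bbox = tuple(int(value) for value in match.groups())
        text = "".join(elem.itertext()).strip()
        words.append((text, bbox))
    return words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE_HOCR):
        print(text, bbox)
```

This sketch loads the whole document into memory, which is fine for a single page but is exactly the cost that a parser designed for efficiency on large, multi-page hOCR files would try to avoid.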