Recombobulator to enable OCR text for full pdf to be downloaded

Since the IIIF print deployment, the OCR text of a single pdf page can be downloaded as .txt, .json, or .xml.

I expect that most users who want to download the OCR text would want it for the whole pdf file, and downloading one page at a time would be onerous. If there proves to be user demand for the OCR text, it would be helpful to let users download it as .txt, .json, or .xml for all pages of a pdf or all pages of a work.

In response to a Slack question about whether this is currently possible, Jeremy explained: 'That is presently not a feature; as we don’t have a “recombobulator” of the constituent parts'.
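
To make the request concrete, here is a rough sketch of what a "recombobulator" might do once the per-page OCR has been fetched. It is illustrative only and is not the project's code: the function names, the form-feed page separator, and the shape of the combined JSON are all assumptions, and it assumes the pages are already held in memory in page order.

```python
import json


def recombobulate_txt(pages):
    """Join per-page OCR text into one string, in page order.

    Hypothetical helper: `pages` is assumed to be a list of strings,
    one per pdf page, already sorted by page number.
    """
    # A form feed (\f) is a conventional page-break marker in plain text;
    # the choice of separator here is an assumption.
    return "\f".join(page.strip("\n") for page in pages)


def recombobulate_json(pages):
    """Wrap per-page OCR JSON objects in a single JSON document.

    Hypothetical helper: the {"pages": [{"page": n, "ocr": ...}]} layout
    is an assumed structure, not an existing format.
    """
    combined = {
        "pages": [
            {"page": number, "ocr": page}
            for number, page in enumerate(pages, start=1)
        ]
    }
    return json.dumps(combined)
```

For example, `recombobulate_txt(["first page\n", "second page\n"])` would give one string with the two pages separated by a form feed. The same idea would apply per work by first concatenating the page lists of each pdf in the work.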

scientist-softserv / britishlibrary, issue #446