Add detection of image/graphics features

As far as I can tell, the hOCR files generated by the current setup don't include any layout information about where images are located in a document.

From the spec it looks like this should be possible? http://kba.cloud/hocr-spec/1.2/#floats-image

The previous OCR approach using ABBYY did produce picture features. This allowed some really exciting things, like programmatically extracting and exploring images from books, which in turn resulted in the Internet Archive Book Images project. That wouldn't really be feasible if you had to download every book page image just to check whether it contained any illustrations.
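To illustrate the kind of thing I mean, here is a rough sketch of how image regions could be pulled out of an hOCR file if they were present. It assumes the regions would be marked with the `ocr_image`, `ocr_photo` or `ocr_linedrawing` classes from the spec, each with a `bbox` in its `title` attribute. I don't know whether the current pipeline can emit any of these, and `find_image_regions` and the sample markup are made up for this example.

```python
import re
from html.parser import HTMLParser

# Float classes from the hOCR spec that describe non-text graphics.
# Assumption: the generated hOCR would use these class names.
IMAGE_CLASSES = {"ocr_image", "ocr_photo", "ocr_linedrawing"}
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class ImageRegionFinder(HTMLParser):
    """Collects (page_id, class, bbox) for every image-like element."""

    def __init__(self):
        super().__init__()
        self.page = None   # id of the most recent ocr_page element
        self.regions = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if "ocr_page" in classes:
            self.page = attrs.get("id")
        kind = classes & IMAGE_CLASSES
        match = BBOX_RE.search(attrs.get("title") or "")
        if kind and match:
            bbox = tuple(int(v) for v in match.groups())
            self.regions.append((self.page, sorted(kind)[0], bbox))


def find_image_regions(hocr_text):
    """Hypothetical helper: list the image regions in an hOCR document."""
    finder = ImageRegionFinder()
    finder.feed(hocr_text)
    return finder.regions


# Hand-written sample, not real output from any OCR engine.
SAMPLE = """
<div class='ocr_page' id='page_1' title='bbox 0 0 1000 1500'>
  <div class='ocr_photo' id='photo_1' title='bbox 100 200 600 700'></div>
  <p class='ocr_par' id='par_1' title='bbox 100 800 900 1400'>Some text</p>
</div>
"""

print(find_image_regions(SAMPLE))
```

With something like this, only the pages that report an image region would need to be downloaded and cropped.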

Is there a reason why these features aren't included, or is this something that just needs to be enabled?

Edit:

Ah, I think I might be in the wrong place. I thought this repository was related to the generation of the hOCR files. Does that live somewhere else?

Edit2:

Found it: https://git.archive.org/www/tesseract
