Cuneiform tends to stop reading a page when it reaches a large non-readable area. As a result, not all keywords are extracted when Cuneiform is used.
A way to work around this problem would be to split the text areas prior to OCR.
For instance, unpaper can do that (OCRFeeder uses it).
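
To illustrate the idea of splitting a page before OCR, here is a minimal sketch in Python. It is not how unpaper or OCRFeeder work internally; it is a simple horizontal projection split, swapped in for illustration. It assumes the page has already been binarized into rows of 0 (blank) and 1 (ink). The names row_ink, split_bands and min_gap are hypothetical.

```python
from typing import List, Tuple


def row_ink(image: List[List[int]]) -> List[int]:
    """Count dark pixels (value 1) in each row of a binarized page."""
    return [sum(row) for row in image]


def split_bands(ink: List[int], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges, end exclusive, that contain ink.

    Blank runs shorter than min_gap rows are treated as line spacing and
    stay inside the current band; longer blank runs separate two bands.
    """
    bands = []
    start = None     # first row of the band being built
    last_ink = None  # last row with ink seen in that band
    for y, count in enumerate(ink):
        if count > 0:
            if start is None:
                start = y
            elif y - last_ink - 1 >= min_gap:
                # The blank run is long enough: close the band, open a new one.
                bands.append((start, last_ink + 1))
                start = y
            last_ink = y
    if start is not None:
        bands.append((start, last_ink + 1))
    return bands


# Tiny made-up page: two inked areas separated by three blank rows.
image = [
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
bands = split_bands(row_ink(image), min_gap=2)
print(bands)  # [(1, 3), (6, 8)]
```

Each band would then be cropped and passed to the OCR engine on its own, so that a non-readable band (a picture, for example) only affects its own result and not the text areas around it.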