I tried downgrading to the Homebrew formula for Tesseract 3.01, but changes in the latest Xcode seem to have broken its assumptions about how `autoconf` works.
Running `tesseract` directly confirms problems loading the relevant image-processing libraries. Even after `brew reinstall libtiff` and `brew reinstall libjpeg`, TIFF support is still screwed up. This appears to be a problem with the Leptonica formula. I finally got past it with `brew reinstall --with-libtiff leptonica`.
This was a complete disaster, and I'm unsure where the fault lies. It's clearly not with pdf-extract (other than the README pointing to the wrong location for the Tesseract training data)! But other users will likely run into this, too.
UPDATE:
The README also doesn't mention copying `dia.traineddata`.
Homebrew has moved up to Tesseract 3.02, but the README refers specifically to 3.01's directory structure.
I tried copying the training data files into what appear to be the directories used by the 3.02 Homebrew install:

    cp ./share/eng.traineddata /usr/local/share/tessdata/
    cp ./share/configs/alphanumeric /usr/local/share/tessdata/configs/

but the tests still fail.
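For anyone retrying this, the copy step can be wrapped in a small shell function that also copies `dia.traineddata`. This is only a sketch, not a confirmed fix (the tests above still fail with the manual copies). It assumes `dia.traineddata` sits next to `eng.traineddata` under `./share` in the pdf-extract checkout and that `/usr/local/share/tessdata` is the 3.02 tessdata directory; `SRC_DIR`, `TESSDATA_DIR` and `install_training_data` are hypothetical names, not anything pdf-extract or Homebrew provides.

```shell
#!/bin/sh
# Sketch: copy pdf-extract's Tesseract training data into a Tesseract 3.02 tessdata dir.
# ASSUMPTIONS (not from the pdf-extract README):
#   - source files live under ./share, with dia.traineddata next to eng.traineddata
#   - Homebrew's Tesseract 3.02 reads from /usr/local/share/tessdata
# Both locations can be overridden via the SRC_DIR / TESSDATA_DIR environment variables.

SRC_DIR="${SRC_DIR:-./share}"
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"

install_training_data() {
  # Create the configs subdirectory if the install lacks one, then copy
  # both language files and the alphanumeric config; stop at the first failure.
  mkdir -p "$TESSDATA_DIR/configs" &&
    cp "$SRC_DIR/eng.traineddata" "$SRC_DIR/dia.traineddata" "$TESSDATA_DIR/" &&
    cp "$SRC_DIR/configs/alphanumeric" "$TESSDATA_DIR/configs/"
}
```

Source the file from the pdf-extract checkout and call `install_training_data`; the function returns non-zero if any of the three source files is missing, which is a quick way to confirm where the training data actually lives.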