ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
http://ocrmypdf.readthedocs.io/
Mozilla Public License 2.0

macports installation #314

Closed: pgee70 closed this issue 6 years ago

pgee70 commented 6 years ago

I'm on macOS 10.14.1. I don't have Homebrew installed and use MacPorts (v2.5.4). I once tried using Homebrew and MacPorts together, which opened up a world of pain, so I thought installing with Python's pip was the way to go. The install seems to have worked, but I don't know how to run the Python package.

sudo port install qpdf tesseract jbig2enc pngquant unpaper ghostscript
sudo port install python37
sudo port select --set python3 python37
sudo port install py37-pip
sudo port select --set pip pip37
/opt/local/bin/pip-3.7 --version
pip 18.1 from /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pip (python 3.7)
sudo /opt/local/bin/pip-3.7 install ocrmypdf #this is the second time...
The directory '/Users/pgee/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/pgee/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied: ocrmypdf in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (7.3.0)
Requirement already satisfied: img2pdf<0.4,>=0.3.0 in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from ocrmypdf) (0.3.1)
Requirement already satisfied: ruffus>=2.7.0 in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from ocrmypdf) (2.8.0)
Requirement already satisfied: pdfminer.six==20181108 in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from ocrmypdf) (20181108)
Requirement already satisfied: Pillow!=5.1.0,>=4.0.0; sys_platform == "darwin" in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from ocrmypdf) (5.3.0)
Requirement already satisfied: cffi>=1.9.1 in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from ocrmypdf) (1.11.5)
Requirement already satisfied: python-xmp-toolkit<3,>=2 in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from ocrmypdf) (2.0.1)
Requirement already satisfied: pikepdf<0.4,>=0.3.7 in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from ocrmypdf) (0.3.7)
Requirement already satisfied: reportlab>=3.3.0 in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from ocrmypdf) (3.5.9)
Requirement already satisfied: six in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pdfminer.six==20181108->ocrmypdf) (1.11.0)
Requirement already satisfied: sortedcontainers in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pdfminer.six==20181108->ocrmypdf) (2.0.5)
Requirement already satisfied: pycryptodome in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pdfminer.six==20181108->ocrmypdf) (3.7.0)
Requirement already satisfied: pycparser in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from cffi>=1.9.1->ocrmypdf) (2.19)
Requirement already satisfied: pytz in /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from python-xmp-toolkit<3,>=2->ocrmypdf) (2018.7)
pgee$ which ocrmypdf
pgee$ locate ocrmypdf
/opt/local/bin/pip-3.7 show ocrmypdf
Name: ocrmypdf
Version: 7.3.0
Summary: OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Home-page: https://github.com/jbarlow83/OCRmyPDF
Author: James R. Barlow
Author-email: jim@purplerock.ca
License: UNKNOWN
Location: /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
Requires: python-xmp-toolkit, img2pdf, Pillow, ruffus, pikepdf, cffi, reportlab, pdfminer.six
Required-by:

I read something and tried:

cd /opt/local/Library/Frameworks/Python.framework/Versions/3.7/bin
mbp2016:bin pgee$ ./ocrmypdf
Traceback (most recent call last):
  File "./ocrmypdf", line 7, in <module>
    from ocrmypdf.__main__ import run_pipeline
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ocrmypdf/__init__.py", line 37, in <module>
    from . import pdfa
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ocrmypdf/pdfa.py", line 41, in <module>
    from libxmp.utils import file_to_dict
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/libxmp/__init__.py", line 50, in <module>
    from .core import XMPMeta, XMPIterator
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/libxmp/core.py", line 50, in <module>
    from . import exempi as _cexempi
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/libxmp/exempi.py", line 69, in <module>
    EXEMPI = _load_exempi()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/libxmp/exempi.py", line 60, in _load_exempi
    raise ExempiLoadError('Exempi library not found.')
libxmp.ExempiLoadError: Exempi library not found.

Frankly, I am lost with Python. Assistance appreciated.

jbarlow83 commented 6 years ago

I don't know why the binary is not installed.

Although it is probably installed here

/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ocrmypdf

But I recommend installing with pip install --user, as discussed in the installation docs, since it is better to avoid modifying the MacPorts Python in case other dependencies interact with it too: https://ocrmypdf.readthedocs.io/en/latest/installation.html#installing-with-python-pip
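For anyone unsure where a --user install puts the ocrmypdf launcher, here is a minimal sketch. It assumes python3 is the interpreter you installed with; the helper names user_scripts_dir and on_path are made up for illustration and are not part of pip or OCRmyPDF. The script only reports the directory and does not install anything.

```shell
#!/bin/sh
# Hypothetical helper: print the directory where `pip install --user`
# places console scripts (such as ocrmypdf) for the given interpreter.
user_scripts_dir() {
    py="${1:-python3}"
    # `python -m site --user-base` prints the per-user base directory.
    base="$("$py" -m site --user-base)" || return 1
    printf '%s/bin\n' "$base"
}

# Hypothetical helper: succeed if the directory in $1 is an entry in PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# The install itself is left as a comment so this script has no side effects:
#   python3 -m pip install --user ocrmypdf

if dir="$(user_scripts_dir python3)"; then
    if on_path "$dir"; then
        echo "$dir is already on PATH"
    else
        echo "add to your shell profile: export PATH=\"$dir:\$PATH\""
    fi
fi
```

If the directory is not on PATH, `which ocrmypdf` prints nothing even though the install succeeded, which would match the symptom reported at the top of this thread.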

jbarlow83 commented 6 years ago

You'll need to install the exempi package too: https://www.macports.org/ports.php?by=library&substr=exempi
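A quick way to confirm whether the Exempi shared library is present before re-running ocrmypdf might look like the sketch below. The function name find_exempi is hypothetical. /opt/local/lib is the default MacPorts library directory, and the other directories are guesses that may not apply to your system. This only checks that a file exists; it does not prove that Python's libxmp will be able to load it.

```shell
#!/bin/sh
# Hypothetical helper: print the first libexempi shared library found in
# the given directories; return non-zero if none is found.
find_exempi() {
    for dir in "$@"; do
        for lib in "$dir"/libexempi*.dylib "$dir"/libexempi*.so*; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$lib" ]; then
                printf '%s\n' "$lib"
                return 0
            fi
        done
    done
    return 1
}

# Search the MacPorts prefix first, then some assumed fallback locations.
if lib="$(find_exempi /opt/local/lib /usr/local/lib /usr/lib)"; then
    echo "Exempi found: $lib"
else
    echo "Exempi not found; on MacPorts try: sudo port install exempi"
fi
```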

pgee70 commented 6 years ago

Thanks. The reason I didn't follow the instructions on your site was that the pip I had was running on the Python 2.7 framework, and I didn't know how to fix that:

pip --version
pip 18.1 from /Library/Python/2.7/site-packages/pip (python 2.7)

Thanks for the above; that was a nudge in the right direction. This list might help others. Thanks for the prompt assistance; closing the ticket now.

sudo port install qpdf tesseract tesseract-eng jbig2enc pngquant unpaper ghostscript exempi wget
sudo port install python37
sudo port select --set python3 python37
sudo port install py37-pip
sudo port select --set pip pip37
sudo /opt/local/bin/pip-3.7 install chardet
sudo /opt/local/bin/pip-3.7 install ocrmypdf

Get a new version of a data file for Tesseract:

wget https://github.com/tesseract-ocr/tessdata/raw/3.04.00/osd.traineddata
sudo mv osd.traineddata /opt/local/share/tessdata/

Run with:

export TESSDATA_PREFIX=/opt/local/share/tessdata
/opt/local/Library/Frameworks/Python.framework/Versions/3.7/bin/ocrmypdf --version
7.3.0
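To avoid typing the full path each time, the settings could go in a shell profile, roughly as sketched below. The two paths are copied from this thread and are assumptions about your install. PY_BIN is just an illustrative variable name. A variable assigned on its own line is only passed to child processes such as ocrmypdf if it is exported, hence the export.

```shell
#!/bin/sh
# Paths copied from this thread; adjust them for your own install.
PY_BIN=/opt/local/Library/Frameworks/Python.framework/Versions/3.7/bin

# Export so that programs started from this shell inherit the values.
export TESSDATA_PREFIX=/opt/local/share/tessdata
export PATH="$PY_BIN:$PATH"

# Report the version if the launcher is now reachable by name.
if command -v ocrmypdf >/dev/null 2>&1; then
    ocrmypdf --version
else
    echo "ocrmypdf not found on PATH (looked in $PY_BIN among others)" >&2
fi
```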