jlsutherland / doc2text

Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
MIT License

Error on pip install PythonMagick #26

Closed by liber145 7 years ago

liber145 commented 7 years ago

PythonMagick is a required package for doc2text. I tried to install it through pip:

(doc2txt) ➜  Programs pip install PythonMagick
Collecting PythonMagick
  Could not find a version that satisfies the requirement PythonMagick (from versions: )
No matching distribution found for PythonMagick

Does anyone know what's wrong with it? Thanks.

liber145 commented 7 years ago

Also, my system is OS X El Capitan 10.11.6. On an Ubuntu desktop, it installed successfully with apt-get install python-pythonmagick.
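Since the pip and apt-get routes behave differently, a quick way to see whether the active interpreter can find the package at all is a small check like the one below. This is only a sketch: `is_importable` is a hypothetical helper name, and it assumes the package is imported under the top-level module name `PythonMagick`.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    # find_spec returns None when no top-level module with that name is found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the import name matches the package name used in this thread.
    name = "PythonMagick"
    if is_importable(name):
        print(f"{name} is importable from this environment")
    else:
        print(f"{name} was not found in this environment")
```

Note that a package installed system-wide with apt-get is typically not visible inside a virtual environment unless that environment was created with access to the system site-packages.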

liber145 commented 7 years ago

Oh, I missed the statement in the README that this works on Ubuntu 16.04. I will move my work to Ubuntu. Also, I just tested it on docs/assets/images/news-button.png:

>>> import doc2text
>>> doc = doc2text.Document(lang="eng")
>>> doc.read('./news-button.png')
>>> doc.process()
>>> doc.extract_text()
>>> text = doc.get_text()
>>> text
'Ivy-0n-'
>>> exit()

I have also run apt-get install tesseract-ocr-eng to install the language package. Could you tell me what I have missed?
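One thing that may be worth ruling out is whether Tesseract itself sees the `eng` language data. The sketch below asks the `tesseract` command-line tool for its installed languages using its `--list-langs` option. The helper names `parse_langs` and `installed_tesseract_langs` are hypothetical, and the parsing assumes the usual output shape of a "List of available languages" header followed by one language code per line. It is a diagnostic aid only and does not by itself explain the garbled result above.

```python
import shutil
import subprocess
from typing import List


def parse_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` style output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


def installed_tesseract_langs() -> List[str]:
    """Return the languages reported by tesseract, or [] if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some versions print the list to stderr, so both streams are parsed.
    # Caveat: any warning lines in the output would be returned as entries too.
    return parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    print(installed_tesseract_langs())
```

If `eng` is missing from the printed list, the language package is not where Tesseract expects it.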