openpaperwork / pyocr

A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab
https://gitlab.gnome.org/World/OpenPaperwork/pyocr

imports from __init__.py not working #10

Closed phaebz closed 10 years ago

phaebz commented 10 years ago

I am doing import pyocr as described in the README, but after that I can't access pyocr.get_available_tools(). So is there something wrong with the from pyocr import * in __init__.py? FWIW, I am running on Debian Testing in a virtualenv with pyocr installed with pip.

Update: I tried the old way, from pyocr import pyocr, which gives this error:

In [1]: from pyocr import pyocr
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-6e3defe0fffd> in <module>()
----> 1 from pyocr import pyocr

/home/user/.virtualenvs/venv/lib/python3.3/site-packages/pyocr/pyocr.py in <module>()
     46 """
     47 
---> 48 import cuneiform
     49 import tesseract
     50 

ImportError: No module named 'cuneiform'

jflesch commented 10 years ago

Looks like I've broken Python 3 support. Not sure how I missed that ... :/ I'll try to fix it this evening.
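
For readers following along: the traceback above is consistent with Python 3 no longer allowing implicit relative imports inside a package, so a bare import of a sibling module is looked up as a top-level module and fails. The fix commit itself is not quoted in this thread, so the following is only a minimal sketch of that general behaviour. The package names demo_implicit and demo_explicit and the module demo_sibling_mod are hypothetical and are not part of pyocr.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_package(root, name, init_source):
    """Create a throwaway package `name` under `root` with one sibling module."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(init_source)
    (pkg / "demo_sibling_mod.py").write_text("VALUE = 42\n")


def try_import(name):
    """Return the imported module, or the ImportError raised while importing it."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        return exc


with tempfile.TemporaryDirectory() as root:
    # Python 2 style: the sibling module is imported by its bare name.
    build_package(root, "demo_implicit", "import demo_sibling_mod\n")
    # Python 3 compatible: the sibling module is imported relative to the package.
    build_package(root, "demo_explicit", "from . import demo_sibling_mod\n")

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        implicit_result = try_import("demo_implicit")
        explicit_result = try_import("demo_explicit")
    finally:
        sys.path.remove(root)

print(type(implicit_result).__name__)
print(explicit_result.demo_sibling_mod.VALUE)
```

Under Python 3, the first import should fail with an ImportError while the second succeeds, which matches the symptom reported here.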

jflesch commented 10 years ago

I've pushed a fix on the branch 'master'. Can you give it a try, please?

phaebz commented 10 years ago

I rebased on master, installed to virtualenv and tested. Same problem(s). Problem(s) are about module import. Unicode fixes are great, but I did not get to any issues since I can't even import properly. Am I missing something?

jflesch commented 10 years ago

Imports failed because of the mistakes I made with unicode (setup.py showed various errors, and I think it didn't install all the files).

I haven't tried in a virtualenv for quite some time. I'll give it a try.

jflesch commented 10 years ago

Ok, I can reproduce the problem here with Virtualenv + Python 3

phaebz commented 10 years ago

Thanks for clarification. Tested with system Python 2.7.5 (works) and system Python 3.3.2 (fails) under OS X. I only read now that you do not support OS X, though. Anyway, since it seems to work with Python 2.7.5, why not try to make it work on Python 3 on OS X? My Debian box is at work unfortunately - will try tomorrow.

jflesch commented 10 years ago

The main problem with MacOSX is that I don't know where tesseract and cuneiform binaries and data are located. Also, I have no MacOSX, which doesn't help :)
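
One way to sidestep hard-coded locations is to look the binaries up on the PATH. The sketch below uses the standard library's shutil.which (available since Python 3.3). It is an illustration only, not a description of how pyocr locates its tools, and the helper name locate_binaries is hypothetical. It also does not address where the data files live.

```python
import shutil


def locate_binaries(names=("tesseract", "cuneiform")):
    """Map each binary name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


found = locate_binaries()
for name, path in found.items():
    print(name, "->", path if path else "not found on PATH")
```

With a Homebrew install, a path under /usr/local/bin would be expected for tesseract, provided that directory is on the PATH.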

jflesch commented 10 years ago

Just so you know, there are no real differences between the last released version of Pyocr and the one in the branch 'master' (except the previous fix regarding unicode). So you can try installing the Git version system-wide (i.e. not in a virtualenv). It should work (well ... it does for me anyway...)

phaebz commented 10 years ago

Ok so let's leave it at that for now (: I will do the system wide install at work tomorrow.

jflesch commented 10 years ago

Oops, nevermind. It doesn't ...

jflesch commented 10 years ago

And fixed ( 4f77d7b23f61ef1f7cca16d7dff0c5a0de3e424f ). Please try it now. If it works for you as well, I will do a new release asap.

Obviously, just running the tests with Python 3 and assuming everything else would go fine was a baddddd idea. I probably made the same mistake with Pyinsane ... :/

phaebz commented 10 years ago

Ok, now imports work with from pyocr import pyocr and from pyocr import builders. I tried the example from the README and it recognized tesseract.py as available too. I have tesseract installed in /usr/local/bin/tesseract, as is done with most (all?) homebrew binaries. I tested further with a random .jpeg containing text. The example functions returned empty results, i.e. '' or [], which may be related to not calling tesseract correctly. I will have a look into it. Anyway, I will close this if it works tomorrow on Debian :)

phaebz commented 10 years ago

Tested some more and it works on the test.png inside the tests/data dir with tesseract from homebrew, no tweaking!

phaebz commented 10 years ago

import works also on my Debian box. Now, you should probably change the README to reflect that import pyocr is not possible anymore, but instead from pyocr import pyocr etc. should be used.

jflesch commented 10 years ago

Actually, it was another bug. Sorry for that. It's fixed: fa695265a84418f18631fd06c524bcbcd3ce5fbe

As you probably guessed by now, since Paperwork only works with Python 2.7, I haven't tested Pyocr and Pyinsane with Python 3 much.

The recommended way to import pyocr is just import pyocr. The form from pyocr import pyocr is obsolete and was only kept to avoid compatibility issues (if I remember correctly, Paperwork 0.1 still uses this old way; it will only be changed in 0.2).
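
As a rough illustration of the recommended form, the sketch below imports the top-level package and lists the OCR tools it detects through pyocr.get_available_tools(), the function mentioned at the start of this issue. The wrapper name list_ocr_tools is hypothetical, and the fallback to an empty list when pyocr is not installed is an assumption made so the snippet runs anywhere.

```python
def list_ocr_tools():
    """Return the names of OCR tools pyocr detects, or [] if pyocr is missing."""
    try:
        import pyocr  # recommended form: import the top-level package
    except ImportError:
        # Assumption for this sketch: treat a missing pyocr as "no tools".
        return []
    return [tool.get_name() for tool in pyocr.get_available_tools()]


tool_names = list_ocr_tools()
print(tool_names)
```

An empty list here means either that pyocr is not importable or that it found no tesseract or cuneiform installation.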

phaebz commented 10 years ago

Ok, with fa695265a84418f18631fd06c524bcbcd3ce5fbe it works. Forget about the README change then. The issue about relative imports is solved then. Thanks for fixing.

jflesch commented 10 years ago

You're welcome. Also, thanks for reporting these issues in the first place.

I've released Pyocr 0.2.2 with all these fixes included.