ocropus-archive / DUP-ocropy

Python-based tools for document analysis and OCR
Apache License 2.0
3.42k stars, 591 forks

Trying to test out ocropus from sources #332

Status: Open. crazylyle opened this issue 4 years ago.

crazylyle commented 4 years ago

I just want to try out OCRopus on some files that Tesseract fails on (badly).

I download the zip file from GitHub and expand it into /tmp/ocropus. Then, following README.md:

  1. I make sure all the packages in PACKAGES are installed. This is a Fedora 30 system, so it uses dnf, not apt-get, but all the packages are there.

  2. I use wget to fetch en-default.pyrnn.gz (listed as 83826134 bytes, Nov 2 2014).

  3. I then try to move it into models/:

    /tmp/ocropus > mv en-default.pyrnn.gz models/
    mv: cannot move 'en-default.pyrnn.gz' to 'models/': Not a directory

So, to correct this, I run:

    /tmp/ocropus > mkdir models
    models created
    /tmp/ocropus > mv en-default.pyrnn.gz models/
    /tmp/ocropus > ls models
    83826134 en-default.pyrnn.gz
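Before going further, it can help to confirm that the downloaded model actually landed intact. The sketch below uses a hypothetical helper, `check_model` (not part of OCRopus), and assumes that the 83826134 figure in the listing above is the file size in bytes; it is not an official checksum. It checks only the size and the two-byte gzip magic number, not the model contents, and it uses only the standard library, so it should run under both Python 2.7 and Python 3.

```python
import os

# Assumption: 83826134 (from the ls output above) is the size in bytes of the
# en-default.pyrnn.gz that wget fetched; this is not an official checksum.
EXPECTED_SIZE = 83826134
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def check_model(path, expected_size=EXPECTED_SIZE):
    """Hypothetical helper: return a list of problems (empty list if none)."""
    if not os.path.isfile(path):
        return ["missing: %s" % path]
    problems = []
    size = os.path.getsize(path)
    if size != expected_size:
        problems.append("size %d != expected %d" % (size, expected_size))
    with open(path, "rb") as f:
        if f.read(2) != GZIP_MAGIC:
            problems.append("not a gzip file")
    return problems


if __name__ == "__main__":
    # Run from /tmp/ocropus, where models/ was just created.
    path = os.path.join("models", "en-default.pyrnn.gz")
    print(check_model(path) or "model looks OK")
```

A size mismatch usually points to a truncated download; a missing gzip header suggests that an HTML error page was saved instead of the model.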

  4. I don't want to install into /usr/bin, since I just want to try it out, but let's let it run and see what happens:

    python setup.py install
    running install
    running build
    running build_py
    error: package directory 'ocrolib' does not exist

  5. Another missing directory, so I run mkdir ocrolib and try again.

Now we get, in a much longer set of messages:

    package init file 'ocrolib/__init__.py' not found (or not a regular file)
    ...

    warning: install_lib: 'build/lib' does not exist -- no Python modules to install

and finally:

    copying build/scripts-2.7/ocropus-gated-train -> /usr/bin
    error: [Errno 13] Permission denied: '/usr/bin/ocropus-gated-train'
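The three failures above (a missing package directory, a missing `__init__.py`, and no write access to /usr/bin) can all be checked before running setup.py. The following is a hypothetical preflight sketch. `package_problems` and `can_install_scripts` are illustrative names, and the messages only paraphrase the distutils output; the code does not reuse distutils' own logic.

```python
import os


def package_problems(src_root, package="ocrolib"):
    """Hypothetical preflight check mirroring the setup.py errors above.

    Paraphrases distutils' complaints; it does not call distutils itself.
    """
    pkg_dir = os.path.join(src_root, package)
    if not os.path.isdir(pkg_dir):
        return ["package directory '%s' does not exist" % package]
    problems = []
    if not os.path.isfile(os.path.join(pkg_dir, "__init__.py")):
        problems.append("package init file '%s/__init__.py' not found" % package)
    # Any .py file counts as a module here; an empty directory has none.
    if not [n for n in os.listdir(pkg_dir) if n.endswith(".py")]:
        problems.append("no Python modules in '%s'" % package)
    return problems


def can_install_scripts(target_dir="/usr/bin"):
    """True if this user may create files in target_dir (needs write + search)."""
    return os.access(target_dir, os.W_OK | os.X_OK)


if __name__ == "__main__":
    print(package_problems("/tmp/ocropus"))
    print(can_install_scripts())
```

On the tree described here, after `mkdir ocrolib`, `package_problems("/tmp/ocropus")` would report both the missing `__init__.py` and the absence of modules. `can_install_scripts()` would typically return False for a non-root user. If the goal is to avoid /usr/bin, distutils' `install` command also accepts `--user` (PEP 370), although how this particular tree behaves with it is untested here.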

Now, trying to test:

    /tmp/ocropus > ./run-test
    Traceback (most recent call last):
      File "./ocropus-nlbin", line 15, in <module>
        import ocrolib
    ImportError: No module named ocrolib

We do have an ocrolib/ directory, but it is empty.
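The ImportError is consistent with that empty directory. Python 2 imports a directory as a package only when it contains `__init__.py`, so an empty ocrolib/ is skipped. (Python 3.3+ would instead import it as an empty namespace package and fail later, when the code first uses something from it.) The hypothetical helper below lists every sys.path entry that holds an `ocrolib/` directory and reports whether that directory has an `__init__.py`. For a script such as ./ocropus-nlbin, the script's own directory is sys.path[0].

```python
import os
import sys


def find_package(name, search_path=None):
    """Hypothetical helper: list (directory, has_init) for each search-path
    entry that contains a subdirectory called `name`.

    Python 2 imports such a directory as a package only if has_init is True.
    """
    hits = []
    entries = sys.path if search_path is None else search_path
    for entry in entries:
        # An empty sys.path entry means the current directory.
        candidate = os.path.join(entry or os.curdir, name)
        if os.path.isdir(candidate):
            init = os.path.join(candidate, "__init__.py")
            hits.append((candidate, os.path.isfile(init)))
    return hits


if __name__ == "__main__":
    for path, has_init in find_package("ocrolib"):
        print("%s  __init__.py present: %s" % (path, has_init))
```

If every hit reports False, as the empty ocrolib/ here would, the source tree is missing the package contents, and creating the directory by hand cannot fix that.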
