poke1024 / origami

A suite of batches and tools for OCR tasks.

requirement missing #4

Closed: a-wendler closed this issue 3 years ago

a-wendler commented 3 years ago

When running segment for the first time, it threw a module not found error for psutil.

Installing it manually with conda worked for me.

Maybe this should be included in requirements.

Thx
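A quick way to spot this kind of gap before running a batch is to ask Python which modules it can locate. The following is only a minimal sketch, not part of origami: the helper name `missing_modules` is made up for illustration, and the list of module names is an assumption based on what is reported in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names as reported in this thread; extend the list as needed.
    missing = missing_modules(["psutil", "sklearn"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Note that the check uses import names, which can differ from the package names used for installation (for example, the module sklearn is installed as scikit-learn).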

bertsky commented 3 years ago

Dear @poke1024,

adding to the above, if I follow the README…

conda create --name origami python=3.7 -c defaults -c conda-forge --file requirements/conda.txt

…I'll see:

Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - tensorflow==2.1.2

Current channels:

  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://conda.anaconda.org/conda-forge/linux-64
  - https://conda.anaconda.org/conda-forge/noarch

I know TF packaging/distribution (older TF releases on older Python versions) is an all-out catastrophe on PyPI, but I did not think it's the same with Conda. Do you have any idea how to solve this?

Also, non-conda instructions would be much appreciated (as Anaconda installation on Linux is such a PITA, and miniconda does not seem to exist for Python 3.7 and 64-bit Linux).

FWIW, on the pip side I tried…

pip install -r requirements/pip.txt
pip install -r requirements/conda.txt

…but there's no scikit-geometry on PyPI, so I had to manually…

sudo apt-get install libcgal-dev
git clone https://github.com/scikit-geometry/scikit-geometry
pip install -e scikit-geometry

But still, it won't compile.

Besides, despite what the README states…

By default, this uses origami’s own model.

…I do need to download the bbz-segment model file from dropbox, and point to it via -m here.

poke1024 commented 3 years ago

Hi @bertsky,

not sure why conda no longer knows tensorflow, but it's the same for me.

I've moved tensorflow from conda (for some reason, this seems broken for any current tf version) back to pip. The following procedure works now:

conda create --name origami python=3.7 -c defaults -c conda-forge --file origami/requirements/conda.txt
conda activate origami
pip install -r origami/requirements/pip.txt

The steps above should install scikit-geometry via conda, which should work (it does not work via pip, and indeed pip install -r requirements/conda.txt fails for me too).

I fixed the statement "By default, this uses origami’s own model." in the README, which was indeed bad documentation. It now reads "If you have not trained a custom model, you should download and use origami’s default model...".

I'll try to reproduce running origami from a completely fresh installation sometime tomorrow.

bertsky commented 3 years ago

Thanks a lot @poke1024 for fixing this – that was super-fast!

My conda installation works now :tada:

(Regarding the pip-only installation problem, I created https://github.com/scikit-geometry/scikit-geometry/issues/60 to address this.)

bertsky commented 3 years ago

Found 1 more: flow needs sklearn, so requirements/pip.txt should include scikit-learn.
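To catch such omissions systematically, one could compare a requirements file against the packages known to be needed. The sketch below makes several assumptions: the helper `requirement_names` is hypothetical, the sample file contents are invented for illustration (they are not the real requirements/pip.txt), and the parsing only handles simple one-package-per-line entries.

```python
import re


def requirement_names(text):
    """Extract normalized package names from the text of a pip requirements file."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options such as -r
            continue
        # The name ends at the first version specifier, extra, marker, or space.
        name = re.split(r"[=<>!~\[;@\s]", line, maxsplit=1)[0]
        # Treat runs of "-", "_" and "." as equivalent and ignore case.
        names.add(re.sub(r"[-_.]+", "-", name).lower())
    return names


# Hypothetical file contents, for illustration only.
sample = """
tensorflow==2.1.2  # version taken from the error message in this thread
psutil
"""

# Packages reported as needed in this thread.
required = {"psutil", "scikit-learn"}
print(sorted(required - requirement_names(sample)))  # prints ['scikit-learn']
```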

Also, I would suggest linking to the section on OCR models (further below) right in the description of the ocr step.

Speaking of which, how about referencing the GT4HistOCR model for Calamari besides your BBZ model?

(One should ideally also explain how to query the models for their input image requirements. Some models even have preprocessing steps like binarization or line normalization configured-in. But IIRC the GT4HistOCR models don't say this explicitly; you would only know from their input_channels being 1 instead of 3. See here for full discussion...)
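To illustrate why the channel count matters, here is a rough sketch of adapting a line image to whatever a model expects. It is not Calamari's or origami's API: `match_channels` is a hypothetical helper, the `input_channels` value is assumed to have been read from the model by some other means, and plain Python lists stand in for real image arrays to keep the example dependency-free.

```python
def match_channels(image, input_channels):
    """Convert an image to the channel count a model expects.

    `image` is a list of rows; each pixel is an int (grayscale) or an
    (r, g, b) tuple. `input_channels` is assumed to be 1 or 3.
    """
    if input_channels not in (1, 3):
        raise ValueError("input_channels must be 1 or 3")
    is_gray = isinstance(image[0][0], int)
    if input_channels == 1 and not is_gray:
        # Unweighted mean of the three channels; real pipelines often use a luma formula.
        return [[sum(px) // 3 for px in row] for row in image]
    if input_channels == 3 and is_gray:
        # Repeat the gray value in all three channels.
        return [[(px, px, px) for px in row] for row in image]
    return image  # already in the expected layout


# A one-pixel color image fed to a model that expects a single channel.
print(match_channels([[(30, 60, 90)]], 1))  # prints [[60]]
```

This only covers the channel count; preprocessing such as binarization or line normalization would still have to match whatever the model was trained with.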