qurator-spk / sbb_binarization

Document Image Binarization
Apache License 2.0

how to use sbb_binarization within a script? #26

Closed SB2020-eye closed 3 years ago

SB2020-eye commented 3 years ago

Hi. After searching for numerous hours without success, I am wondering if someone might offer insight on how to run this from within a python script.

(Using Windows 10, Visual Studio Code.) For example, I can run the following successfully from the terminal:

sbb_binarize --patches -m 'C:/Users/Scott/Desktop/Python2/sbb_binarization/models' 'C:/Users/Scott/Desktop/Python2/Kpics/Pages_cropped/061r.png' 'C:/Users/Scott/Desktop/Python2/Kpics/new_test8.png'

However, if I try the following script (using the Code Runner extension):

import subprocess
def sbb_def():
    args = ['sbb_binarize', '--patches', '-m', 'C:/Users/Scott/Desktop/Python2/sbb_binarization/models', 'C:/Users/Scott/Desktop/Python2/Kpics/Pages_cropped/061r.png', 'C:/Users/Scott/Desktop/Python2/Kpics/new_test8.png']
    subprocess.Popen(args)
sbb_def()

I get the following:

[Running] C:\ProgramData\Anaconda3\Scripts\activate.bat C:\ProgramData\Anaconda3 & python "c:\Users\Scott\Desktop\Python2\my_sbb_binarization_example.py"
Traceback (most recent call last):
  File "c:\Users\Scott\Desktop\Python2\my_sbb_binarization_example.py", line 8, in <module>
    sbb_def()
  File "c:\Users\Scott\Desktop\Python2\my_sbb_binarization_example.py", line 6, in sbb_def
    subprocess.Popen(args)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

[Done] exited with code=1 in 0.687 seconds

I don't suggest that this is a bug or anything. I'm rather sure the "issue" is mine. I'm very green at python/coding in general. Any help would be greatly appreciated.

kba commented 3 years ago

It appears that the directory containing the sbb_binarize executable is not in your PATH, cf. https://bugs.python.org/issue8557. I have very little experience with Python on Windows, but on Linux I would try something like

from os import environ
if 'PATH' in environ:
    environ['PATH'] += ':/path/to/dir-containing-sbb-binarize/'
else:
    environ['PATH'] = '/path/to/dir-containing-sbb-binarize/'
# stuff with subprocess

kba commented 3 years ago

Or use an absolute path instead of sbb_binarize.
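
A portable take on the two suggestions above might look like the following sketch. It joins PATH entries with os.pathsep (';' on Windows, ':' on Linux) instead of a hard-coded ':', and it resolves the executable with shutil.which before handing it to subprocess, so a missing command fails with a clear message. The helper names and the scripts_dir argument are hypothetical; the directory that actually contains sbb_binarize depends on how it was installed.

```python
import os
import shutil
import subprocess


def prepend_to_path(directory, env=None):
    """Return a copy of the environment with `directory` first on PATH."""
    env = dict(os.environ if env is None else env)
    old = env.get("PATH", "")
    env["PATH"] = directory + (os.pathsep + old if old else "")
    return env


def run_tool(name, tool_args, scripts_dir=None):
    """Resolve `name` to a full path, then run it; fail early if not found."""
    env = prepend_to_path(scripts_dir) if scripts_dir else dict(os.environ)
    exe = shutil.which(name, path=env.get("PATH"))
    if exe is None:
        raise FileNotFoundError(f"{name!r} not found on PATH")
    # Passing the resolved full path avoids relying on the child's own lookup.
    return subprocess.run([exe, *tool_args], env=env, check=True)
```

Calling run_tool('sbb_binarize', [...]) with the same argument list as in the original script, plus scripts_dir set to the folder that holds the executable, would then either run the tool or report that it cannot be found.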

SB2020-eye commented 3 years ago

Thanks so much for responding, @kba.

I believe I tried absolute path before, but just in case, I did it again, with these results:

Traceback (most recent call last):
  File "c:\Users\Scott\Desktop\Python2\my_sbb_binarization_beta5.py", line 38, in <module>
    sbb_def()                                                   # to run my_sbb_binarization.py by itself, uncomment this line
  File "c:\Users\Scott\Desktop\Python2\my_sbb_binarization_beta5.py", line 23, in sbb_def
    subprocess.Popen(args)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied

(The traceback looks almost identical to the first one, but the last-line error has changed, of course.) It isn't a Windows folder properties thing. I have gone in there, and every possible user and/or administrator configuration has all possible permissions.

(More info: I winnowed down my code in order to post. Originally I also had import os at the top and, immediately under def sbb_def():, the line os.chdir('C:/Users/Scott/Desktop/Python2/sbb_binarization'). I get the same results whether this is included or commented out.)

kba commented 3 years ago

PermissionError: [WinError 5] Access is denied

This looks like the executable doesn't have (in Linux terminology) the executable bit set. I don't think that exists in Windows.

So I guess you'd have to do something like

    args = ['C:/path/to/python3.exe', 'C:/path/to/sbb_binarize', '--patches', '-m', 'C:/Users/Scott/Desktop/Python2/sbb_binarization/models', 'C:/Users/Scott/Desktop/Python2/Kpics/Pages_cropped/061r.png', 'C:/Users/Scott/Desktop/Python2/Kpics/new_test8.png']

But why go through that route at all, why not use the python code directly?

from sbb_binarize import SbbBinarizer
SbbBinarizer("C:/...").run(image_path="C:/...", save="C:/...")

SB2020-eye commented 3 years ago

Thanks again. Trying your last suggestion. I appreciate something that more experienced folks know to be more direct and/or simpler.

I can't get it working yet. I get one of three results, depending on where I run it.

From my default working python directory: ModuleNotFoundError: No module named 'sbb_binarize'

From the sbb_binarization folder: ImportError: cannot import name 'SbbBinarizer' from 'sbb_binarize' (c:\Users\Scott\Desktop\Python2\sbb_binarization\sbb_binarize\__init__.py)

From the sbb_binarize folder: it just runs and ends almost right away, but with no saved file and no output.

I also tried things like from .sbb_binarize import SbbBinarizer, and from . import SbbBinarizer, but they didn't work. Usually the message was ImportError: attempted relative import with no known parent package
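
These differing outcomes are consistent with Python resolving the name sbb_binarize differently depending on the working directory and the interpreter in use. A small check like this sketch shows which file, if any, an import name resolves to for the interpreter that runs the script. where_is is a hypothetical helper, not part of sbb_binarization.

```python
import importlib.util


def where_is(module_name):
    """Return the file an importable module name resolves to, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised for dotted names with a missing parent, or odd module state.
        return None
    return None if spec is None else spec.origin
```

If where_is("sbb_binarize") returns None, the package is not installed for that interpreter. The ImportError quoted above points at sbb_binarize\__init__.py, which suggests SbbBinarizer is not exported at package level. It may live in a submodule instead, in which case something along the lines of from sbb_binarize.sbb_binarize import SbbBinarizer would be the thing to try; that submodule name is an assumption and should be checked against the installed package.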

kba commented 3 years ago

For example, I can run the following successfully from the terminal:

If you can run it from the terminal, it should be usable in a script. We work with virtual environments exclusively to control which Python version, with which packages, is available. I'm sure Visual Studio Code supports that too.

But in the meantime: how did you install sbb_binarization? What is the content of the sbb_binarize script? That will give you a clue as to where it is actually installed and which python you have to use to be able to import the module in your scripts.

SB2020-eye commented 3 years ago

Thank you so much! After all this, it seems it is something as simple as this. I thought that if I had activated the appropriate conda environment (in this case, the one I created for sbb_binarization), that scripts would run from that environment, not just command line stuff.

I made each -- command line and script -- print python version. Command line was 3.7.0 (as I specified during installation -- a step I figured out I needed to take during my installation of sbb_binarization). Script in code runner was 3.8.6, which is the python version of my default environment.

I had to search to figure out how to remedy this. You have to manually select the correct "interpreter path" prior to running the script. (I thought terminal environment and interpreter path basically synced.)
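
The check described above (printing the Python version from both the command line and the script) can be sketched as follows. Running the same file from the terminal and from Code Runner shows right away whether both use the same interpreter. interpreter_info is a hypothetical helper name; it only uses the standard sys module.

```python
import sys


def interpreter_info():
    """Describe the Python interpreter that is executing this script."""
    return {
        "executable": sys.executable,  # full path of the running python
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "prefix": sys.prefix,  # root folder of the active environment
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```

If the two runs report different executable paths, the script is not using the environment that the terminal has activated.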

So, hallelujah, I can now run the script as script. :) This is using a script I wrote ('my_sbb_test2b.py') which is placed inside the sbb_binarization/sbb_binarize folder. (It takes a long time (1070.903 seconds for one 11.8 MB image), but it also took long in the command line, of course...and it works! Thank you!)

Now, being one step closer, what I ultimately need is to be able to call this script (or just SbbBinarizer) from another script found in my default python folder (one level up from sbb_binarization) -- which exists in a different conda environment (and different python version).
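
One way to bridge two environments, in line with the subprocess approach from earlier in the thread, is to have the calling script start the sbb_binarize command that belongs to the other environment by its full path. This sketch assumes the usual conda layout, where console scripts sit in Scripts on Windows and in bin on Linux or macOS. env_dir and both helper names are hypothetical.

```python
import os
import shutil
import subprocess


def find_env_tool(env_dir, name):
    """Look for an executable called `name` inside one conda environment."""
    search = os.pathsep.join(
        os.path.join(env_dir, sub) for sub in ("Scripts", "bin")
    )
    return shutil.which(name, path=search)


def binarize_via_env(env_dir, model_dir, image_in, image_out):
    """Run the sbb_binarize CLI from `env_dir` with the flags used above."""
    exe = find_env_tool(env_dir, "sbb_binarize")
    if exe is None:
        raise FileNotFoundError(f"sbb_binarize not found under {env_dir}")
    cmd = [exe, "--patches", "-m", model_dir, image_in, image_out]
    return subprocess.run(cmd, check=True)
```

The calling script can then stay in its own environment and Python version, since the binarization runs in a separate process. Some packages need the environment to be activated before their libraries load; if the direct call fails for that reason, conda run -n followed by the environment name and the command may be an alternative worth trying.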

cneud commented 3 years ago

So, hallelujah, I can now run the script as script. :)

Great!

It takes a long time (1070.903 seconds for one 11.8 MB image)

Yes, we have not made any optimizations for speed yet, only for quality of results (same with eynollah). Are you using a GPU or CPU only? A GPU typically makes processing several times faster.

SB2020-eye commented 3 years ago

Yeah, just GPU. Thanks.

kba commented 3 years ago

Yeah, just GPU. Thanks.

I think you meant CPU. On my (not very powerful) machine without a GPU, even a minimal example will take at least 5-10 times as long as with GPU. Also goes for eynollah and sbb_textline_detection. These tools use neural networks that benefit from the massively parallel calculations a GPU can do.

SB2020-eye commented 3 years ago

Ha. Yes, of course I meant CPU. Sorry about that! Thanks once again.