Closed dibu28 closed 2 years ago
Wow, nice... definitely interested in getting this merged.
I'd prefer to encapsulate changes and have a single source of truth.
duplicate_file, which is implemented as os.symlink, or as a copy on Windows
gs_exec, which is set based on platform at import time
For the named temporary file issue we can avoid Ghostscript temporary files entirely. I'll push a commit for you that improves that.
That sort of thing.
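A minimal sketch of what such a single source of truth could look like. The names duplicate_file and gs_exec come from the comment above; the function body, the constant GS_EXEC, and the choice of shutil.copy2 are assumptions for illustration, not the code that ended up in the repository.

```python
import os
import shutil
from pathlib import Path

# Assumption: the executable names follow the ones mentioned in this thread
# (gswin64c on 64-bit Windows, gs elsewhere). Decided once, at import time.
GS_EXEC = "gswin64c" if os.name == "nt" else "gs"


def duplicate_file(src, dst):
    """Make dst refer to the same content as src.

    Uses a symlink where that is cheap and unprivileged, and falls back to
    a plain copy on Windows, where creating symlinks may require
    administrator rights.
    """
    src, dst = Path(src), Path(dst)
    if os.name == "nt":
        shutil.copy2(src, dst)
    else:
        os.symlink(src.resolve(), dst)
    return dst
```

Callers then use duplicate_file everywhere instead of calling os.symlink directly, so the platform check lives in one place.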
I'd like to get to 100% tests passing on Windows. (Of course we can skip platform specific tests.)
See the branch gs-temp-files for a commit that removes NamedTemporaryFile from ghostscript.py
I implemented the rest of the changes you suggested in a compatible way in the windows branch.
By any chance, do you know how to automate the installation of those packages (headless) for continuous integration?
It depends on which CI you use.
In simple words:
1) python-3.7.5-amd64.exe, tesseract-ocr-w64-setup-v5.0.0-alpha.20191030.exe and gs950w64.exe are Windows installers. They should have a command-line option for "silent mode", but the option can differ depending on the type of installer they use. (python-3.7.5-amd64.exe is just Python itself; if it is already installed, there is no need to install it again.)
2) qpdf-9.0.2-bin-msvc64.zip is just a folder: unzip it and place it somewhere, or if there is a Python package for it, just install that as a dependency.
3) Add the paths to the PATH variable so that the OCRmyPDF script can find all those executables.
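For step 3, a CI job could verify that the executables are actually visible before running anything else. This is only a sketch: the tool names in REQUIRED_TOOLS are taken from this thread and are an assumption about your setup, and missing_tools is a hypothetical helper.

```python
import shutil

# Assumption: executable names as used in this thread; adjust for your setup.
REQUIRED_TOOLS = ["tesseract", "gswin64c", "qpdf"]


def missing_tools(names=REQUIRED_TOOLS):
    """Return the subset of names that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```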
Also, I've tried the windows branch on my system and it is working.
The test suite is pretty far from passing unfortunately.
To elaborate, I was able to replicate what you set up and fixed a few things.
Some notes, more for myself:
choco install tesseract --pre
Really nice - but that leads me to the question: where can i get the windows exe file?
This is still in development (mainly limited by my available time to work on it) and it does not pass the test suite on Windows so user beware.
I believe if you do python setup.py build on a source directory Windows will build an exe. (@dibu28 do you know for sure?)
I patched the files as described and no: it doesn't build an exe.
python setup.py bdist --format=msi should make a Windows .msi installer
Use the windows branch on this repo for the latest change set
Yes, something was built, but when i run it:
Traceback (most recent call last):
File "D:\xyz\OCRmyPDF\build\bdist.win-amd64\msi\Scripts\ocrmypdf-script.py", line 6, in <module>
from pkg_resources import load_entry_point
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 3250, in <module>
@_call_aside
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 3234, in _call_aside
f(*args, **kwargs)
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 3263, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 583, in _build_master
ws.require(__requires__)
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 900, in require
needed = self.resolve(parse_requirements(requirements))
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 786, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'ocrmypdf==9.1.0.post12+g3569da3' distribution was not found and is required by the application
Since it's not stable in the test suite yet I haven't even started to think about how to distribute it, but off the top of my head, try bypassing version management: Try removing setuptools_scm* from setup.py, manually setting the package version to something like 9.2.0a1 in setup.py, and rebuilding. Possibly reinstall into a virtual environment. This is a hackish workaround.
D:\Python37\Scripts>ocrmypdf
Traceback (most recent call last):
File "D:\Python37\Scripts\ocrmypdf-script.py", line 6, in <module>
from pkg_resources import load_entry_point
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 3250, in <module>
@_call_aside
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 3234, in _call_aside
f(*args, **kwargs)
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 3263, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 583, in _build_master
ws.require(__requires__)
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 900, in require
needed = self.resolve(parse_requirements(requirements))
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 786, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'tqdm>=4' distribution was not found and is required by ocrmypdf
If I delete the install_requires entry for tqdm in setup.py, the next error comes up (and the same happens even if I remove all of them):
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'Pillow>=6.2.0' distribution was not found and is required by ocrmypdf
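These errors read as if the dependencies are simply not installed in the Python environment the script runs in, so removing entries from install_requires only moves the failure to the next missing package. A small sketch for listing what is missing up front; missing_distributions is a hypothetical helper, and importlib.metadata is in the standard library only from Python 3.8 (on 3.7 the importlib_metadata backport offers the same functions).

```python
from importlib import metadata  # standard library from Python 3.8


def missing_distributions(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# The two names below are the ones that appear in the errors above.
print(missing_distributions(["tqdm", "Pillow"]))
```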
The Windows Subsystem for Linux version works quite well.
That's not an option in my environment; I use Win 8.1 at the moment. Later usage would be in other environments, and WSL is not an option there either. It should run natively under Windows.
I'm programming a tool to create searchable PDFs with PowerShell, Tesseract and some other tools under Windows when I found your OCRmyPDF. So I thought: why reinvent the wheel ...
After installing everything in another environment, including all the needed Python packages, it stops with:
D:\Python37\Scripts>ocrmypdf.exe
Traceback (most recent call last):
File "D:\Python37\Scripts\ocrmypdf-script.py", line 11, in <module>
load_entry_point('ocrmypdf==0.0.0', 'console_scripts', 'ocrmypdf')()
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 489, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 2852, in load_entry_point
return ep.load()
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 2443, in load
return self.resolve()
File "D:\Python37\lib\site-packages\pkg_resources\__init__.py", line 2449, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "D:\Python37\lib\site-packages\ocrmypdf\__init__.py", line 18, in <module>
from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
File "D:\Python37\lib\site-packages\ocrmypdf\leptonica.py", line 46, in <module>
lept = ffi.dlopen(find_library(libname))
OSError: cannot load library '<None>': error 0x57
which leads to https://github.com/jbarlow83/OCRmyPDF/issues/341
Is Leptonica installed?
Could also try copying liblept*.dll into D:\Python37\lib\site-packages\ocrmypdf, or the current directory.... I imagine Tesseract installs Leptonica.
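The '<None>' in the error above is what you get when find_library returns None and that value is passed straight on to the loader. A sketch of a guard that fails with a clearer message; locate_library is a hypothetical helper, not part of ocrmypdf, and the candidate names in the usage note are the two mentioned in this thread.

```python
from ctypes.util import find_library


def locate_library(candidates):
    """Return the first name or path that find_library resolves, or raise.

    find_library returns None when nothing matches; handing that None to
    a loader is what produces the unhelpful "cannot load library
    '<None>'" message.
    """
    for name in candidates:
        found = find_library(name)
        if found is not None:
            return found
    raise FileNotFoundError(
        "None of these libraries could be located: " + ", ".join(candidates)
    )
```

It could be called as locate_library(["lept", "liblept-5"]), with the result passed to the loader only when the lookup succeeded.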
I had to set up the system PATH, and after a restart it runs. Now I'm testing.
Thanks
Good to hear. If you have any fixes please feel free to contribute.
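I have to test and build again and again to make it reproducible. Now it runs in a current Windows 10 environment but not under Windows 8.1 (screenshot: https://user-images.githubusercontent.com/54740896/69487883-fc8df600-0e61-11ea-8fc8-a8a1cd8d6847.jpg):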
That would be a problem for the packager of Tesseract for Windows to address.
If you run in debug mode with -k -v1 or -v2 you should be able to extract the exact Tesseract command that fails and provide them with the .png from the temporary files folder.
You might be able to work around the error by manually compiling/installing libpng 1.6 and copying a DLL into place.
It's the same Tesseract installer. It runs under Windows 10 but not under Windows 8.1. It's late, 2:37 am; I'll keep searching tomorrow.
One question: is there an option to remove blank pages?
No, I've long resisted adding that feature because of the risk of false positives. I've never found any scanner software that does it reliably without requiring an arbitrary threshold, and as the user you really have no idea how to set that except turn it up/down if it's giving you trouble. They can also be quite different in behavior on color vs grayscale vs black and white. You can get problems like poorly exposed color/gray getting rounded off to white. So in my opinion the state of the art for that feature is pretty poor. But if you know of something that addresses the problems I'll look.
ocrmypdf is designed to be as safe as possible so you can throw millions of pages at it and be confident it didn't lose any data.
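To make the threshold problem concrete, here is a deliberately naive detector, written only to illustrate the argument above. It is not something ocrmypdf does; the function names, the pixel lists, and both thresholds are made up for this example.

```python
def ink_fraction(pixels, dark_below=128):
    """Fraction of 8-bit grayscale pixels darker than dark_below."""
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p < dark_below) / len(pixels)


def looks_blank(pixels, max_ink=0.001, dark_below=128):
    """Naive detector: blank if fewer than max_ink of the pixels are dark."""
    return ink_fraction(pixels, dark_below) < max_ink


# A white page with a small but clearly dark mark (5 of 10,000 pixels).
page_with_small_mark = [255] * 9995 + [0] * 5
# A page half covered in faint gray writing that never gets below the cutoff.
faint_page = [255] * 5000 + [160] * 5000

# Both pages carry content, yet both are classified as blank here.
print(looks_blank(page_with_small_mark), looks_blank(faint_page))
# Tightening max_ink rescues the first page but not the second.
print(looks_blank(page_with_small_mark, max_ink=0.0001), looks_blank(faint_page, max_ink=0.0001))
```

Whichever way the two knobs are turned, some page with real content ends up on the wrong side, which is the false-positive risk described above.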
1) @jbarlow83 After I execute python setup.py build the exe file is not available in the dist folder.
If I execute python setup.py install the exe file will be in Programs\Python\Python37\Scripts\ocrmypdf.exe and available in the PATH.
It will use the code from: Programs\Python\Python37\Lib\site-packages\ocrmypdf-9.1.1-py3.7.egg\ocrmypdf
2) I was also able to build an MSI installer using python setup.py bdist --format=msi as @jbarlow83 suggested.
3) If you get the error OSError: cannot load library '<None>': error 0x57 you need to add the tesseract folder to the PATH variable.
4) There is libjbig-2.dll in the tesseract installation. I don't know if you can use it.
It seems that choco install tesseract --pre is installing tesseract from the same source I mentioned in the first post:
https://github.com/UB-Mannheim/tesseract/wiki
https://chocolatey.org/packages/tesseract#files
@jbarlow83 can you please tell me how to run tests?
In the ocrmypdf project folder:
pip install -r requirements/test.txt
pytest -n auto
For pytest -n auto I've got this result:
========== 15 failed, 183 passed, 42 skipped, 2 xfailed in 179.37s (0:02:59) ===========
on branch windows
Two more tests passing: 13 failed, 185 passed, 42 skipped, 2 xfailed in 158.49s (0:02:38)
We're now at 100% tests passing. Took longer than I thought it would, and I was expecting it to take a long time.
Wow, nice. I've pulled the windows branch and tried to run the tests, but I've got this result: == 3 failed, 90 passed, 40 skipped, 1 xfailed, 105 errors in 34.32s== Is it OK or am I missing something? The script itself is working.
I've been rebasing the changes to organize it more logically. So make sure you hard reset and force pull the windows branch. I also hadn't pushed a change or two when you commented. If anything is still broken after this, please attach the logs so I can look.
@jbarlow83 I've downloaded QPDF 9.1.0 and set the path to point to it, and now the errors are gone: 22 failed, 173 passed, 40 skipped, 2 xfailed. But now I have failed tests with this message:
E The program 'qpdf' could not be executed or was not found on your
E system PATH.
But if I run qpdf from the PATH, it is available:
qpdf.exe --version
qpdf version 9.0.2
I will attach logs later
@dibu28 QPDF is no longer required, provided that pikepdf binary wheels are used. Please try the v9.2.0 release.
@jbarlow83 I've reinstalled all my dependencies, including Python, and pulled the latest v9.2.0.
And now it seems like the tests are passing. I've got only one failure:
=1 failed, 200 passed, 39 skipped, 2 xfailed in 118.38s (0:01:58)=
Is it correct?
As for qpdf: if I only install OCRmyPDF with python setup.py install and try to run OCRmyPDF, I get this error:
AppData\Local\Programs\Python\Python37\lib\site-packages\pikepdf-1.8.1-py3.7-win-amd64.egg\pikepdf\__init__.py", line 10, in <module>
from . import _qpdf
ImportError: DLL load failed: The specified module could not be found.
It seems that pikepdf doesn't have all the required DLL libraries in its folder pikepdf-1.8.1-py3.7-win-amd64.egg\pikepdf; there is only the qpdf26.dll file.
So I've downloaded the full qpdf package and put qpdf-9.1.0\bin in the PATH. It also contains these files:
api-ms-win-crt-convert-l1-1-0.dll
api-ms-win-crt-environment-l1-1-0.dll
api-ms-win-crt-filesystem-l1-1-0.dll
api-ms-win-crt-heap-l1-1-0.dll
api-ms-win-crt-locale-l1-1-0.dll
api-ms-win-crt-math-l1-1-0.dll
api-ms-win-crt-runtime-l1-1-0.dll
api-ms-win-crt-stdio-l1-1-0.dll
api-ms-win-crt-string-l1-1-0.dll
api-ms-win-crt-time-l1-1-0.dll
api-ms-win-crt-utility-l1-1-0.dll
msvcp140.dll
qpdf.exe
qpdf26.dll
vcruntime140.dll
vcruntime140_1.dll
zlib-flate.exe
Now the only test that fails is:
__________________________________ test_bash __________________________________
[gw2] win32 -- Python 3.7.5 c:\users\d\appdata\local\programs\python\python37\python.exe
    def test_bash():
        try:
            proc = run(
                ['bash', '-n', 'misc/completion/ocrmypdf.bash'],
                check=True,
                encoding='utf-8',
                stdout=PIPE,
>               stderr=PIPE,
            )
tests\test_completion.py:49:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
E subprocess.CalledProcessError: Command '['bash', '-n', 'misc/completion/ocrmypdf.bash']' returned non-zero exit status 4294967295.
c:\users\d\appdata\local\programs\python\python37\lib\subprocess.py:512: CalledProcessError
Both of these changes have to do with me having more development tools than typical on my Windows box.
I take it that the VC14 runtime is a requirement or something along those lines.
Not too concerned about the bash test failing. This doesn't matter for native Windows.
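If the bash completion test should be skipped rather than fail on native Windows, one option is a small probe that checks whether bash is really usable. This is a sketch, not what the test suite does; bash_usable is a hypothetical helper.

```python
import shutil
import subprocess


def bash_usable():
    """Return True if a bash executable is on PATH and actually runs.

    On Windows a bash.exe may be present yet fail when invoked, so
    finding it on PATH is not enough.
    """
    bash = shutil.which("bash")
    if bash is None:
        return False
    try:
        subprocess.run(
            [bash, "-c", "exit 0"],
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except (OSError, subprocess.CalledProcessError):
        return False
    return True
```

With pytest, the result could feed a pytest.mark.skipif(not bash_usable(), reason="bash not available") decorator on test_bash.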
@jbarlow83 I ran the tests again on a clean setup. Now it seems that all tests are passing: = 199 passed, 40 skipped, 3 xfailed in 299.36s (0:04:59) =
Unfortunately it's not working on my Windows 10 machine. First I get a message box saying
The procedure entry point inflateReset2 could not be located in the dynamic link library C:\Program Files\Tesseract-OCR\libpng16-16.dll.
Then the console says
Traceback (most recent call last):
File "c:\users\david\appdata\local\programs\python\python38\lib\runpy.py", line 192, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\david\appdata\local\programs\python\python38\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\David\AppData\Local\Programs\Python\Python38\Scripts\ocrmypdf.exe\__main__.py", line 4, in <module>
File "c:\users\david\appdata\local\programs\python\python38\lib\site-packages\ocrmypdf\__init__.py", line 18, in <module>
from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
File "c:\users\david\appdata\local\programs\python\python38\lib\site-packages\ocrmypdf\leptonica.py", line 61, in <module>
lept = ffi.dlopen(_libpath)
OSError: cannot load library 'C:\Program Files\Tesseract-OCR\liblept-5.dll': error 0x7f
Could this be 32 vs. 64 bit related? I first had Python 32-bit installed. Then I got ...error 0xc1
Then I uninstalled python and installed Python 64-bit but getting ...error 0x7f
@nQk2 try Python 3.7.5. I've also had problems with 3.8.
And make sure you've added Tesseract-OCR, gs9.50\bin, and qpdf-9.1.0\bin to your PATH variable, and ;.PY to the PATHEXT variable.
@nQk2 All of the components must be the same bitness and really should be 64-bit. It's not hard for a program that aggressively uses all available CPU power to run into the 2GB memory wall on 32-bit Windows.
It definitely won't work to interface 32-bit Python to a 64-bit library which is probably the cause of that stacktrace.
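On the bitness point: a quick way to see what the running interpreter is, plus the symbolic names of the Win32 error codes that show up in this thread. The codes are standard (winerror.h); the helper names and the small table are just for illustration.

```python
import struct

# Win32 error codes that appear in this thread (names from winerror.h).
WIN32_ERRORS = {
    0x57: "ERROR_INVALID_PARAMETER",
    0x7F: "ERROR_PROC_NOT_FOUND",
    0xC1: "ERROR_BAD_EXE_FORMAT",
}


def python_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


def describe_win32_error(code):
    """Map a numeric Win32 error code to its symbolic name, if known here."""
    return WIN32_ERRORS.get(code, "unknown (not in this small table)")


print(f"This Python is {python_bitness()}-bit")
print(describe_win32_error(0xC1))
```

Read this way, error 0xc1 (bad executable format) fits a 32-bit Python loading a 64-bit DLL, while error 0x7f (procedure not found) fits the inflateReset2 message box, where a DLL loads but a dependency lacks an expected function.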
@dibu28 It should not be necessary to put qpdf-...\bin in the PATH anymore. Eventually the other two won't be needed either.
@jbarlow83 You will pack tesseract and gs as python packages? (Windows versions)
The first step will be for ocrmypdf to check in reasonable locations for Tesseract and GS, examining the registry or whatever, so PATH becomes an override.
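A rough sketch of that lookup order, with PATH checked first so it acts as the override, then a list of well-known directories. The registry lookup is left out, find_program is a hypothetical helper, and the fallback directories are simply the install paths quoted elsewhere in this thread.

```python
import shutil
from pathlib import Path


def find_program(name, fallback_dirs=()):
    """Locate an executable, letting PATH take priority over fallbacks.

    Returns the full path as a string, or None if nothing was found.
    """
    on_path = shutil.which(name)
    if on_path is not None:
        return on_path
    for directory in fallback_dirs:
        # Passing path= restricts the search to this one directory while
        # keeping shutil.which's handling of PATHEXT on Windows.
        found = shutil.which(name, path=str(directory))
        if found is not None:
            return found
    return None


# Hypothetical fallback locations, taken from paths quoted in this thread.
TESSERACT_DIRS = [Path(r"C:\Program Files\Tesseract-OCR")]
GHOSTSCRIPT_DIRS = [Path(r"C:\Program Files\gs\gs9.50\bin")]

print(find_program("tesseract", TESSERACT_DIRS))
print(find_program("gswin64c", GHOSTSCRIPT_DIRS))
```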
I don't believe I can bundle the GS installer unless I change OCRmyPDF to AGPL, and I'm not sure I want to do that. I believe everything else could be bundled.
As far as actually doing a Windows installer, bundling, or setting up a choco package, I am hoping the community will step up, because I haven't made a Windows installer before or tried to package a Python application for Windows, and other people probably know how to get this off the ground faster than I can even if I end up finishing it. I converted to Azure Pipelines for its better Windows support, so that ideally we can test and deploy for every distribution type in one shot.
ocrmypdf is a unique/more complex case in its use of Leptonica (ABI level binding to a C library) and relies on calls to third party non-Python binaries. It will probably be necessary to spin off Leptonica into a separate package that gets compiled as a binary wheel, something I've already started work on actually. That means installer-generator programs that try to inspect the source code for dependencies are probably going to fail, because they usually look for Python-only dependencies.
I don't know if this helps, as I'm not knowledgeable enough, but I can't get it to run using the exact instructions currently in the documentation. Btw, thank you all, especially the maintainer, for the hard work.
The paths for tesseract and gs have been added. First I got the libcurl-4 is missing error (plus 3 other dlls). Then I installed libcurl from chocolatey and manually installed qpdf to the folder that the first comment specified (https://github.com/jbarlow83/OCRmyPDF/issues/455#issue-522103851). The current situation can be seen next; I don't know where to get pikepdf from. All are 64-bit versions. I'm running Windows 10 1909.
C:\WINDOWS\system32>choco list --local-only
Chocolatey v0.10.15
chocolatey 0.10.15
chocolatey-core.extension 1.3.5.1
curl 7.67.0
Ghostscript 9.50
Ghostscript.app 9.50
pngquant 2.12.3
python3 3.8.1
tesseract 5.0.0.20191030-alpha
8 packages installed.
C:\WINDOWS\system32>ocrmypdf
Traceback (most recent call last):
File "c:\python38\lib\site-packages\pikepdf\__init__.py", line 10, in <module>
from . import _qpdf
ImportError: DLL load failed while importing _qpdf: The specified module could not be found.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\python38\lib\runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\python38\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Python38\Scripts\ocrmypdf.exe\__main__.py", line 5, in <module>
File "c:\python38\lib\site-packages\ocrmypdf\__init__.py", line 18, in <module>
from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
File "c:\python38\lib\site-packages\ocrmypdf\pdfa.py", line 38, in <module>
import pikepdf
File "c:\python38\lib\site-packages\pikepdf\__init__.py", line 12, in <module>
raise ImportError("pikepdf's extension library failed to import")
ImportError: pikepdf's extension library failed to import
Also, while I can successfully install 9.2.0 on Ubuntu 18.04 under WSL, when trying to access it from the command line (https://ocrmypdf.readthedocs.io/en/latest/installation.html#installing-on-windows-subsystem-for-linux) I get the old version for some reason:
C:\WINDOWS\system32>wsl sudo ln -s /home/user/.local/bin/ocrmypdf /usr/local/bin/ocrmypdf
[sudo] password for USERNAME:
C:\WINDOWS\system32>wsl ocrmypdf --version
6.1.2
@osnofas The trouble is likely that I've been working with a Windows 10 image with a lot of developer things on it already so it's not the best test environment.
Could you run this command and send the results? This should just print a list of files installed for pikepdf:
dir /s c:\python38\lib\site-packages\pikepdf
Also what version of pip is installed? (pip --version and python -m pip --version)
Regarding WSL, you'll need to ensure that /home/user/.local/bin is added to the WSL system PATH environment variable.
On Ubuntu/WSL:
:/mnt/c/WINDOWS/system32$ pip --version
pip 19.3.1 from /home/USERNAME/.local/lib/python3.6/site-packages/pip (python 3.6)
:/mnt/c/WINDOWS/system32$ python -m pip --version
pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)
On Windows proper:
C:\Python38\Lib\site-packages>dir
Volume in drive C is Windows
Volume Serial Number is B23D-AC41
Directory of C:\Python38\Lib\site-packages
23/12/2019 20:09 <DIR> .
23/12/2019 20:09 <DIR> ..
23/12/2019 20:09 126 easy_install.py
23/12/2019 20:09 <DIR> pip
23/12/2019 20:09 <DIR> pip-19.2.3.dist-info
23/12/2019 20:09 <DIR> pkg_resources
18/12/2019 23:26 121 README.txt
23/12/2019 20:09 <DIR> setuptools
23/12/2019 20:09 <DIR> setuptools-41.2.0.dist-info
23/12/2019 20:09 <DIR> __pycache__
2 File(s) 247 bytes
8 Dir(s) 101.128.069.120 bytes free
C:\Python38\Lib\site-packages>
I installed python with chocolatey. Regardless, I have another python distro on Windows for use with anaconda and it doesn't have pikepdf either. (It appears that overall I have at least 4 python installations between Windows and WSL/Ubuntu.)
Hi
Describe the issue: I've managed to run OCRmyPDF.exe on Windows 10 without WSL.
To Reproduce: I've made a fork and added some quick fixes in this commit: https://github.com/dibu28/OCRmyPDF/commit/543088e79e8649e968d02d8fd268123255607dc1
Fixes are:
1) in leptonica.py, the library name is liblept-5 instead of lept
2) in ghostscript.py:
2.1) the executable name is gswin64c.exe instead of gs
2.2) NamedTemporaryFile doesn't work properly, and gs could not modify the temp file (access denied error). As a temporary workaround I'm adding "_1" to the temp file name and then removing the file. There could be a better way.
3) in _pipeline.py and helpers.py, symlinking to the temp folder on Windows requires admin privileges, so instead of symlinking I'm just copying files.
4) in _sync.py, os.path.samefile raises an error: "OSError: [WinError 1] Incorrect function: 'nul'"
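For fix 2.2, one alternative to renaming the temp file is to never hold it open in the parent process at all. On Windows a file created by NamedTemporaryFile generally cannot be opened a second time by name while it is still open, so a plain path inside a TemporaryDirectory avoids the conflict. This is a sketch under that assumption; run_with_temp_output and its make_command callback are hypothetical, and a child Python process stands in for Ghostscript.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_temp_output(make_command):
    """Run an external command that writes to a temporary path; return the bytes.

    make_command is a hypothetical callback that receives the output path
    and returns the argument list to execute.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # Our process never holds this file open, so the child process is
        # free to create and write it, including on Windows.
        out_path = Path(tmpdir) / "output.bin"
        subprocess.run(make_command(out_path), check=True)
        return out_path.read_bytes()


# Stand-in for Ghostscript: a child Python process that writes a file.
data = run_with_temp_output(
    lambda path: [
        sys.executable,
        "-c",
        "import sys; open(sys.argv[1], 'wb').write(b'ok')",
        str(path),
    ]
)
print(data)
```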
So after those changes and installing the dependencies it started to work from the command line like this: OCRmyPDF.exe input.pdf output.pdf
Dependencies and binaries I'm using:
https://www.python.org/ftp/python/3.7.5/python-3.7.5-amd64.exe
https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.0.0-alpha.20191030.exe
https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs950/gs950w64.exe
https://github.com/qpdf/qpdf/releases/download/release-qpdf-9.0.2/qpdf-9.0.2-bin-msvc64.zip
Add paths to PATH variable:
set PATH=%PATH%;C:\Program Files\Tesseract-OCR;
set PATH=%PATH%;C:\Program Files\gs\gs9.50\bin\;
set PATH=%PATH%;C:\qpdf\qpdf-9.0.2-bin-msvc64\qpdf-9.0.2\bin\;
Expected behavior: Can we add some workarounds using conditions based on OS type?
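Not every workaround needs an explicit OS check. For fix 4, a wrapper that catches the OSError behaves the same on all platforms. A sketch only: safe_samefile is a hypothetical helper, and treating a failed comparison as "not the same file" is an assumption about what the caller wants.

```python
import os


def safe_samefile(path_a, path_b):
    """Like os.path.samefile, but return False instead of raising OSError.

    Assumption: when the comparison itself fails (for example because one
    path is missing or is a special device), treating the paths as
    different is the safer answer for the caller.
    """
    try:
        return os.path.samefile(path_a, path_b)
    except OSError:
        return False


print(safe_samefile(os.devnull, os.devnull))
```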