move to python3

marijani101 commented 4 years ago

As Python 2 is coming to an end, wouldn't it be better to migrate to Python 3?

zuphilip commented 4 years ago

The hocr-tools are working with Python 3 already. We support Python 2 and 3 together.

stweil commented 4 years ago

Maybe some compatibility constructs can be removed from the code as soon as Python 2 is gone, but for the moment I think there is nothing to be done. @marijani101, did you notice problems with Python 3 which would require an action now? I only found that the README could be updated to mention Python 3 as well. Maybe you want to send a pull request for that?

stweil commented 4 years ago

The Python 2 package names in the README.md should be replaced by Python 3 package names. @marijani101, can you send a pull request?

FriedrichFroebel commented 1 year ago

It seems that while the setup file still advertises Python 2, https://github.com/ocropus/hocr-tools/commit/269d63a816dc801b77e549b9c3b3bde708912286 has effectively dropped this support recently. This contradicts the following code, as f-strings were not available before Python 3.6: https://github.com/ocropus/hocr-tools/blob/0ad95b3606229c8a6895a3a6e782ff88d9db1d8d/setup.py#L24-L26

stweil commented 1 year ago

Right, thanks for reporting this. Do you want to send a pull request which removes all old entries? All Python versions before 3.7 are unsupported.
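
For illustration, the metadata cleanup requested here could look roughly like the sketch below. The dictionary name SETUP_KWARGS and the exact classifier list are assumptions made for this example, not the contents of the project's setup.py; only the 3.7 floor comes from the comment above.

```python
# Hypothetical keyword arguments for setuptools.setup(); the real setup.py
# of the project may be organized differently.
SETUP_KWARGS = {
    # Makes pip refuse to install the package on unsupported interpreters.
    "python_requires": ">=3.7",
    # Only Python 3 classifiers remain; all Python 2 entries are removed.
    "classifiers": [
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3 :: Only",
    ],
}

# In an actual setup.py this would be passed on as:
#     from setuptools import setup
#     setup(name="...", **SETUP_KWARGS)
```

With python_requires in place, the advertised versions and the syntax actually used in the code (such as f-strings) can no longer drift apart silently.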

FriedrichFroebel commented 1 year ago

I just did some more tests regarding version support and stumbled upon some more stuff which probably needs some attention (and is more or less related to Python 3 support):

CI fails due to beautifulsoup4 never being installed, but apparently being called by the tests (according to the code, the package is optional for regular installations):

# ocrx_word argument
not ok 17 - Failed: hocr-extract-images -U -p word-%03d.png -b ../testdata -e ocrx_word ../testdata/tess.hocr
---
diag: |
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.7.15/x64/bin/hocr-extract-images", line 79, in <module>
from bs4 import UnicodeDammit
ModuleNotFoundError: No module named 'bs4'
...
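
The failure above is an unguarded import of an optional dependency. The following is only a minimal sketch of how such an import is commonly guarded. The helper name decode_content and the UTF-8 fallback are assumptions for illustration and are not taken from hocr-extract-images; installing beautifulsoup4 in CI would be the more direct fix.

```python
# Optional dependency: fall back gracefully if beautifulsoup4 is missing.
try:
    from bs4 import UnicodeDammit
except ImportError:
    UnicodeDammit = None


def decode_content(raw, fallback_encoding="utf-8"):
    """Return text for raw input, using UnicodeDammit when it is available."""
    if isinstance(raw, str):
        return raw
    if UnicodeDammit is not None:
        text = UnicodeDammit(raw).unicode_markup
        if text is not None:
            return text
    # Hypothetical fallback: assume UTF-8 and replace undecodable bytes.
    return raw.decode(fallback_encoding, errors="replace")
```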

Current lxml versions do not work with hocr-extract-images (fixed by using doc = html.document_fromstring(content.encode('utf-8'), parser=parser) instead):

# ocrx_word argument
not ok 17 - Failed: hocr-extract-images -U -p word-%03d.png -b ../testdata -e ocrx_word ../testdata/tess.hocr
---
diag: |
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.7.15/x64/bin/hocr-extract-images", line 83, in <module>
doc = html.document_fromstring(content, parser=parser)
File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/site-packages/lxml/html/__init__.py", line 759, in document_fromstring
value = etree.fromstring(html, parser, **kw)
File "src/lxml/etree.pyx", line 3257, in lxml.etree.fromstring
File "src/lxml/parser.pxi", line 1911, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
...
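
The fix mentioned above amounts to handing lxml bytes instead of a str whenever the document carries an encoding declaration. A minimal sketch follows, assuming the content is UTF-8 and that the parser is an html.HTMLParser. The helper names as_bytes and parse_hocr are hypothetical, and the parser setup in the actual script may differ.

```python
def as_bytes(content, encoding="utf-8"):
    """Encode str input so the parser receives bytes; pass bytes through."""
    if isinstance(content, str):
        return content.encode(encoding)
    return content


def parse_hocr(content):
    """Parse hOCR markup from str or bytes (requires lxml to be installed)."""
    # Imported lazily so that as_bytes stays usable without lxml.
    from lxml import html

    # Assumption: the input is UTF-8; adjust if the script decodes differently.
    parser = html.HTMLParser(encoding="utf-8")
    return html.document_fromstring(as_bytes(content), parser=parser)
```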

We might want to migrate CI from Travis to GitHub Actions as well.
We might want to improve the overall package structure by declaring the current scripts as console script entry points and moving the actual implementations into a common package, which would make code reuse easier. Because the package would then be importable, we could also use the stdlib unittest module for testing, which integrates more easily with tools like coverage.py and would help ensure that the code is properly tested when, for example, old Python 2 compatibility code is dropped.
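
As a rough sketch of that structure: each script becomes an importable main() function that a console_scripts entry point refers to, and unittest can then exercise it directly. The module path hocr_tools.example, the program name hocr-example and the ocr_line default are hypothetical. The -e option and the ocrx_word value are borrowed from the test output above.

```python
import argparse
import unittest

# Hypothetical setup.py fragment (names are illustrative only):
#     entry_points={
#         "console_scripts": ["hocr-example = hocr_tools.example:main"],
#     }


def build_parser():
    """Build the command-line parser separately so tests can reuse it."""
    parser = argparse.ArgumentParser(prog="hocr-example")
    parser.add_argument("file", help="hOCR file to process")
    parser.add_argument("-e", "--element", default="ocr_line")
    return parser


def main(argv=None):
    """Entry point; argv=None makes argparse read sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    # The actual processing of args.file would go here.
    return 0


class ParserTest(unittest.TestCase):
    def test_default_element(self):
        args = build_parser().parse_args(["page.hocr"])
        self.assertEqual(args.element, "ocr_line")

    def test_main_returns_zero(self):
        self.assertEqual(main(["-e", "ocrx_word", "page.hocr"]), 0)
```

Tests written this way can be discovered with python -m unittest, and coverage.py can wrap the same command to report which lines the tests actually reach.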

mrghosti3 commented 1 year ago

Any news on this issue?

ocropus / hocr-tools

move to python3 #158