internetarchive / archive-pdf-tools

Fast PDF generation and compression. Deals with millions of pages daily.
https://archive-pdf-tools.readthedocs.io/en/latest/
GNU Affero General Public License v3.0

Installing on MacOS? #64

Closed: jrochkind closed this issue 11 months ago

jrochkind commented 1 year ago

I am curious whether anyone reading has successfully managed to install this on MacOS.

I personally have very little skill at compiling C packages, and I am also not very experienced with Python. So I doubt I currently have the skills to figure this out if it requires some hacking.

But just out of curiosity, I decided to see how far I could get.

I tried to install the dependencies mentioned in the README with:

 brew install leptonica openjpeg libxml2 libxslt jbig2enc

Then I ran pip3 install archive-pdf-tools.

It resulted in the error below, which didn't initially look like a missing dependency to me, but I'm really out of my element here. Maybe it does point to a wrong version of libxml or something?

I'm curious whether anyone has some insight. I'm leaving this issue mostly as a placeholder for anyone else who is curious or wants to discuss it; I doubt I personally have the skill to get the build working myself if it requires anything non-trivial.

Console output:

```
building 'sauvola' extension
creating build/temp.macosx-12-arm64-cpython-311
creating build/temp.macosx-12-arm64-cpython-311/cython
clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk -Ofast -DNPY_NO_DEPRECATED_API -I/private/var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T/pip-build-env-je1kahy0/overlay/lib/python3.11/site-packages/numpy/core/include -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c cython/sauvola.c -o build/temp.macosx-12-arm64-cpython-311/cython/sauvola.o
cython/sauvola.c:211:12: fatal error: 'longintrepr.h' file not found
#include "longintrepr.h"
         ^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for archive-pdf-tools
Building wheel for lxml (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [145 lines of output]
Building lxml version 4.6.5.
Building without Cython.
Building against libxml2 2.9.4 and libxslt 1.1.29
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-12-arm64-cpython-311
creating build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/_elementpath.py -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/sax.py -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/pyclasslookup.py -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/__init__.py -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/builder.py -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/doctestcompare.py -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/usedoctest.py -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/cssselect.py -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/ElementInclude.py -> build/lib.macosx-12-arm64-cpython-311/lxml
creating build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/__init__.py -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
creating build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/soupparser.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/defs.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/_setmixin.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/clean.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/_diffcommand.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/html5parser.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/__init__.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/formfill.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/builder.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/ElementSoup.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/_html5builder.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/usedoctest.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
copying src/lxml/html/diff.py -> build/lib.macosx-12-arm64-cpython-311/lxml/html
creating build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron
copying src/lxml/isoschematron/__init__.py -> build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron
copying src/lxml/etree.h -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/etree_api.h -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/lxml.etree.h -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/lxml.etree_api.h -> build/lib.macosx-12-arm64-cpython-311/lxml
copying src/lxml/includes/xmlerror.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/c14n.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/xmlschema.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/__init__.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/schematron.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/tree.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/uri.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/etreepublic.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/xpath.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/htmlparser.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/xslt.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/config.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/xmlparser.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/xinclude.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/dtdvalid.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/relaxng.pxd -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/lxml-version.h -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
copying src/lxml/includes/etree_defs.h -> build/lib.macosx-12-arm64-cpython-311/lxml/includes
creating build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources
creating build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/rng
copying src/lxml/isoschematron/resources/rng/iso-schematron.rng -> build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/rng
creating build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/xsl
copying src/lxml/isoschematron/resources/xsl/XSD2Schtrn.xsl -> build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/xsl
copying src/lxml/isoschematron/resources/xsl/RNG2Schtrn.xsl -> build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/xsl
creating build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_abstract_expand.xsl -> build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_dsdl_include.xsl -> build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_skeleton_for_xslt1.xsl -> build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_svrl_for_xslt1.xsl -> build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_message.xsl -> build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/readme.txt -> build/lib.macosx-12-arm64-cpython-311/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
running build_ext
building 'lxml.etree' extension
creating build/temp.macosx-12-arm64-cpython-311
creating build/temp.macosx-12-arm64-cpython-311/src
creating build/temp.macosx-12-arm64-cpython-311/src/lxml
clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk -DCYTHON_CLINE_IN_TRACEBACK=0 -Isrc -Isrc/lxml/includes -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c src/lxml/etree.c -o build/temp.macosx-12-arm64-cpython-311/src/lxml/etree.o -w -flat_namespace
src/lxml/etree.c:261877:23: error: no member named 'exc_type' in 'struct _err_stackitem'
while ((exc_info->exc_type == NULL || exc_info->exc_type == Py_None) &&
~~~~~~~~ ^
src/lxml/etree.c:261877:53: error: no member named 'exc_type' in 'struct _err_stackitem'
while ((exc_info->exc_type == NULL || exc_info->exc_type == Py_None) &&
~~~~~~~~ ^
src/lxml/etree.c:261891:23: error: no member named 'exc_type' in 'struct _err_stackitem'
*type = exc_info->exc_type;
~~~~~~~~ ^
src/lxml/etree.c:261893:21: error: no member named 'exc_traceback' in 'struct _err_stackitem'
*tb = exc_info->exc_traceback;
~~~~~~~~ ^
src/lxml/etree.c:261907:26: error: no member named 'exc_type' in 'struct _err_stackitem'
tmp_type = exc_info->exc_type;
~~~~~~~~ ^
src/lxml/etree.c:261909:24: error: no member named 'exc_traceback' in 'struct _err_stackitem'
tmp_tb = exc_info->exc_traceback;
~~~~~~~~ ^
src/lxml/etree.c:261910:15: error: no member named 'exc_type' in 'struct _err_stackitem'
exc_info->exc_type = type;
~~~~~~~~ ^
src/lxml/etree.c:261912:15: error: no member named 'exc_traceback' in 'struct _err_stackitem'
exc_info->exc_traceback = tb;
~~~~~~~~ ^
src/lxml/etree.c:261994:30: error: no member named 'exc_type' in 'struct _err_stackitem'
tmp_type = exc_info->exc_type;
~~~~~~~~ ^
src/lxml/etree.c:261996:28: error: no member named 'exc_traceback' in 'struct _err_stackitem'
tmp_tb = exc_info->exc_traceback;
~~~~~~~~ ^
src/lxml/etree.c:261997:19: error: no member named 'exc_type' in 'struct _err_stackitem'
exc_info->exc_type = local_type;
~~~~~~~~ ^
src/lxml/etree.c:261999:19: error: no member named 'exc_traceback' in 'struct _err_stackitem'
exc_info->exc_traceback = local_tb;
~~~~~~~~ ^
src/lxml/etree.c:262185:26: error: no member named 'exc_type' in 'struct _err_stackitem'
tmp_type = exc_info->exc_type;
~~~~~~~~ ^
src/lxml/etree.c:262187:24: error: no member named 'exc_traceback' in 'struct _err_stackitem'
tmp_tb = exc_info->exc_traceback;
~~~~~~~~ ^
src/lxml/etree.c:262188:15: error: no member named 'exc_type' in 'struct _err_stackitem'
exc_info->exc_type = *type;
~~~~~~~~ ^
src/lxml/etree.c:262190:15: error: no member named 'exc_traceback' in 'struct _err_stackitem'
exc_info->exc_traceback = *tb;
~~~~~~~~ ^
src/lxml/etree.c:264391:20: error: no member named 'exc_type' in 'struct _err_stackitem'
t = exc_state->exc_type;
~~~~~~~~~ ^
src/lxml/etree.c:264393:21: error: no member named 'exc_traceback' in 'struct _err_stackitem'
tb = exc_state->exc_traceback;
~~~~~~~~~ ^
src/lxml/etree.c:264394:16: error: no member named 'exc_type' in 'struct _err_stackitem'
exc_state->exc_type = NULL;
~~~~~~~~~ ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Compile failed: command '/usr/bin/clang' failed with exit code 1
creating var
creating var/folders
creating var/folders/_1
creating var/folders/_1/89lqv5550mx2tggl22z27_p18516fz
creating var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T
cc -I/usr/include/libxml2 -c /var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T/xmlXPathInitgevmc310.c -o var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T/xmlXPathInitgevmc310.o
cc var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T/xmlXPathInitgevmc310.o -lxml2 -o a.out
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for lxml
Running setup.py clean for lxml
Failed to build archive-pdf-tools lxml
ERROR: Could not build wheels for archive-pdf-tools, which is required to install pyproject.toml-based projects
```
MerlijnWajer commented 1 year ago

A few things:

  1. I don't think you need leptonica, openjpeg, libxml, libxslt, or jbig2enc for basic functionality: Pillow can compress JPEG 2000 (and I guess its wheel comes with what it needs), leptonica is only needed for JBIG2, and libxml/libxslt will, I think, just come with the Python packages that pip installs.
  2. The pip3 install ... command seems to be trying to build archive-pdf-tools from source rather than just installing the binary. I think that might be because you're on arm64; I don't know whether we already build a binary for that.

For completeness' sake, can you share your OS and Python version? (It seems to be Python 3.11, but I'd like to check.)

MerlijnWajer commented 1 year ago

Also, searching Google for cython #include "longintrepr.h" clang suggests this error happens for many Python packages on macOS/clang, so there might be some hints there. Let me see what I can get done in the next few days.

MerlijnWajer commented 1 year ago

It looks like just upgrading to a newer Cython version will solve the problem, but I will still need to see if I can make the CI build these releases.

jrochkind commented 1 year ago

I'll see if I can take care of it on my end with upgrades etc. I'll try a bit harder and get back to you. Thanks for the attention!

jrochkind commented 1 year ago

I may have just gotten this to install on a different Mac that has the latest OS. It turns out the Mac I was having trouble on is still on MacOS 12 rather than 13.

I'm not sure whether other significant things differ between them too; I need to spend more time with it. But it may be a common Python issue rather than something specific to this code. I'm really not sure.

MerlijnWajer commented 1 year ago

I made a test branch that builds arm64 wheels for macOS. Can you download artifact.zip from here and try it?

https://github.com/internetarchive/archive-pdf-tools/actions/runs/4602441350

$ ls | grep arm | grep mac
archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl
archive_pdf_tools-1.5.3-cp38-cp38-macosx_11_0_arm64.whl
archive_pdf_tools-1.5.3-cp39-cp39-macosx_11_0_arm64.whl

I can also build for other macOS versions if 11 is not the right one. I was building for macOS 10.15 before.

MerlijnWajer commented 1 year ago

https://github.com/internetarchive/archive-pdf-tools/actions/runs/4608630138/jobs/8144663900

This one contains wheels for macOS 10.15 and 12 as well.

MerlijnWajer commented 1 year ago

(It doesn't look like the macOS 12 wheels actually made it.)

jrochkind commented 1 year ago

Thank you! I'm sorry if I'm sending you on a distraction here; this may not be a priority. Some things:

Since I have demonstrated it installing successfully on one MacOS laptop, I'm inclined to think the problem might be mine, not yours, although there may be things you can do to make it install more reliably. I'm no expert here.

Yeah, I am not able to find artifact.zip on that GitHub Actions build page, and I'm not sure I'd know what to do with it even if I did; I'm not very comfortable with Python. If you'd like me to test a build artifact and it's not totally obvious how, please provide instructions. But I'm wondering if this is actually my problem, not yours.

I need to find more time to update my laptop to MacOS 13 (it's not old on purpose), reinstall dependencies etc., maybe make sure I am setting up Python in a best-practices way on MacOS, and see what happens.

MerlijnWajer commented 1 year ago

It looks like the zip from the action is not visible to others, so please find it attached to this message.

artifact.zip

MerlijnWajer commented 1 year ago

Other than that, there are a few things to mention:

  1. You can install the wheel files in the zip like this: pip install --force-reinstall -U /path/to/wheel
  2. pip install pkgnamehere typically tries to fetch a prebuilt binary package for your OS and architecture, and if no binary package exists, it falls back to building from source. Before you opened this issue, I was not building any arm64 macOS wheels (aka Apple Silicon), and I still haven't uploaded these wheels to the place that pip gets them from.

If you can verify that these wheels work on macOS, I can upload them to PyPI (where pip gets them from).
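For reference, each wheel filename encodes what it was built for, following the PEP 427 pattern {distribution}-{version}(-{build tag})-{python tag}-{abi tag}-{platform tag}.whl, and pip only considers wheels whose tags match the running interpreter and OS. A minimal sketch that splits such a name into its parts; parse_wheel_filename and WheelName are hypothetical helpers for illustration, not part of pip or archive-pdf-tools:

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str    # e.g. cp310 = CPython 3.10
    abi_tag: str
    platform_tag: str  # e.g. macosx_11_0_arm64 = built for macOS 11.0, Apple Silicon


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components (tags are not expanded)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:  # no optional build tag
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:  # build tag sits between version and python tag
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


w = parse_wheel_filename("archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl")
print(w.python_tag, w.platform_tag)  # cp310 macosx_11_0_arm64
```

Real tags can also be dot-separated sets such as py2.py3; this sketch leaves them as a single string.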

MerlijnWajer commented 1 year ago

The problem you were encountering before was definitely caused by an older Cython version. I have raised the Cython version in a separate branch where I am trying to build these wheels. When I know that the wheels work, I can merge those changes into the master branch and include them in a new release.

MerlijnWajer commented 1 year ago

For testing various versions, you could also consider setting up a virtual environment (venv): https://docs.python.org/3/library/venv.html

It might be easier.
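A minimal sketch of the same thing driven from Python's standard-library venv module, roughly what python3 -m venv <dir> does in a shell. The environment is based on whichever Python runs the code, so running it with python3.10 gives a 3.10 environment. make_env and run_in_env are hypothetical helpers for illustration:

```python
import os
import subprocess
import venv
from pathlib import Path


def make_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its Python executable."""
    # The venv command line uses symlinks by default on non-Windows systems.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def run_in_env(python: Path, *args: str) -> str:
    """Run the environment's interpreter with the given arguments; return its stdout."""
    result = subprocess.run(
        [str(python), *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

With the environment created, the wheel can be installed with that environment's own pip, e.g. run_in_env(python, "-m", "pip", "install", "--force-reinstall", "-U", "/path/to/wheel"), which keeps each Python version's experiment separate.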

jrochkind commented 1 year ago

Hi! Trying to spend more time on this to give you feedback!

OK, I am now using venv.

I'm sorry, I'm new to Python, so I'm not totally sure how to test what you'd like me to test. Thank you for your tips earlier.

You can install the wheel files in the zip like this: pip install --force-reinstall -U /path/to/wheel

I have unzipped artifact.zip and got a bunch of .whl files. Am I supposed to manually identify which one is appropriate for my system?

I have an M1 Pro MacBook, I am running MacOS 12.6.3 (note this is still not the latest MacOS, the latest is MacOS 13).

If I understand the naming convention right, it looks like you have wheels in artifact.zip for macosx_10_9 and macosx_11_0. If those numbers are version numbers, neither of them matches my system, but maybe I'll try the most recent one, so _11_0?

I believe my M1 Pro MacBook is arm64 rather than x86_64. That still leaves three candidates, and I don't know how to choose among them: what's the difference between cp39, cp38, and cp310?

On the theory that bigger is better, maybe I'll try 310. So, in an activated venv:

pip install --force-reinstall -U wheel_artifact/archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl
ERROR: archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl is not a supported wheel on this platform.

OK, not that one. Try the other two? Nope, same result.

Maybe the problem is that I'm on MacOS 12? Sorry, I'm really feeling my way here; I don't know what I'm doing. If you want me to choose a different .whl file, just let me know which one and I can try it!

I'm still not convinced there is necessarily anything wrong you had to fix, the problem might have been my system from the start?

Now that I am more intentional about exactly what version of python I am using (3.11.2) and I'm using a venv, let me try an official release again:

pip install archive-pdf-tools

Hm, alas, that one still failed, with what I think is the same error: #include "longintrepr.h".

I did get the install to work on my personal MacBook, though, which was on MacOS 13. I wonder whether it would just work if I upgraded this laptop to MacOS 13. (Sorry, I will not be downgrading to MacOS 11 or 10!) Or maybe something else differs between this laptop and my personal one, where I think pip install archive-pdf-tools worked. I'm sorry, I don't have time this week to try every possible combination of everything (or to upgrade my laptop), but I can try a few more things if you'd like!

MerlijnWajer commented 1 year ago

Can you tell me which Python version you are using on MacOS 12? python --version will tell you. The cp3x part of the wheel name corresponds to the CPython version.
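To see which cp tag matches a given interpreter without reading filenames by hand, a small sketch; interpreter_tag is a hypothetical helper, and it only handles CPython, whose abbreviation in wheel tags is cp (PEP 425). On a native Apple Silicon Python, platform.machine() should report arm64:

```python
import platform
import sys


def interpreter_tag() -> str:
    """Return the wheel python tag of the running interpreter, e.g. 'cp310'."""
    # Only CPython is handled here; other implementations use other prefixes.
    if sys.implementation.name != "cpython":
        raise RuntimeError(f"unsupported implementation: {sys.implementation.name}")
    major, minor = sys.version_info[:2]
    return f"cp{major}{minor}"


# For version-specific wheels like the cp310-cp310 ones in artifact.zip, the
# python tag must equal this value, and the platform tag must fit this machine.
print(interpreter_tag(), platform.machine())
```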

MerlijnWajer commented 1 year ago

Ah, sorry, I just saw that you told me what version you are using. I don't think I build wheels for 3.11 yet, let me see if I can do that.

jrochkind commented 1 year ago

Thanks! This is all just me trying out demos. Please know that whatever Python version I am using is just what I happen to be using right now to try things out; it's not a commitment to using it forever or anything!

That was just me installing the "latest" python because I had to pick one and it seemed like a good idea?

If you really need to build a wheel for every possible version of python (combined with OS etc!), that seems pretty untenable!

I can also go back to Python 3.10 for testing if that's easier for you! I don't totally understand what we are testing! I didn't pick 3.11 intentionally; I just installed the latest version thinking that was the thing to do!

MerlijnWajer commented 1 year ago

Yeah, it gets a little tedious, but 3.7 to 3.11 is not too bad. I'm almost ready to get a build for 3.11, but unfortunately a bug in lxml has had me pin specific lxml versions in archive-hocr-tools, and those are not available for 3.11, so I need to figure out how to make this work.

If you could give it a try on 3.10 if that is not too much work, that would be great. You would use archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl with that.

MerlijnWajer commented 1 year ago

(btw, I am running on the assumption that macosx_11_0 would work on 11.0+)
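That assumption can be checked against a simplified model of how pip, through the packaging library it vendors, picks macOS wheels: from macOS 11 on only the major version counts, so an arm64 Mac on 12.x accepts macosx_12_0 and macosx_11_0 tags, while 10.x tags are accepted only as universal2 (arm64 did not exist before macOS 11). This sketch covers arm64 only; macos_arm64_platform_tags and wheel_fits are hypothetical helpers, and the interpreter check ignores abi3 and pure-Python tags:

```python
from typing import List, Tuple


def macos_arm64_platform_tags(os_version: Tuple[int, int]) -> List[str]:
    """Platform tags an arm64 Mac running os_version accepts, newest first."""
    major = os_version[0]
    if major < 11:
        return []  # Apple Silicon Macs run macOS 11 or later
    tags: List[str] = []
    for m in range(major, 10, -1):  # e.g. 12, 11 on macOS 12.x
        tags += [f"macosx_{m}_0_arm64", f"macosx_{m}_0_universal2"]
    for minor in range(16, 3, -1):  # 10.16 down to 10.4, universal2 only
        tags.append(f"macosx_10_{minor}_universal2")
    return tags


def wheel_fits(filename: str, interp_tag: str, os_version: Tuple[int, int]) -> bool:
    """Rough check that a cpXY-cpXY macOS wheel is installable here."""
    # Assumes {dist}-{version}-{python}-{abi}-{platform}.whl with no build tag.
    _dist, _ver, py_tag, _abi, plat = filename[: -len(".whl")].split("-")
    return py_tag == interp_tag and plat in macos_arm64_platform_tags(os_version)


whl = "archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl"
print(wheel_fits(whl, "cp310", (12, 6)))  # True: Python 3.10 on MacOS 12.6
print(wheel_fits(whl, "cp311", (12, 6)))  # False: python tag does not match 3.11
```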

jrochkind commented 1 year ago

OK, thanks!

I had to start over in a new directory with a new venv, because I didn't know how else to do it (not sure if there's another way!).

Then... it seems to have installed!

It did warn:

DEPRECATION: lxml is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559

It installed enough to run recode_pdf --version and get 1.5.3, anyway! (It took almost 3 seconds to print the version number; I guess it just had to load a lot of code first, and this is expected.)

MacOS 12.6.3, Python 3.10.10, Apple M1 Pro chip, archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl
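Besides recode_pdf --version, the installed versions can also be checked from Python with the standard library's importlib.metadata (Python 3.8+), which is handy for seeing which lxml release pip ended up with. installed_versions is a hypothetical helper; the names are the distribution names passed to pip:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if absent)."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Run inside the activated venv to see what pip actually installed there.
print(installed_versions(["archive-pdf-tools", "archive-hocr-tools", "lxml"]))
```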

MerlijnWajer commented 1 year ago

Great news, thanks for testing the MacOS ARM64 version on Python 3.10!

I didn't raise the version number further, so getting 1.5.3 makes sense for this test. I have tried to build a version for Python 3.11 here, but it doesn't pull in archive-hocr-tools automatically, so you will have to install that manually with pip if you're up for another test.

artifact.zip

MerlijnWajer commented 1 year ago

(From the above archive, you'd need archive_pdf_tools-1.5.3-cp311-cp311-macosx_11_0_arm64.whl)

jrochkind commented 1 year ago

Okay! In a venv using Python 3.11.2, still on an M1 Pro MacBook running MacOS 12.6.3:

pip install archive-hocr-tools
pip install --force-reinstall -U wheel_artifacts/archive_pdf_tools-1.5.3-cp311-cp311-macosx_11_0_arm64.whl

I'm afraid that did not install.

Console output:

```
clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk -DCYTHON_CLINE_IN_TRACEBACK=0 -Isrc -Isrc/lxml/includes -I/Users/jrochkind/code/archive-pdf-tools-311/env/include -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c src/lxml/etree.c -o build/temp.macosx-12-arm64-cpython-311/src/lxml/etree.o -w -flat_namespace
src/lxml/etree.c:261877:23: error: no member named 'exc_type' in 'struct _err_stackitem'
while ((exc_info->exc_type == NULL || exc_info->exc_type == Py_None) &&
~~~~~~~~ ^
src/lxml/etree.c:261877:53: error: no member named 'exc_type' in 'struct _err_stackitem'
while ((exc_info->exc_type == NULL || exc_info->exc_type == Py_None) &&
~~~~~~~~ ^
src/lxml/etree.c:261891:23: error: no member named 'exc_type' in 'struct _err_stackitem'
*type = exc_info->exc_type;
~~~~~~~~ ^
src/lxml/etree.c:261893:21: error: no member named 'exc_traceback' in 'struct _err_stackitem'
*tb = exc_info->exc_traceback;
~~~~~~~~ ^
src/lxml/etree.c:261907:26: error: no member named 'exc_type' in 'struct _err_stackitem'
tmp_type = exc_info->exc_type;
~~~~~~~~ ^
src/lxml/etree.c:261909:24: error: no member named 'exc_traceback' in 'struct _err_stackitem'
tmp_tb = exc_info->exc_traceback;
~~~~~~~~ ^
src/lxml/etree.c:261910:15: error: no member named 'exc_type' in 'struct _err_stackitem'
exc_info->exc_type = type;
~~~~~~~~ ^
src/lxml/etree.c:261912:15: error: no member named 'exc_traceback' in 'struct _err_stackitem'
exc_info->exc_traceback = tb;
~~~~~~~~ ^
src/lxml/etree.c:261994:30: error: no member named 'exc_type' in 'struct _err_stackitem'
tmp_type = exc_info->exc_type;
~~~~~~~~ ^
src/lxml/etree.c:261996:28: error: no member named 'exc_traceback' in 'struct _err_stackitem'
tmp_tb = exc_info->exc_traceback;
~~~~~~~~ ^
src/lxml/etree.c:261997:19: error: no member named 'exc_type' in 'struct _err_stackitem'
exc_info->exc_type = local_type;
~~~~~~~~ ^
src/lxml/etree.c:261999:19: error: no member named 'exc_traceback' in 'struct _err_stackitem'
exc_info->exc_traceback = local_tb;
~~~~~~~~ ^
src/lxml/etree.c:262185:26: error: no member named 'exc_type' in 'struct _err_stackitem'
tmp_type = exc_info->exc_type;
~~~~~~~~ ^
src/lxml/etree.c:262187:24: error: no member named 'exc_traceback' in 'struct _err_stackitem'
tmp_tb = exc_info->exc_traceback;
~~~~~~~~ ^
src/lxml/etree.c:262188:15: error: no member named 'exc_type' in 'struct _err_stackitem'
exc_info->exc_type = *type;
~~~~~~~~ ^
src/lxml/etree.c:262190:15: error: no member named 'exc_traceback' in 'struct _err_stackitem'
exc_info->exc_traceback = *tb;
~~~~~~~~ ^
src/lxml/etree.c:264391:20: error: no member named 'exc_type' in 'struct _err_stackitem'
t = exc_state->exc_type;
~~~~~~~~~ ^
src/lxml/etree.c:264393:21: error: no member named 'exc_traceback' in 'struct _err_stackitem'
tb = exc_state->exc_traceback;
~~~~~~~~~ ^
src/lxml/etree.c:264394:16: error: no member named 'exc_type' in 'struct _err_stackitem'
exc_state->exc_type = NULL;
~~~~~~~~~ ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Compile failed: command '/usr/bin/clang' failed with exit code 1
creating var
creating var/folders
creating var/folders/_1
creating var/folders/_1/89lqv5550mx2tggl22z27_p18516fz
creating var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T
cc -I/usr/include/libxml2 -c /var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T/xmlXPathInit6_cfwx5a.c -o var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T/xmlXPathInit6_cfwx5a.o
cc var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T/xmlXPathInit6_cfwx5a.o -lxml2 -o a.out
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> lxml
```

(I think what I'm learning is it's best not to use the very latest python release maybe?)

MerlijnWajer commented 1 year ago

Thanks for testing. I will get Python 3.11.x installed on my laptop and see whether the serious bugs I was seeing are gone with the latest lxml. If so, we can raise the requirement for archive-hocr-tools, and then we should be all set for 3.11.x.

Support wise, it's probably also a matter of this project not having that many users on Python 3.11. :)

MerlijnWajer commented 1 year ago

I just checked: with lxml 4.9.2 the bug is still there (https://bugs.launchpad.net/lxml/+bug/1970741). I'll see what I can do, but meanwhile, yes, it's probably better to use 3.10.

jrochkind commented 1 year ago

Hm, the bug was reported to lxml a year ago, and there doesn't seem to be anyone in a hurry to fix it.

The bug on lxml doesn't mention python 3.11 specifically... is the issue that in order to use lxml on python 3.11, you need to use a newer version of lxml that exhibits the bug, while on 3.10 you can use an older version of lxml that does not?

This is a bit irritating indeed!

MerlijnWajer commented 1 year ago

That's right: there does not seem to be an lxml 4.6.5 build for Python 3.11, and all the newer versions are currently broken.

MerlijnWajer commented 1 year ago

This problem should soon be resolved by the way, as I have moved archive-hocr-tools away from lxml entirely. So it simply won't be required anymore. See https://github.com/internetarchive/archive-hocr-tools/issues/5

With the next release of archive-pdf-tools, I'll require that specific version of archive-hocr-tools (or higher), and then we can close this issue.

MerlijnWajer commented 1 year ago

The latest 1.4.x branch and master should now work with Python 3.11 as well. Please give it a go if you can.