py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/
Other
8.31k stars 1.41k forks source link

Update the Makefile for the current version of Python and simplify testing #2848

Open kaos-ocs opened 1 month ago

kaos-ocs commented 1 month ago

The Makefile sets pyenv to 3.7.2 which is no longer supported. Update the Makefile to a supported version and make it easier to test against different versions of Python; I will raise the PR to do this.

Question: What should the default versions of the OS and Python be? Ubuntu 20.04 ships with 3.8.2, 22.04 ships with 3.12.3. I can regenerate the requirements files with any required versions of the OS and Python.

As a first time tester, I found the learning curve for setting up the test environment to be quite steep, with instructions scattered across various doc files. Makefiles are supposed to simplify this process so add targets to consistently set the environment for testing. and update the docs accordingly.

Updating the requirements/*txt files will also fix the conflicting requirements for docutils (dev.txt==0.20.1, doc.txt==0.17.1) and importlib-metadata (ci.txt==4.2.0, dev/docs.txt==6.7.0).

stefan6419846 commented 1 month ago

Thanks for raising this issue.

As for some other files in this repository: They might be historical and not actively used by us as the active maintainers. I am not completely sure about the Makefile though, but I have to admit that I never used it in the past. As the referenced Python version in the Makefile is only of interest for updating the requirements with this specific target, I do not really see an issue with keeping it.

We currently have no clear indication on what the defaults for OS and Python should be. Ubuntu 20.04 is mainly used because there has not been the need to upgrade to Ubuntu 22.04 (although we might decide to do so in the near future with Ubuntu 20.04 becoming EOL in April 2025). For the Python support policy, we had some discussions as well, but did not yet decide on the future - at the moment, we support Python versions up to about a year after their EOL, but we might adapt this as well.

The testing docs should (in theory) cover the most relevant parts of this already: https://pypdf.readthedocs.io/en/stable/dev/testing.html If you observe any outdated or missing information, feel free to submit a corresponding PR.

The pinned requirements are mostly used for CI - there should be no need to pin them in your testing environment as well. (I have some standalone mechanism in place which should detect most of the breaking changes once new package versions are released - at the moment this mostly affects updates for mypy and ruff.) The docs.txt file is only suitable for recent Python versions (Python 3.11 and Python 3.12) as our docs are checked/built on the most recent Python versions instead of older ones.

pubpub-zz commented 1 month ago

as python 3.7 support is dropped, I've changed version to 3.10(in #2851) but the best would be to recommend to developpers to select their version as long as it respects the minimum version. As @stefan6419846, I'm not using this makefile neither.

kaos-ocs commented 1 month ago

For discussion, too soon for a PR. My main question is "who is this Makefile intended for?". If it just for new developers then it can be a lot simpler, no need for regenerate, release, mutmut etc. If it is for maintainers as well as developers then it needs more work such as installing mutmut and other packages.

The comments at the start also need to be added to docs, but that is for later.

Diff is against pubpub-zz/REL5.0.0.

commit d2a0576ff4b8ac2cc733074c0035e6d6089bc402
Author: Keith Owens <kaos.ocs@gmail.com>
Date:   Sun Sep 15 13:57:06 2024 +1000

    ENH: Make the Makefile useful

    Ensure that all Makefile targets are run using pyenv.

    Allow testing under different Python versions.

    Automatically select the correct ci file for input and output.

    Add new targets testing_env (set up the testing environment, only needed
    once) and testing_all (set up the environment then run the tests).

    Change the default target from maint to testing_all so a 'make' with no
    arguments does something useful.

    Split the maint target into just the commands for maintenance and a
    separate target to regenerate the requirements files.  This avoids
    changing the requirements when making unrelated changes.

    Add a TIMEOUT parameter; tests/test_images.py::test_image_new_property
    can run for more than 60 seconds with coverage on.

    Install pyclean so 'make clean' works.

    Closes #2848

diff --git a/Makefile b/Makefile
index 1c0117ba..59975c96 100644
--- a/Makefile
+++ b/Makefile
@@ -1,40 +1,90 @@
-maint:
-   pyenv local 3.10.5
+# This Makefile requires on an existing pyenv environment containing the
+# version of python that you want to use.  Before using this Makefile, you need
+# to:
+#
+# 1) Install pyenv using your distribution's package manager.
+# 2) Set up your shell for pyenv.  Run "pyenv init" and follow the
+#    instructions, including restarting your shell.
+# 3) Ensure you have the required libraries to build python.  See
+#    https://devguide.python.org/getting-started/setup-building/index.html#build-dependencies
+#    "pyenv install" will still without those libraries but some of the pypdf
+#    code will fail if Python is not built correctly.
+# 4) Run "pyenv install <version>" and check that it does not complain about
+#    missing libraries.
+#
+# The default Python version is currently 3.10.5.  To use other versions, set
+# environment variable PYTHON_VERSION to your desired version before running
+# make.  For example: PYTHON_VERSION=3.12.3 make <something>
+
+PYTHON_VERSION ?= 3.10.5
+
+.DEFAULT_GOAL = testing_all
+
+# Need a different ci.txt for Python 3.11 onwards.
+
+CI_TXT = python -c 'import sys; major, minor = sys.version_info[:2]; print("ci" if major == 3 and minor < 11 else "ci-3.11")'
+
+maint: environment
    pre-commit autoupdate
-   pip-compile -U requirements/ci.in
-   pip-compile -U requirements/docs.in
+   pip install --upgrade pip
+
+regenerate: environment
+   pip install $(shell fgrep pip-tools= requirements/dev.txt)  # in case pip-compile is not installed yet
+   pip-compile -U --output-file=requirements/$(shell $(CI_TXT)).txt requirements/ci.in
+   if [ "$(shell $(CI_TXT))" != "ci" ]; then pip-compile -U requirements/docs.in; fi # Only regenerate docs with Python 3.11 onwards
    pip-compile -U requirements/dev.in

-release:
+release: environment
    python make_release.py
    git commit -eF RELEASE_COMMIT_MSG.md

-upload:
+upload: environment
    make clean
    flit publish

-clean:
+# environment is required because pyclean may not be installed globally.
+
+clean: environment
+   pip install pyclean # May not be installed yet
    pyclean .
    rm -rf tests/__pycache__ pypdf/__pycache__ Image9.png htmlcov docs/_build dist dont_commit_merged.pdf dont_commit_writer.pdf pypdf.egg-info pypdf_pdfLocation.txt .pytest_cache .mypy_cache .benchmarks

-test:
-   pytest tests --cov --cov-report term-missing -vv --cov-report html --durations=3 --timeout=60 pypdf
+# Set up the testing environment.  'make testing_env' at least once before
+# running any tests.
+
+testing_env: environment maint
+   git submodule update --init
+   pip install --upgrade pip
+   pip install -r requirements/dev.txt
+   pip install -r requirements/$(shell $(CI_TXT)).txt
+   python -c "from tests import download_test_pdfs; download_test_pdfs()"
+
+test: environment
+   pytest tests --cov --cov-report term-missing -vv --cov-report html --durations=3 --timeout=$(or $(TIMEOUT),60) pypdf

-testtype:
-   pytest tests --cov --cov-report term-missing -vv --cov-report html --durations=3 --timeout=30 --typeguard-packages=pypdf
+testing_all: testing_env test

-mutation-test:
+testtype: environment
+   pytest tests --cov --cov-report term-missing -vv --cov-report html --durations=3 --timeout=$(or $(TIMEOUT),30) --typeguard-packages=pypdf
+
+mutation-test: environment
    mutmut run

-mutation-results:
+mutation-results: environment
    mutmut junitxml --suspicious-policy=ignore --untested-policy=ignore > mutmut-results.xml
    junit2html mutmut-results.xml mutmut-results.html

-benchmark:
+benchmark: environment
    pytest tests/bench.py

-mypy:
+mypy: environment
    mypy pypdf --ignore-missing-imports --check-untyped --strict

-pylint:
+pylint: environment
    pylint pypdf
+
+# Set the environment for all other targets.  No need to specify this target,
+# it is automatically included where required.
+
+environment:
+   pyenv local $(PYTHON_VERSION)
stefan6419846 commented 1 month ago

As far as I am aware, the Makefile is not referenced anywhere - it just lives in this repository. Thus I would not expect every new developer to start using it, but instead rely on our developer docs.

If we are able to lower the burden for new contributors by providing a Makefile, this might be an option. I would avoid the pyenv stuff in this case as well. By the way: The pylint target is outdated - we are only using ruff for linting nowadays.

kaos-ocs commented 1 month ago

Plan B. Treat the Makefile as a starter for new developers and add it to the docs. Remove all the maintainer code, keep just a few make targets such as:

Installing the required packages as a user runs up against PEP 668, you need some form of virtual environment or sudo (not a good idea for testing). venv works for the system version of Python, but only for that version. pyenv is the best (only?) option for testing against multiple versions of Python3. The main downside with pyenv is the need to build Python first. I could write the Makefile so it looked for the required Python version in pyenv first, with a fallback to -m venv and the system version of Python.

stefan6419846 commented 1 month ago

venv works for the system version of Python, but only for that version.

Is this correct? I have never used venv directly, but always virtualenv, and it supports passing a specific Python version or interpreter path - at least for OpenSUSE where I actually use different Python versions.

kaos-ocs commented 1 month ago

venv works for the system version of Python, but only for that version.

Is this correct? I have never used venv directly, but always virtualenv, and it supports passing a specific Python version or interpreter path - at least for OpenSUSE where I actually use different Python versions.

True, venv and virtualenv can handle multiple versions of python. But AFAICT, those versions have to be installed in the system first. On Ubuntu 22.04, the only available version in the repository is python 3.12.3 so there is no way to install other versions in the system libraries. That means building your own version of python and installing it as a user, which is what pyenv install does.