[BUG] Please make trove classifiers validation optional

eli-schwartz commented 3 months ago

setuptools version

from git, also 70.0.0

Python version

Python 3.12

OS

Gentoo Linux

Additional environment information

No response

Description

I tried building a wheel for a package that uses a trove classifier in the very latest version of the trove_classifiers package. Setuptools detected that trove_classifiers was installed, and insisted on validating it.

Looking at the code which handles this, it offers an escape hatch only to "avoid hitting the network" if trove_classifiers is NOT installed: https://github.com/pypa/setuptools/blob/34f9518ef5b50d546893fc29386d4d11866dd9db/setuptools/config/_validate_pyproject/formats.py#L156-L214

Expected behavior

A way to disable the unwanted functionality without first uninstalling other packages. For example, instead of $VALIDATE_PYPROJECT_NO_NETWORK, I would like to see $VALIDATE_PYPROJECT_NO_TROVE_CLASSIFIERS.

How to Reproduce

git clone https://github.com/urschrei/pyzotero
python -m build -nwx

Output

* Building wheel...
configuration error: `project.classifiers[10]` must be trove-classifier
DESCRIPTION:
    `PyPI classifier <https://pypi.org/classifiers/>`_.

GIVEN VALUE:
    "License :: OSI Approved :: Blue Oak Model License (BlueOak-1.0.0)"

OFFENDING RULE: 'format'

DEFINITION:
    {
        "type": "string",
        "format": "trove-classifier"
    }

For more details about `format` see
https://validate-pyproject.readthedocs.io/en/latest/api/validate_pyproject.formats.html

Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/pyproject_hooks/_in_process/_in_process.py", line 373, in <module>
    main()
  File "/usr/lib/python3.12/site-packages/pyproject_hooks/_in_process/_in_process.py", line 357, in main
    json_out["return_val"] = hook(**hook_input["kwargs"])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/pyproject_hooks/_in_process/_in_process.py", line 271, in build_wheel
    return _build_backend().build_wheel(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/build_meta.py", line 410, in build_wheel
    return self._build_with_temp_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/build_meta.py", line 395, in _build_with_temp_dir
    self.run_setup()
  File "/usr/lib/python3.12/site-packages/setuptools/build_meta.py", line 311, in run_setup
    exec(code, locals())
  File "<string>", line 12, in <module>
  File "/usr/lib/python3.12/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 158, in setup
    dist.parse_config_files()
  File "/usr/lib/python3.12/site-packages/setuptools/dist.py", line 632, in parse_config_files
    pyprojecttoml.apply_configuration(self, filename, ignore_option_errors)
  File "/usr/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 68, in apply_configuration
    config = read_configuration(filepath, True, ignore_option_errors, dist)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 133, in read_configuration
    validate(subset, filepath)
  File "/usr/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 57, in validate
    raise ValueError(f"{error}\n{summary}") from None
ValueError: invalid pyproject.toml config: `project.classifiers[10]`.
configuration error: `project.classifiers[10]` must be trove-classifier

ERROR Backend subprocess exited when trying to invoke build_wheel

eli-schwartz commented 3 months ago

mgorny commented 3 months ago

Making it into a warning rather than error would also work for us.

jaraco commented 2 months ago

After reviewing the code, I can add some detail.

The escape hatch was created to avoid network access for the fallback behavior when the trove_classifiers package isn't present. It assumes that if that package is present, it contains valid data and should be used to do the validation. The fact that it's stale suggests that maybe that package is not yet valid in that configuration.

I'm a little reluctant to add yet another escape hatch for this one configuration item especially for this rare case that's likely to resolve itself over time.

I'd be more inclined to just remove support for the trove_classifiers package and have a single unified "from PyPI" validator with its escape hatches and warnings.

The suggestion to make it a warning instead of an error seems reasonable, but my guess is that most users will miss the warning and only encounter it when attempting to upload to an index anyway, suggesting maybe the validation should be removed altogether.

But, the validation logic isn't maintained here, it's maintained instead in abravalheri/validate-pyproject.

@abravalheri What do you think of the proposal and the alternatives?

eli-schwartz commented 2 months ago

The fact that it's stale suggests that maybe that package is not yet valid in that configuration.

The fact that it's stale in this case indicates we never intended trove_classifiers to be valid at all, but we build without virtualenv isolation.

I'm a little reluctant to add yet another escape hatch for this one configuration item especially for this rare case that's likely to resolve itself over time.

It's not going to resolve itself over time, because due to dependency graph ordering there's no real reason:

when both trove_classifiers and pyzotero are installed globally at the distro level
when both have pending updates

for trove_classifiers to end up queued for installation first. Instead, updating fails, and continues to fail because every time a system update is attempted, it fails again.

If setuptools removed its validation entirely, or made it only use the network, that would equally solve the gentoo use case though. :) We already disable networking anyway.

Of course, one might ask, why not just build in an isolated environment without trove_classifiers? That's recommended by PyPA via virtualenvs. But distros generally don't do this, because they already have both network isolation (cannot install build-requires with pip) and clean build isolation via spinning up temporary containers with just distro-packaged dependencies installed. And distros want to use their own packaged dependencies for various reasons.

... but Gentoo (can, and usually does) build from source on every user's machine, and spinning up containers for every package is a hassle, slow, implies nontrivial setup, and generally is impractical. Distros like Debian and Fedora have complex distributed systems to do this and reset to a clean slate in between each build, so it doesn't matter if it takes 15 minutes to build each pure-python wheel because they have tends of buildbots building thousands of packages, and each one only has to get built once.

And most other cases of additional packages installed in a build environment aren't a problem. ;) Build isolation primarily solves the problem of people forgetting to specify everything the build depends on because they already have it installed.

So gentoo usually does what those other distros also do when individual users want to build a .deb or .rpm, and builds on the live system, and relies on --disable-foo style options where applicable (e.g. C/C++ environments, where there is a culture of build options, unlike Python where build options largely aren't a thing unless you're building data science packages that use both C++ and Fortran and cannot be built by pip because pip doesn't know how to install OpenBLAS).

And so we end up here, asking for, well, a --disable-foo style option to disable trove classifier checking. :D

mgorny commented 2 months ago

But, the validation logic isn't maintained here, it's maintained instead in abravalheri/validate-pyproject.

Yeah, I've noticed that earlier and I was just about to open a pull request there. Good that I've checked mail first, let's see what resolution is preferred.

I'd be more inclined to just remove support for the trove_classifiers package and have a single unified "from PyPI" validator with its escape hatches and warnings.

Do you mean one that fetches data straight from PyPI? As long as we can force-disable Internet access, that would work for us.

abravalheri commented 2 months ago

I'd be more inclined to just remove support for the trove_classifiers package and have a single unified "from PyPI" validator with its escape hatches and warnings.

I would love to take this approach, but the PyPI Web page for listing classifiers is not completely official. I know that flit uses it... but the standards mention the package as the only true source of truth... So I am not ready to do that move unless there is commitment in supporting the HTTP api for classifiers in PyPI.

My view is that this is a tricky problem, but also a minor one... Once the latest trove-classifiers package is adopted downstream, the problem would naturally go away, right? Also, the trove-classifiers package should be mostly backwards compatible, so the risk of adopting a new version is very low...

A question: if the problem is the installation order is it possible to make trove-classifiers a build-only dependency of pyzotero? That would ensure that the latest classifiers are installed right?

eli-schwartz commented 2 months ago

My view is that this is a tricky problem, but also a minor one... Once the latest trove-classifiers package is adopted downstream, the problem would naturally go away, right? Also, the trove-classifiers package should be mostly backwards compatible, so the risk of adopting a new version is very low...

I'm pretty sure I was quite clear that the problem does not naturally go away ever, because the solver gets into a stuck state and doesn't get unstuck until someone manually kicks it -- because it will keep trying to install the failing package before trove_classifiers. That usually means patching the setuptools distro package to have a hard dependency on trove_classifiers >= $LATEST_RELEASE which is kind of annoying.

It would be much better if we could unstick things automatically by telling setuptools to stop performing validations we don't want. We aren't uploading to PyPI.

It's confusing anyway because there's absolutely no reason people shouldn't be allowed to use whatever trove classifiers they like, even unofficial ones, on private self-run indexes.

eli-schwartz commented 2 months ago

A question: if the problem is the installation order is it possible to make trove-classifiers a build-only dependency of pyzotero? That would ensure that the latest classifiers are installed right?

Just want to reiterate here that adding trove-classifiers as build-requires for arbitrary packages throughout the python ecosystem is not a very comfortable solution in the slightest.

Especially because trove-classifiers may not need to be installed at all -- and in fact the only reason I have it on my system is because hatchling required it.

It is terrible UX to have pyzotero require an additional build-requires, solely because if it doesn't have that build-requires then installing hatchling will result in setuptools erroring out. It's not even the kind of bug that anyone will notice for a while.

abravalheri commented 2 months ago

Hi @eli-schwartz, thank you very much for the reply. I understand your concerns and appreciate your feedback.

Please note that I am not suggesting to add trove-classifiers as a build-requirements to all packages throughout the ecosystem, but rather this package in particular. I don't imagine that the majority of the packages on the ecosystem will require the very latest trove-classifiers.

Also note that when I suggested that the "problem eventually goes away" I was referring to "greenfield" deployments in the future when the trove-classifiers in the repositories that gets pulled by default is already new enough (at least this is my impression). I do agree that for a particular machine that already have the packages installed, it does not go away and require manual intervention.

I understand that you guys have your reasons to disable build isolation, but I would like to point out that build isolation was introduced to fix this kind of problem[^1]... It is perfectly fine for OS-maintainers to attempt to optimise builds, but I think that in this case when the lack of build-isolation is directly involved in a bug, it would be important to explore other workarounds (like the inclusion in this particular package of trove-classifiers as a build dependency to force a particular installation order; or removing the problematic classifier in the package metadata).

I aim to keep the codebase as simple as possible, which is why I’m hesitant to add yet more knobs and levers. However, one potential solution could be introducing an environment variable like SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION to skip the validations all together in https://github.com/pypa/setuptools/blob/main/setuptools/config/pyprojecttoml.py. In general, this would be risky, as it bypasses important checks that reduce the set of accepted inputs; but for classifiers, it should be fine. The responsibility would fall on the developer/user that sets this environment variable to ensure the overall configuration is valid.

Regarding the idea of adding --disable-foo style options to setuptools/validate-pyproject, the current codebase (in both setuptools and validate-pyproject) doesn’t support this. However, if anyone has concrete proposals and is willing to contribute PRs, I’m open to discussing and reviewing these suggestions.

[^1]: Disabling build-isolation is historically know to cause problems with some features of the PyPA packaging ecosystem like optional dependencies and entrypoint-based plugins.

eli-schwartz commented 2 months ago

I understand that you guys have your reasons to disable build isolation, but I would like to point out that build isolation was introduced to fix this kind of problem1...

Disabling build-isolation is historically know to cause problems with some features of the PyPA packaging ecosystem like optional dependencies and entrypoint-based plugins. ↩

This is indeed a long-standing flaw in the python ecosystem, yes. :)

Automagically doing the wrong thing based on unpredictable environment inputs is bad, and everyone agrees on that. Due to several decades of legacy baggage, it is difficult to solve this for setuptools because the automagic environment inputs serve multiple roles including extensibility by way of entrypoint-based plugins.

Build isolation is not, and never has been, a solution to this engineering problem. It's a way to avoid the side effects in a scenario where the design constraints prevent a solution from being achieved. It is also, largely, a problem that only affects setuptools and doesn't affect other build backends which take a more structural approach to defining builds.

You may wish to take note here of the wording for PEP 517. It describes build isolation and marks it as an RFC 2119 compliant "SHOULD" feature, with two PEP rationales: 1) that it permits building multiple incompatible packages whose build requirements are mutually exclusive (and calls out pinned versions as the probable cause) 2) that it provides the hygienic ability to catch scenarios where people have forgotten to list all their build requirements

It doesn't mention the possibility of additional environment packages modifying your own build requirements, indicating that this isn't a motivational concern behind the recommendation to use build isolation.

eli-schwartz commented 2 months ago

Regarding the idea of adding --disable-foo style options to setuptools/validate-pyproject, the current codebase (in both setuptools and validate-pyproject) doesn’t support this. However, if anyone has concrete proposals and is willing to contribute PRs, I’m open to discussing and reviewing these suggestions.

Just for clarity, my mention of --disable-foo style options was intended as an explicit way to be evocative of GNU autoconf, not because I specifically believe setuptools should be configured via CLI flags instead of environment variables.

I suspect environment variables is ultimately easier for setuptools, if only because it doesn't require passing state around through multiple layers of "execute the setup.py script together with a phase to run".

The thing that I care about is the Autoconf model where functionality is expected to have some form of on/off switch rather than unconditionally probe whether an import is available.

pypa / setuptools