python / importlib_metadata

Library to access metadata for Python packages
https://importlib-metadata.readthedocs.io
Apache License 2.0

API incompatibility with `importlib.metadata` (or at least the API is not type-safe?) #486

Open abravalheri opened 6 months ago

abravalheri commented 6 months ago

When third-party MetaPathFinders are implemented using importlib.metadata, the return types documented by importlib_metadata are not respected, which causes the API to behave in unexpected ways (with unexpected errors).

This is an example similar to the one identified in https://github.com/pypa/pyproject-hooks/pull/195#issuecomment-2088695300 and https://github.com/pypa/setuptools/issues/4338:

mkdir -p /tmp/stash/private_path/_test-0.0.1.dist-info
cat <<EOF > /tmp/stash/private_path/_test-0.0.1.dist-info/METADATA
Name: _test
Version: 0.0.1
EOF

cat <<EOF > /tmp/stash/private_path/_test-0.0.1.dist-info/entry_points.txt
[_test.importlib_metadata]
hello = world
EOF

cat <<EOF > /tmp/stash/install_finder.py
import sys
from importlib.machinery import PathFinder
from importlib.metadata import DistributionFinder, MetadataPathFinder

class _ExampleFinder:
    def __init__(self, private_path):
        self.private_path = private_path

    def find_spec(self, fullname, _path, _target=None):
        if "." in fullname:
            # Rely on importlib to find nested modules based on parent's path
            return None

        # Ignore other items in _path or sys.path and use private_path instead:
        return PathFinder.find_spec(fullname, path=self.private_path)

    def find_distributions(self, context=None):
        context = DistributionFinder.Context(path=self.private_path)
        return MetadataPathFinder.find_distributions(context=context)

sys.meta_path.insert(0, _ExampleFinder(["/tmp/stash/private_path"]))
EOF

cd /tmp/stash/
python3.8 -m venv .venv
.venv/bin/python -m pip install -U importlib-metadata
.venv/bin/python
>>> import install_finder
>>> from importlib_metadata import distribution
>>> distribution("_test").entry_points.select(group="_test.importlib_metadata")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'select'

The expected behaviour, as per the API documentation, would be:

>>> distribution("_test").entry_points.select(group="_test.importlib_metadata")
EntryPoints((EntryPoint(name='hello', value='world', group='_test.importlib_metadata'),))

It seems that the origin of this problem is a little "lie" in the API definition:

Instead of

importlib_metadata.Distribution.discover(...) -> Iterable[importlib_metadata.Distribution]

what actually happens is:

importlib_metadata.Distribution.discover(...) -> Iterable[importlib_metadata.Distribution | importlib.metadata.Distribution]

and that propagates throughout the whole API.

I haven't tested, but there is potential for other internal errors too, if importlib_metadata internally relies on the objects having type importlib_metadata.Distribution when calling newer APIs.

It is probably worthwhile to change the return type of importlib_metadata.Distribution.discover(...) to Iterable[importlib_metadata.Distribution | importlib.metadata.Distribution] and then run the type checkers on the lowest supported Python (I suppose Python 3.8) to see if everything is OK.

It also means that consumers of importlib_metadata cannot rely on the newer APIs (unless they are sure that third-party packages installed in their environment are not using importlib.metadata).

jaraco commented 5 months ago

It's important to remember that importlib.metadata was provisional until Python 3.10, when selectable entry points were introduced. If a project relies on importlib.metadata on Python 3.9 to supply a third-party MetaPathFinder or MetadataPathFinder, I'm not surprised that it runs afoul of the described expectations.

My initial recommendation would be that third-party providers should depend on `importlib_metadata; python_version<"3.10"`. I recognize, however, that it would be nice not to have that dependency: some projects can't have dependencies at all, and for importlib_metadata in particular, having the dependency alters the user's environment.

I wonder if importlib_metadata shouldn't recommend that metadata providers prefer importlib_metadata if present and only fall back to importlib.metadata if it is not. That behavior would match the behavior observed by consumers of metadata.

If providers were to follow that recommendation, it would not only address the reported issue, but it would also obviate the need for #487.

Alternatively, importlib_metadata could get more aggressive about replacing stdlib-based MetadataPathFinders. In addition to the current patching of the stdlib finder, it could find any third party providers and patch them or wrap them to replace importlib.metadata.Distributions and importlib.metadata._meta.PackageMetadatas with their importlib_metadata counterparts.

I wonder, too, about the long-term strategy for this project. Should it exist indefinitely? Should it be deprecated/retired and when?

Are there other options to consider?

abravalheri commented 5 months ago

> I wonder if importlib_metadata shouldn't recommend that metadata providers prefer importlib_metadata if present and only fall back to importlib.metadata if it is not. That behavior would match the behavior observed by consumers of metadata.

That sentiment does not seem to be generally shared (for example, see the thread starting at https://github.com/pypa/pyproject-hooks/pull/195#discussion_r1586868076). One of the main concerns is non-deterministic behaviour at runtime.

> I wonder, too, about the long-term strategy for this project. Should it exist indefinitely? Should it be deprecated/retired and when?

I suppose the main advantage of keeping importlib_metadata as a separate project is that it is easier to maintain (in terms of release cadence).

That is more relevant while the API has not yet reached stability. Once the API is considered stable, would it still make sense to keep the project separate for agility in shipping bugfixes, or would the gains at that point be considered minimal?

jaraco commented 5 months ago

> That sentiment does not seem to be generally shared (for example see thread starting in pypa/pyproject-hooks#195 (comment)). One of the main concerns is non-deterministic behaviour at runtime.

My objective with this recommendation was to reduce the number of variables, making things more deterministic. As has been pointed out, this backport struggles because it provides compatibility both forward and backward, and because it supports two API surfaces (providers and consumers).

If we reduce the variability of providers by asking them to have a strategy for how to behave in a given environment, that reduces the problem space substantially and puts control in the hands of the consumer and/or environment builder. That is, if there were guidance for providers and they followed that guidance, they would no longer be responsible for managing the complexity.

> That is more relevant while the API has not yet reached stability. Once the API is considered stable, would it still make sense to keep the project separate for agility in shipping bugfixes, or would the gains at that point be considered minimal?

The problem with stability is that you have to choose between stability and adaptability. The API has been largely stable since Python 3.10, with essentially minor tweaks and shoring up the API limitations. I've encountered at least half a dozen cases, however, where users have wanted newer behavior on older Pythons, such as the improved import time performance or support for symlinks.

In addition to the main benefit of providing forward compatibility and the faster release cadence (preview of changes), this package provides test functionality that's not available in the stdlib (performance benchmarks, xfailed tests, integration tests, coverage checks, linting and formatting, type checks). The developer ergonomics here are so much better than in CPython (to the extent that I'm dreaming of a world where all (or many) stdlib modules are developed independently and then later incorporated into stdlib).

jaraco commented 2 months ago

> Alternatively, importlib_metadata could get more aggressive about replacing stdlib-based MetadataPathFinders. In addition to the current patching of the stdlib finder, it could find any third party providers and patch them or wrap them to replace importlib.metadata.Distributions and importlib.metadata._meta.PackageMetadatas with their importlib_metadata counterparts.

The idea of monkey-patching providers sounds fraught with complexity and risks, so it's probably a non-starter.

I can imagine instead a scenario where at run time importlib_metadata replaces any instances of importlib.metadata.Distribution and importlib.metadata._meta.PackageMetadata with their importlib_metadata counterparts. No monkey-patching, just compatibility shims to do the replacement.

That way, providers don't need to do anything: they provide importlib.metadata objects naturally, but importlib_metadata consistently returns objects from its own library. I'll explore that approach and see how clean or ugly it is.

jaraco commented 2 months ago

After working on documentation for the proposed approach, I'm now reconsidering the strategy (of replacing instances).

I'm wondering if instead, both libraries should expose (compatible) Protocols for Distribution, PackagePath, and Message objects. Then the importlib_metadata and importlib.metadata implementations would specify those protocols as the handled types, even though they may subclass from one library or the other.

That's similar to the approach abravalheri originally proposed in #487, but using more public-facing interfaces that will also be published downstream in CPython.

Another option could be to proceed with #505, addressing the needs of most consumers, and then deal with support for custom providers separately.

Or! We could extend #505 to include new protocols, thereby making it easier for custom providers to implement the requisite interfaces more simply alongside the newer warnings (while still getting the benefits of converting incompatible classes to compatible ones). Now I'm leaning that direction.