abravalheri opened this issue 6 months ago
It's important to remember that importlib.metadata was provisional until Python 3.10, where selectable entry points were introduced. If a project is relying on importlib.metadata on Python 3.9 to supply third-party MetaPathFinders or MetadataPathFinders, I'm not surprised that it runs afoul of the described expectations.
My initial recommendation would be that third-party providers should depend on importlib_metadata; python_version<"3.10". I recognize, however, that it would be nice not to have that dependency: some projects can't have dependencies at all, and for importlib_metadata in particular, having the dependency alters the user's environment.
I wonder if importlib_metadata shouldn't recommend that metadata providers prefer importlib_metadata if present and only fall back to importlib.metadata if it is not. That behavior would match the behavior observed by consumers of metadata.
If providers were to follow that recommendation, it would not only address the reported issue, but it would also obviate the need for #487.
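For illustration, a minimal sketch of that provider-side pattern (the class below is hypothetical):

```python
# Prefer the backport when it is installed; fall back to the stdlib otherwise.
try:
    import importlib_metadata as metadata
except ImportError:
    import importlib.metadata as metadata


class ProviderDistribution(metadata.Distribution):
    """Subclass whichever implementation is active, so the objects this
    provider yields match the ones consumers resolve through."""

    def read_text(self, filename):
        ...  # return the named metadata file's contents

    def locate_file(self, path):
        ...  # resolve *path* relative to this distribution
```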
Alternatively, importlib_metadata could get more aggressive about replacing stdlib-based MetadataPathFinders. In addition to the current patching of the stdlib finder, it could find any third-party providers and patch or wrap them to replace importlib.metadata.Distribution and importlib.metadata._meta.PackageMetadata instances with their importlib_metadata counterparts.
I wonder, too, about the long-term strategy for this project. Should it exist indefinitely? Should it be deprecated/retired and when?
Are there other options to consider?
> I wonder if importlib_metadata shouldn't recommend that metadata providers prefer importlib_metadata if present and only fall back to importlib.metadata if it is not. That behavior would match the behavior observed by consumers of metadata.
That sentiment does not seem to be generally shared (for example, see the thread starting at https://github.com/pypa/pyproject-hooks/pull/195#discussion_r1586868076). One of the main concerns is non-deterministic behaviour at runtime.
> I wonder, too, about the long-term strategy for this project. Should it exist indefinitely? Should it be deprecated/retired and when?
I suppose the main advantage of keeping a separate project for importlib_metadata is that it becomes easier to maintain (in terms of release cadence).
That is more compelling while the API has not yet reached stability. Once the API is considered stable, would it make sense to keep the project separate for agility in bugfixes, or would the gains be minimal at that point?
> That sentiment does not seem to be generally shared (for example, see the thread starting at pypa/pyproject-hooks#195 (comment)). One of the main concerns is non-deterministic behaviour at runtime.
My objective with this recommendation was to reduce the number of variables, making things more deterministic. As has been pointed out, this backport struggles because it provides compatibility both forward and backward and because it supports two API surfaces (providers and consumers).
If we reduce the variability of providers by asking them to have a strategy for how to behave in a given environment, that reduces the problem space substantially and puts control in the hands of the consumer and/or environment builder. That is, if there were guidance for providers and they followed it, they would no longer be responsible for managing the complexity.
> That is more compelling while the API has not yet reached stability. Once the API is considered stable, would it make sense to keep the project separate for agility in bugfixes, or would the gains be minimal at that point?
The problem with stability is that you have to choose between stability and adaptability. The API has been largely stable since Python 3.10, with essentially minor tweaks and shoring up of the API's limitations. I've encountered at least half a dozen cases, however, where users have wanted newer behavior on older Pythons, such as the improved import-time performance or support for symlinks.
In addition to the main benefit of providing forward compatibility and the faster release cadence (preview of changes), this package provides test functionality that's not available in the stdlib (performance benchmarks, xfailed tests, integration tests, coverage checks, linting and formatting, type checks). The developer ergonomics here are so much better than in CPython (to the extent that I'm dreaming of a world where all (or many) stdlib modules are developed independently and then later incorporated into stdlib).
> Alternatively, importlib_metadata could get more aggressive about replacing stdlib-based MetadataPathFinders. In addition to the current patching of the stdlib finder, it could find any third-party providers and patch or wrap them to replace importlib.metadata.Distribution and importlib.metadata._meta.PackageMetadata instances with their importlib_metadata counterparts.
The idea of monkey-patching providers sounds fraught with complexity and risks, so it's probably a non-starter.
I can imagine instead a scenario where, at runtime, importlib_metadata replaces any instances of importlib.metadata.Distribution and importlib.metadata._meta.PackageMetadata with their importlib_metadata counterparts. No monkey-patching, just compatibility shims to do the replacement. That way, providers don't need to do anything: they provide importlib.metadata objects naturally, but importlib_metadata consistently returns objects from its own library. I'll explore that approach and see how clean or ugly it is.
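A rough sketch of what such a shim could look like (the class and helper names here are hypothetical, not part of either library):

```python
import importlib.metadata
import importlib_metadata


class _CompatDistribution(importlib_metadata.Distribution):
    """Hypothetical shim presenting a stdlib Distribution through the backport's API."""

    def __init__(self, wrapped: importlib.metadata.Distribution):
        self._wrapped = wrapped

    def read_text(self, filename):
        # Delegate to the wrapped stdlib object; the backport derives
        # metadata, version, files, etc. from this method.
        return self._wrapped.read_text(filename)

    def locate_file(self, path):
        return self._wrapped.locate_file(path)


def _ensure_compat(dist):
    """Return *dist* unchanged if it is already a backport object, else wrap it."""
    if isinstance(dist, importlib_metadata.Distribution):
        return dist
    return _CompatDistribution(dist)
```

Because the backport builds its richer API on top of read_text and locate_file, delegating just those two methods should be enough; discovery would then map the conversion over every distribution it yields.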
After working on documentation for the proposed approach, I'm now reconsidering the strategy (of replacing instances).
I'm wondering if, instead, both libraries should expose (compatible) Protocols for Distribution, PackagePath, and Message objects. Then the importlib_metadata and importlib.metadata implementations would specify those protocols as the handled types, even though they may subclass from one library or the other.
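For a sense of shape, here's what such a shared protocol might look like (a sketch only; the name and exact members are illustrative, not a published interface):

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class DistributionLike(Protocol):
    """Structural type that both libraries' Distribution classes already satisfy."""

    def read_text(self, filename: str) -> Optional[str]: ...

    def locate_file(self, path): ...

    @property
    def metadata(self): ...

    @property
    def version(self) -> str: ...
```

APIs like discover(...) could then be annotated as returning Iterable[DistributionLike], making the mixed-origin reality explicit to type checkers without privileging either library's concrete class.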
That's similar to the approach abravalheri originally proposed in #487, but using more public-facing interfaces that will also be published downstream in CPython.
Another option could be to proceed with #505, addressing the needs of most consumers, and then deal with support for custom providers separately.
Or! We could extend #505 to include new protocols, thereby making it easier for custom providers to implement the requisite interfaces alongside the newer warnings (while still getting the benefit of converting incompatible classes to compatible ones). Now I'm leaning in that direction.
When third-party MetaPathFinders are implemented using importlib.metadata, the return types in importlib_metadata are not respected, which causes the API to behave in unexpected ways (with unexpected errors). This is an example similar to the one identified in https://github.com/pypa/pyproject-hooks/pull/195#issuecomment-2088695300 and https://github.com/pypa/setuptools/issues/4338:
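A minimal sketch of that kind of setup (the finder and distribution names below are hypothetical):

```python
import sys
import importlib.metadata
import importlib_metadata


class ThirdPartyDistribution(importlib.metadata.Distribution):
    """A provider's distribution, built on the stdlib API."""

    def read_text(self, filename):
        if filename == "METADATA":
            return "Metadata-Version: 2.1\nName: demo-pkg\nVersion: 1.0\n"
        return None

    def locate_file(self, path):
        return path


class ThirdPartyFinder:
    """A meta path finder serving metadata via importlib.metadata objects."""

    def find_spec(self, fullname, path, target=None):
        return None  # provides metadata only, no importable modules

    def find_distributions(self, context=None):
        name = getattr(context, "name", None)
        if name in (None, "demo-pkg"):
            yield ThirdPartyDistribution()


sys.meta_path.append(ThirdPartyFinder())

dist = importlib_metadata.distribution("demo-pkg")
print(type(dist))                                         # the stdlib-based class
print(isinstance(dist, importlib_metadata.Distribution))  # False
```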
The expected behaviour as per the API documentation would be for only importlib_metadata types to be returned (the isinstance check above would print True).
It seems that the origin of this problem is a little "lie" in the API definition. Instead of

```python
importlib_metadata.Distribution.discover(...) -> Iterable[importlib_metadata.Distribution]
```

what actually happens is:

```python
importlib_metadata.Distribution.discover(...) -> Iterable[importlib_metadata.Distribution | importlib.metadata.Distribution]
```

and that propagates throughout the whole API.
I haven't tested, but there is potential for other internal errors too, if internally importlib_metadata relies on the objects having type importlib_metadata.Distribution in order to call newer APIs. It is probably worthwhile to change the return type of importlib_metadata.Distribution.discover(...) to Iterable[importlib_metadata.Distribution | importlib.metadata.Distribution] and then run the type checkers on the lowest supported Python (I suppose Python 3.8) to see if everything is OK.
It also means that consumers of importlib_metadata cannot rely on the newer APIs (unless they are sure that third-party packages installed in their environment are not using importlib.metadata).
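Until that is resolved, a consumer that wants the newer APIs would need a defensive check along these lines (a sketch, not guidance from either library):

```python
import importlib_metadata

for dist in importlib_metadata.distributions():
    if isinstance(dist, importlib_metadata.Distribution):
        ...  # safe to use backport-only APIs here
    else:
        ...  # stdlib object from a third-party provider; use only the common API
```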