Closed vyasr closed 1 year ago
Kerchunk has pinned back numcodecs due to CI failing with the current entry points implementation (e.g., https://github.com/fsspec/kerchunk/actions/runs/6436907706/job/17481181465?pr=372#step:5:58). Please advise when fixed, or if there's anything I can do.
Yeah we had to do the same in RAPIDS unfortunately
Reading over Vyas' nice writeup above would help
Reviewing the PR: https://github.com/zarr-developers/numcodecs/pull/475
Suggesting a test we can use to ensure we don't reintroduce this accidentally in the future
Minimal, reproducible code sample, a copy-pastable example if possible
I don't think it's worth trying to create an MRE for this situation; please see below.
Problem description
This has turned out to be a much more subtle issue than I first thought. The problem looks like it's due to a combination of two factors:
The EntryPoint object changing over time, and
The monkey-patching of importlib.metadata by importlib_metadata
To start with, we need to look at the evolution of importlib_metadata. Until importlib_metadata version 4.13.0, the EntryPoint class had an __iter__ method. This method was deprecated much earlier (sometime around 3.x) but was not removed until version 5.0.0. Meanwhile, version 4.8.1 restored support for EntryPoint.__getitem__ via the DeprecatedTuple class, functionality which had been supported in previous versions but had since been removed.

Now, when we roll around to version 5.0.0, the removal of EntryPoint.__iter__ meant that iteration over an EntryPoint object instead fell back to calling __getitem__ (when a class defines no __iter__, Python iterates by calling __getitem__ with successive integer indices). The problem is that the __getitem__ implementation in turn calls _key, which returns the tuple (self.name, self.value, self.group), whereas the old __iter__ implementation returned iter((self.name, self)).

The relevant code in numcodecs for Python 3.9 assumes the old __iter__ interface, such that given a tuple of EntryPoint objects we can process it as key-value pairs of the form (name, EntryPoint). That works with the old explicit __iter__ API but not with the new implicit __getitem__ iteration API.

A major part of the complexity comes from the fact that importlib_metadata was provisionally added to the standard library as importlib.metadata in Python 3.8, then made a full part of the standard library in 3.10. As of Python 3.9, the standard library version still had an EntryPoint object that defined __iter__, i.e. it matched importlib_metadata<5.0.0. Therefore, the code in numcodecs seems like it should work. However, importlib_metadata actually monkey-patches parts of importlib.metadata:

Note that on the final line we go from importlib.metadata to importlib_metadata. This is the root of the problem: if you have a newer version of importlib_metadata installed in Python 3.9, you start seeing EntryPoint objects with the new iteration protocol instead of the old one. Therefore, in the Python 3.9 scenario numcodecs needs to handle both possibilities.

The simplest solution is to avoid relying on the iteration protocol altogether and simply construct the appropriate update argument manually. This is what I now do in #475.
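To make the two iteration behaviours concrete, here is a minimal sketch using stand-in classes. It is not the numcodecs code and not the change in #475: OldStyleEntryPoint, NewStyleEntryPoint, registry_via_iteration, registry_via_names, and the "demo" / "example.codecs" values are all hypothetical names made up for illustration. The stand-ins only mimic the __iter__ and __getitem__ behaviour described above.

```python
class OldStyleEntryPoint:
    """Hypothetical stand-in for an entry point that still defines __iter__."""

    def __init__(self, name, value, group):
        self.name, self.value, self.group = name, value, group

    def __iter__(self):
        # Old behaviour: iterating yields the pair (name, entry_point).
        return iter((self.name, self))


class NewStyleEntryPoint:
    """Hypothetical stand-in for an entry point with __getitem__ but no __iter__."""

    def __init__(self, name, value, group):
        self.name, self.value, self.group = name, value, group

    def __getitem__(self, item):
        # With no __iter__, iteration falls back to this method and
        # walks the 3-tuple (name, value, group).
        return (self.name, self.value, self.group)[item]


def registry_via_iteration(eps):
    # Relies on every entry point unpacking as a (name, entry_point) pair.
    registry = {}
    registry.update(eps)
    return registry


def registry_via_names(eps):
    # Builds the mapping explicitly, so it never iterates over an entry point.
    registry = {}
    registry.update({ep.name: ep for ep in eps})
    return registry


if __name__ == "__main__":
    old = (OldStyleEntryPoint("demo", "pkg.mod:Demo", "example.codecs"),)
    new = (NewStyleEntryPoint("demo", "pkg.mod:Demo", "example.codecs"),)

    print(registry_via_iteration(old))  # works: each item yields two elements
    try:
        registry_via_iteration(new)     # each item now yields three elements
    except ValueError as exc:
        print("implicit iteration failed:", exc)
    print(registry_via_names(old), registry_via_names(new))
```

Building the dictionary from ep.name works for both stand-ins because it only touches the name attribute, which is the same idea as constructing the update argument manually.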
Version and installation information
Please provide the following:
numcodecs.__version__: 0.12.0
Also, if you think it might be relevant, please provide the output from pip list or conda list depending on which was used to install NumCodecs.