python / importlib_metadata

Library to access metadata for Python packages
https://importlib-metadata.readthedocs.io
Apache License 2.0
127 stars 81 forks source link

Better identification of broken Distribution objects #508

Open henryiii opened 2 weeks ago

henryiii commented 2 weeks ago

Currently, I'm getting an error on Python 3.8 and 3.9 in https://github.com/pypa/build/pull/820:

docker run --rm -it python:3.9 bash
pip install tox
git clone https://github.com/pypa/build
cd build
tox -e py39 -- -k test_metadata_path_no_prepare -v
...
File "/build/.tox/py39/lib/python3.9/site-packages/importlib_metadata/compat/py39.py", line 23, in normalized_name
    return Prepared.normalize(getattr(dist, "name", None) or dist.metadata['Name'])
  File "/build/.tox/py39/lib/python3.9/site-packages/importlib_metadata/__init__.py", line 889, in normalize
    return re.sub(r"[-_.]+", "-", name).lower().replace('-', '_')
  File "/usr/local/lib/python3.9/re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

The problem is dist.__dict__={'_path': PosixPath('/build/tests/packages/test-no-prepare/test_no_prepare.egg-info')}. I think this is tripping up on tests/packages/test-no-prepare/test_no_prepare.egg-info/ and the local backend. But I don't know what updated to cause this to start happening.

Regardless of the solution, though, I think the handling here for a missing name should be better, there wasn't any useful info in the error message to tell me about the dist that was failing. I had to add print(f"{dist.__dict__=}") to see it. Or maybe the normalize name could return None, and let the failure happen elsewhere.

jaraco commented 1 week ago

I believe this issue is fixed in importlib_metadata >= 8, where a missing Name key will raise a KeyError.

Add another line to the repro, I am able to reproduce the issue.

git checkout a73ecbdf16d8a8abb44cbbe95e9ab5f8f2a7c9b9

And with the issue reproduced, I still see the TypeError under importlib_metadata 8.5.0.

jaraco commented 1 week ago

Unfortunately, the repro doesn't provide a very good understanding of the factors that lead to the missed expectation, as pyproject-hooks encapsulates all of the behavior. I don't know a good way to debug that behavior.

I did find that importlib_metadata is in fact raising the appropriate KeyError in the test environment:

 build a73ecbd 🐚 .tox/py39/bin/python -q
>>> import importlib_metadata
>>> dist, = importlib_metadata.distributions(path=['tests/packages/test-no-prepare'])
>>> dist.name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jaraco/code/pypa/build/.tox/py39/lib/python3.9/site-packages/importlib_metadata/__init__.py", line 504, in name
    return self.metadata['Name']
  File "/Users/jaraco/code/pypa/build/.tox/py39/lib/python3.9/site-packages/importlib_metadata/_adapters.py", line 54, in __getitem__
    raise KeyError(item)
KeyError: 'Name'
>>> dist.metadata['Name']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jaraco/code/pypa/build/.tox/py39/lib/python3.9/site-packages/importlib_metadata/_adapters.py", line 54, in __getitem__
    raise KeyError(item)
KeyError: 'Name'
jaraco commented 1 week ago

Oh, right. I remember now - the issue stems from some other provider producing importlib.metadata.Distribution objects... and it's those objects that will return None for missing keys (violating the protocol).

In #486, we're working on addressing that concern (where older interfaces are inadequate for newer behaviors).

I'm still not sure that issue would address the concern reported here, which is that the implicated distribution (bad metadata) isn't apparent. Even after a KeyError is (more correctly) raised, it won't include which Dist was impacted. There is, however, there is an importlib_metadata.diagnose script that's designed to help diagnose a broken environment. It doesn't help much for a case like the one reported here, where the environment is broken inside an opaque build tool.

We could as you suggest wrap the normalize_name call in a special exception that includes the details about the affected dist. And while that might address this immediate need, it implies that basically any unhandled exception in any operation on a dist should somehow wrap it to reveal which dist is relevant. I don't think that's viable.

Or maybe the normalize name could return None, and let the failure happen elsewhere.

I'm disinclined to go that route. A lot of the interfaces are working from the assumption that the dists are valid. I'd like to continue to rely on that assumption and not create another space for "null" distributions (or names or similar).


Another option could be to emit a warning whenever a broken distribution is encountered. Maybe it's opt-in so the warnings are only emitted on demand (for performance reasons).