python / cpython

The Python programming language
https://www.python.org

Invalid metadata directory leads to errors #100666

Open jaraco opened 1 year ago

jaraco commented 1 year ago

FWIW, I noticed that I had two torch directories under /usr/local/lib/python3.10/dist-packages:

torch-1.13.1+cu116.dist-info  (the installed version)
torch-1.12.1+cu116.dist-info  (containing only an empty file REQUESTED)

importlib_metadata.version("torch") was returning None in this scenario. I think it was looking at the directory which only had the REQUESTED file.

After I removed that torch-1.12.1+cu116.dist-info directory, the problem was fixed: importlib_metadata.version("torch") now returns '1.13.1+cu116'.

Originally posted by @sswam in https://github.com/python/cpython/issues/91216#issuecomment-1355716323
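
A minimal sketch of how the situation described above (two metadata directories for one project in the same directory) might be detected. It uses only pathlib, and it assumes directory names follow the '<name>-<version>.dist-info' pattern with no hyphen in the name part. The function name find_duplicate_metadata is hypothetical and not part of importlib.metadata.

```python
import collections
import pathlib


def find_duplicate_metadata(site_dir):
    """Map each project name to its .dist-info directories when there is more than one."""
    found = collections.defaultdict(list)
    for path in sorted(pathlib.Path(site_dir).glob('*.dist-info')):
        # Assumption: the name part has no hyphen, so the first hyphen
        # separates the project name from the version.
        project = path.name[: -len('.dist-info')].partition('-')[0]
        found[project.lower()].append(path.name)
    # Keep only projects that have more than one metadata directory.
    return {name: dirs for name, dirs in found.items() if len(dirs) > 1}
```

Calling find_duplicate_metadata('/usr/local/lib/python3.10/dist-packages') on the environment described above would be expected to report both torch directories under the key 'torch'.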

jaraco commented 1 year ago

sswam, your issue is slightly different. #91216 is attempting to address whether PackageMetadata.__getitem__ should return None or raise an exception. Regardless of the choice made regarding this issue, the environment you had would not have emitted valid metadata. Your issue is similar to the one moved to #94181, except in your case, both metadata directories are in the same sys.path entry. That makes it very difficult, if not impossible, to reliably detect which metadata is intended (in this case for the "torch" package). I can imagine some ways that might improve the situation:

jaraco commented 7 months ago
  • Someone could implement an environment checker that could utilize importlib.metadata to analyze distributions and identify those that appear to be invalid.

The py -m importlib.metadata.diagnose (or py -m importlib_metadata.diagnose) command attempts to provide some rudimentary diagnostics about the environment, beginning to address this concern. I'm not sure what it would report for duplicate metadata like you've described.

Aha - so it will crash currently:

 draft @ pip install -t . torch --no-deps
Collecting torch
  Using cached torch-2.2.1-cp312-none-macosx_11_0_arm64.whl.metadata (25 kB)
Downloading torch-2.2.1-cp312-none-macosx_11_0_arm64.whl (59.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 59.7/59.7 MB 21.9 MB/s eta 0:00:00
Installing collected packages: torch
Successfully installed torch-2.2.1
 draft @ ls
bin                   functorch             torch                 torch-2.2.1.dist-info torchgen
 draft @ mkdir torch-2.2.0.dist-info
 draft @ pip-run importlib_metadata -- -m importlib_metadata.diagnose
Inspecting /Users/jaraco/draft
Found 2 packages: 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/pip-run-t038moo7/importlib_metadata/diagnose.py", line 21, in <module>
    run()
  File "/var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/pip-run-t038moo7/importlib_metadata/diagnose.py", line 17, in run
    inspect(path)
  File "/var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/pip-run-t038moo7/importlib_metadata/diagnose.py", line 12, in inspect
    print(', '.join(dist.name for dist in dists))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: sequence item 0: expected str instance, NoneType found

Probably that diagnostics routine could be improved to capture conditions like these.
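
As a rough illustration of what a more tolerant diagnostic could look like (this is not the actual diagnose module), the sketch below separates directories that yield a usable Name from those that do not, instead of joining names that may be None. The function name inspect_dir is hypothetical. It relies on PathDistribution.read_text returning None when METADATA is absent, and it parses the Name field directly rather than going through dist.name.

```python
import email
import pathlib
from importlib.metadata import PathDistribution


def inspect_dir(site_dir):
    """Return (valid_names, invalid_dirs) for the .dist-info entries in site_dir."""
    valid, invalid = [], []
    for path in sorted(pathlib.Path(site_dir).glob('*.dist-info')):
        dist = PathDistribution(path)
        text = dist.read_text('METADATA')  # None when the file is missing
        name = email.message_from_string(text)['Name'] if text else None
        if name is None:
            # No METADATA file, or one without a Name field.
            invalid.append(path.name)
        else:
            valid.append(name)
    return valid, invalid
```

For the draft directory in the session above, this would be expected to list torch as valid and torch-2.2.0.dist-info as invalid rather than raising a TypeError.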

jaraco commented 7 months ago
  • importlib.metadata could sort discovered distributions in reverse lexicographic order or reverse inferred version order such that newer versions take precedence. That wouldn't fix the issue if the invalid metadata is for a newer package, but it would reduce the likelihood that an upgrade could lead to this scenario.

I'm thinking that at least reverse lexicographic order (the opposite of the order in which entries would typically appear in a listdir operation) would be preferable.
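
To make the two orderings concrete, here is a small sketch that sorts metadata directory names both ways. The helper names are hypothetical, and version_key is a deliberately crude stand-in for "inferred version order": it looks only at the numeric parts before any '+' and ignores pre-release and other markers. It requires Python 3.9+ for str.removesuffix.

```python
import re


def by_reverse_lexicographic(dirnames):
    """Sort directory names so lexicographically later names come first."""
    return sorted(dirnames, reverse=True)


def version_key(dirname):
    """Crude numeric key, e.g. 'torch-1.13.1+cu116.dist-info' -> (1, 13, 1)."""
    stem = dirname.removesuffix('.dist-info')
    version = stem.partition('-')[2]
    release = version.split('+', 1)[0]  # drop any local version label
    return tuple(int(part) for part in re.findall(r'\d+', release))


def by_inferred_version(dirnames):
    """Sort directory names so the highest inferred version comes first."""
    return sorted(dirnames, key=version_key, reverse=True)
```

For the two torch directories from the original report, both orderings put 1.13.1 first. They differ for names such as pkg-1.9.dist-info and pkg-1.10.dist-info, where the reverse lexicographic order places 1.9 ahead of 1.10.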