Closed reynoldsnlp closed 1 year ago
Right, as the current implementation runs into the problem that having the PyPI distribution package name be the same as the top level Python import package is only a common convention, rather than a hard requirement, so while there a lot of cases that it will correctly warn about, there are also quite a few that it will miss (e.g. while you install Pillow
, you actually import from PIL
, since Pillow
was originally designed to serve as a drop-in replacement for an older project using the latter name).
When reworking this, there's also a smaller tweak that would be appropriate, which is to print the related warning messages to stderr
rather than stdout
(there's relatively few cases where that will make a practical difference, but stderr
is the more suitable target stream)
@ncoghlan What do you mean "read the RECORD entries for any just installed modules"?
Also, source_from_cache
isn't as straightforward to use as I was hoping. I ran into the following error, which I assume has to do with PEP 3147 or 488: https://docs.python.org/3/library/importlib.html#importlib.util.source_from_cache. Help?
>>> sys.modules['pip']
<module 'pip' from '/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/__init__.py'>
>>> sys.modules['pip'].__spec__.origin
'/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/__init__.py'
>>> source_from_cache(sys.modules['pip'].__spec__.origin)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<frozen importlib._bootstrap_external>", line 324, in source_from_cache
ValueError: __pycache__ not bottom-level directory in '/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/__init__.py'
The ValueError
is reporting that the path you submitted isn't a valid __pycache__
entry, so the function can't infer an original source location from it. To see what the function is expecting as input, run the path you have there through importlib.util.cache_from_source
:
>>> from importlib.util import cache_from_source, source_from_cache
>>> pip_path = sys.modules["pip"].__spec__.origin
>>> pip_path
'/usr/lib/python3.7/site-packages/pip/__init__.py'
>>> cache_from_source(pip_path)
'/usr/lib/python3.7/site-packages/pip/__pycache__/__init__.cpython-37.pyc'
>>> source_from_cache(cache_from_source(pip_path))
'/usr/lib/python3.7/site-packages/pip/__init__.py'
I think the actual error is in my initial notes though: I forgot that the source file loader sets origin
as the source location (even if the module itself gets initialised from the cached bytecode file), so you should be able to just use __spec__.origin
directly and not need to worry about converting it at all.
(There's a separate optional __spec__.cached
attribute to let loaders report the case where they've loaded from the cache directory instead of directly from the origin file)
For "reading the RECORD
file", I'm referring to the installation database in PEP 376, where pip
(and other compliant installers) make a full record of the files installed for each package: https://www.python.org/dev/peps/pep-0376/#record
And unlike sys.modules
, that installation database uses distribution package names (the ones that get passed to pip install
on the command line) rather than the import package names, so as a first attempt, the CLI wrapper can use the positional arguments to get the list of names that it needs to query. (Ideally it would also check all the dependencies that get upgraded, but that will be a bit tricky when implementing this as a CLI wrapper, so I think it would make sense to defer checking dependency reloading until after pip is providing this functionality natively)
You don't need to read the RECORD
file directly, though: pip
already includes a vendored copy of distlib
(as pip._vendor.distlib
), so for pip_inside
, you can just depend on distlib
and let it do the heavy lifting via https://distlib.readthedocs.io/en/latest/reference.html#InstalledDistribution.list_installed_files:
>>> requested_dist_packages = ["pip", "wheel"] # just an example
>>> import distlib, os.path
>>> dist_path = distlib.database.DistributionPath()
>>> dist_info = [dist_path.get_distribution(dist_name) for dist_name in requested_dist_packages]
>>> file_paths = set(os.path.join(os.path.dirname(project.path), path) for project in dist_info for path, __, __ in project.list_installed_files())
>>> import sys, pip
>>> sys.modules["pip"].__spec__.origin in file_paths
True
https://github.com/reynoldsnlp/pip_inside/tree/%233-origin
@ncoghlan dist_path
is behaving strangely for me. Works for some and doesn't work for others (It does work for both pip
and wheel
)
>>> from pip._vendor.distlib.database import DistributionPath
>>> dist_path = DistributionPath()
>>> for d in Path('/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages').glob('*'):
... print(dist_path.get_distribution(d.stem), d.stem, sep='\t')
...
None appdirs-1.4.3
None olefile
None decorator-4.3.0-py3.6
boto 2.48.0 boto
None mesa
mistune 0.8.3 mistune
None Pillow-4.3.0
Flask 0.12.2 flask
selenium 3.7.0 selenium
None python_dateutil-2.6.1
None pysptk-0.1.11
None simplegeneric
tinysegmenter 0.3 tinysegmenter
appnope 0.1.0 appnope
packaging 16.8 packaging
None simplegeneric-0.8.1-py3.6.egg-info
None testpath-0.3.1
None ply
None tornado-5.1.1
None thinc-6.10.2
thinc 6.10.2 thinc
...
One big item I noticed after some local experimentation is that distlib.DistributionPath
needs to have include_egg=True
passed in for it to correctly pick up projects that are still emitting the old pre-PEP-376 egg-info
format - by default it will only pick up the newer dist-info format defined in that PEP.
However, there are also a couple of challenges that apply specifically to your current attempted behaviour check:
*.py
import modules in addition to *.dist-info
and *.egg-info
metadata directories and files.
is a legal character in dist-info
directory names, so a simple stem
call won't reliably extract the distribution package name, you need to instead look for the first -
character and use everything before thatApplying those three changes gives:
>>> import pathlib
>>> import distlib.database
>>> dist_path = distlib.database.DistributionPath(include_egg=True)
>>> for distpkg in sorted(pathlib.Path("/usr/lib/python3.7/site-packages/").glob("*.*-info")):
... distname = distpkg.name.partition('-')[0]
... print(f"{distname:20}{dist_path.get_distribution(distname)}")
...
IPy IPy 0.81
PyGObject PyGObject 3.30.4
PySocks PySocks 1.6.8
SSSDConfig SSSDConfig 2.0.0
asn1crypto asn1crypto 0.24.0
blivet blivet 3.1.2
blivet_gui None
chardet chardet 3.0.4
cupshelpers cupshelpers 1.0
decorator decorator 4.3.0
distro distro 1.3.0
docutils docutils 0.14
fros fros 1.1
humanize humanize 0.5.1
idna idna 2.7
iniparse iniparse 0.4
initial_setup None
isc isc 2.0
langtable langtable 0.0.40
ntplib ntplib 0.3.3
olefile olefile 0.46
ordered_set None
pid pid 2.1.1
pip pip 18.1
pluggy pluggy 0.6.0
ply ply 3.9
productmd productmd 1.18
py py 1.5.4
pyOpenSSL pyOpenSSL 18.0.0
pycparser pycparser 2.14
pydbus pydbus 0.6.0
pyinotify pyinotify 0.9.6
pykickstart pykickstart 3.16
pyparsing pyparsing 2.2.0
pystray pystray 0.14.3
python_augeas None
python_meh None
python_xlib None
pytz pytz 2018.5
pyudev pyudev 0.21.0
requests requests 2.20.0
requests_file None
requests_ftp None
sepolicy sepolicy 1.1
setuptools setuptools 40.6.3
sh sh 1.12.14
simpleline simpleline 1.4
six six 1.11.0
slip slip 0.6.4
slip.dbus slip.dbus 0.6.4
sos sos 3.6
toml toml 0.10.0
tox tox 3.4.0
urllib3 urllib3 1.24.1
virtualenv virtualenv 16.0.0
That initial list still had a few None
responses in it, and it turned out those where due to the name on disk having _
where the original name has -
(to abide by the naming restriction imposed by the fact that metadata directory naming scheme uses -
as a field separator).
Reversing that transformation allows distlib
to find all the entries in that directory on my system:
>>> import pathlib
>>> import distlib.database
>>> dist_path = distlib.database.DistributionPath(include_egg=True)
>>> for distpkg in sorted(pathlib.Path("/usr/lib/python3.7/site-packages/").glob("*.*-info")):
... distname = distpkg.name.partition('-')[0].replace('_', '-')
... print(f"{distname:20}{dist_path.get_distribution(distname)}")
...
IPy IPy 0.81
PyGObject PyGObject 3.30.4
PySocks PySocks 1.6.8
SSSDConfig SSSDConfig 2.0.0
asn1crypto asn1crypto 0.24.0
blivet blivet 3.1.2
blivet-gui blivet-gui 2.1.10
chardet chardet 3.0.4
cupshelpers cupshelpers 1.0
decorator decorator 4.3.0
distro distro 1.3.0
docutils docutils 0.14
fros fros 1.1
humanize humanize 0.5.1
idna idna 2.7
iniparse iniparse 0.4
initial-setup initial-setup 0.3.62
isc isc 2.0
langtable langtable 0.0.40
ntplib ntplib 0.3.3
olefile olefile 0.46
ordered-set ordered-set 2.0.2
pid pid 2.1.1
pip pip 18.1
pluggy pluggy 0.6.0
ply ply 3.9
productmd productmd 1.18
py py 1.5.4
pyOpenSSL pyOpenSSL 18.0.0
pycparser pycparser 2.14
pydbus pydbus 0.6.0
pyinotify pyinotify 0.9.6
pykickstart pykickstart 3.16
pyparsing pyparsing 2.2.0
pystray pystray 0.14.3
python-augeas python-augeas 0.5.0
python-meh python-meh 0.47
python-xlib python-xlib 0.23
pytz pytz 2018.5
pyudev pyudev 0.21.0
requests requests 2.20.0
requests-file requests-file 1.4.3
requests-ftp requests-ftp 0.3.1
sepolicy sepolicy 1.1
setuptools setuptools 40.6.3
sh sh 1.12.14
simpleline simpleline 1.4
six six 1.11.0
slip slip 0.6.4
slip.dbus slip.dbus 0.6.4
sos sos 3.6
toml toml 0.10.0
tox tox 3.4.0
urllib3 urllib3 1.24.1
virtualenv virtualenv 16.0.0
@ncoghlan, using include_egg=True
breaks DistributionPath
for me. Could it be connected to the fact that I'm using pip
's bundled version of distlib
?
>>> from pip._vendor.distlib.database import DistributionPath
>>> dp = DistributionPath()
>>> dp.get_distribution('pip')
<InstalledDistribution 'pip' 9.0.3 at '/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip-9.0.3.dist-info'>
... however ...
>>> dp = DistributionPath(include_egg=True)
>>> dp.get_distribution('pip')
Traceback (most recent call last):
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/metadata.py", line 727, in __init__
self._data = json.loads(data)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/database.py", line 237, in get_distribution
self._generate_cache()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/database.py", line 164, in _generate_cache
for dist in self._yield_distributions():
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/database.py", line 154, in _yield_distributions
yield old_dist_class(r.path, self)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/database.py", line 875, in __init__
metadata = self._get_metadata(path)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/database.py", line 955, in _get_metadata
metadata = Metadata(path=path, scheme='legacy')
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/metadata.py", line 737, in __init__
scheme=scheme)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/metadata.py", line 281, in __init__
self.read_file(fileobj)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/metadata.py", line 381, in read_file
self.set_metadata_version()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/metadata.py", line 287, in set_metadata_version
self._fields['Metadata-Version'] = _best_version(self._fields)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/metadata.py", line 165, in _best_version
raise MetadataConflictError('Unknown metadata set')
pip._vendor.distlib.metadata.MetadataConflictError: Unknown metadata set
I just tried installing distlib
separately, and it is giving me a different error, regardless of whether I pass it the include_egg=True
argument.
>>> from distlib.database import DistributionPath
>>> dp = DistributionPath()
>>> dp.get_distribution('pip')
Traceback (most recent call last):
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/metadata.py", line 730, in __init__
self._data = json.loads(data)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/database.py", line 240, in get_distribution
self._generate_cache()
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/database.py", line 167, in _generate_cache
for dist in self._yield_distributions():
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/database.py", line 148, in _yield_distributions
metadata = Metadata(fileobj=stream, scheme='legacy')
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/metadata.py", line 740, in __init__
scheme=scheme)
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/metadata.py", line 283, in __init__
self.read_file(fileobj)
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/metadata.py", line 381, in read_file
self.set(field, value)
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/metadata.py", line 476, in set
if not scheme.is_valid_constraint_list(value):
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/version.py", line 713, in is_valid_constraint_list
return self.is_valid_matcher('dummy_name (%s)' % s)
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/version.py", line 703, in is_valid_matcher
self.matcher(s)
File "/Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/version.py", line 116, in __init__
'%r constraints' % op)
ValueError: '.*' not allowed for '>=' constraints
I don't what might cause that.
@vsajip, @reynoldsnlp is attempting to use distlib
to work out what modules are installed and read their RECORD
files so we can use that to print a "this module is already loaded, you may need to restart Python" warning, but he's having trouble getting it to give consistent answers on Mac OS X.
Are the above issues a sign that there's a package with bad metadata on the system, or something else?
a sign that there's a package with bad metadata on the system
Possibly - it seems to indicate a file that's invalid at the JSON level rather than the level of any schema. It may be that it's trying to interpret a non-JSON file as JSON. Perhaps if @reynoldsnlp could find what the file actually being read contains, we might be able to make progress.
It might also be worth trying the repo version of distlib
rather than the last released version (0.2.8). I assume that's the standalone version used above.
@reynoldsnlp It may be worth trying to catch the exception with a debugger to see exactly which file distlib
is looking at when it fails. For example, with pdb.pm:
try:
dp.get_distribution('pip')
except:
import pdb; pdb.pm()
I don't have much time to work on this this morning, but I thought I would at least post output of the debugger. It doesn't give any more information than the Tracebacks above.
using the bundled distlib
with include_egg
:
>>> from pip._vendor.distlib.database import DistributionPath
>>> dp = DistributionPath(include_egg=True)
>>> try:
... dp.get_distribution('pip')
... except:
... import pdb; pdb.pm()
...
> /opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip/_vendor/distlib/metadata.py(165)_best_version()
-> raise MetadataConflictError('Unknown metadata set')
>>>
...and using the standalone distlib
:
>>> from distlib.database import DistributionPath
>>> dp = DistributionPath()
>>> try:
... dp.get_distribution('pip')
... except:
... import pdb; pdb.pm()
...
> /Users/myuser/Library/Python/3.6/lib/python/site-packages/distlib/version.py(116)__init__()
-> '%r constraints' % op)
What you're really looking to find out is the state of the local variables in the _yield_distributions
frame at the point the exception gets thrown: https://bitbucket.org/pypa/distlib/src/811516c666ebb80dc5dda95455790d1b9877cfc7/distlib/database.py#lines-148
In particular, metadata_path
should point you directly at the metadata entry that's confusing distlib
.
As an alternative to finding that in a debugger, another option would be to just edit that file to guard against metadata parsing failures by logging an exception and continuing on to the next item rather than letting the exception escape unhandled:
try:
with contextlib.closing(pydist.as_stream()) as stream:
metadata = Metadata(fileobj=stream, scheme='legacy')
except Exception:
logger.exception("Failed to parse metadata at %r, ignoring distribution", metadata_path)
continue
from @ncoghlan here