python / importlib_metadata

Library to access metadata for Python packages
https://importlib-metadata.readthedocs.io
Apache License 2.0
122 stars 79 forks source link

Uncaught Exception when reading Egg information #455

Closed HonakerM closed 1 month ago

HonakerM commented 1 year ago

Hello,

I manage a project using poetry and since importlib-metadata 6.5.1 my install has been failing with the following error. I'm trying to install an updated version of ansible while a current one is already installed. I think the issue is that pathlib is having trouble finding a relative path between the ansible binary /usr/local/bin/ansible and site-packages.

ansible [core 2.13.9]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.8/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.8.16 (default, Feb 11 2023, 02:53:15) [GCC 10.2.1 20210110]
  jinja version = 3.1.2
  libyaml = True
---
00:40:59  #12 42.71   ValueError
00:40:59  #12 42.71 
00:40:59  #12 42.71   '/usr/local/bin/ansible' does not start with '/usr/local/lib/python3.8/site-packages'
00:40:59  #12 42.71 
00:40:59  #12 42.71   at /usr/lib64/python3.8/pathlib.py:908 in relative_to
00:40:59  #12 42.83        904│         n = len(to_abs_parts)
00:40:59  #12 42.83        905│         cf = self._flavour.casefold_parts
00:40:59  #12 42.83        906│         if (root or drv) if n == 0 else cf(abs_parts[:n]) != cf(to_abs_parts):
00:40:59  #12 42.83        907│             formatted = self._format_parsed_parts(to_drv, to_root, to_parts)
00:40:59  #12 42.83     →  908│             raise ValueError("{!r} does not start with {!r}"
00:40:59  #12 42.83        909│                              .format(str(self), str(formatted)))
00:40:59  #12 42.83        910│         return self._from_parsed_parts('', root if n == 1 else '',
00:40:59  #12 42.83        911│                                        abs_parts[n:])
00:40:59  #12 42.83        912│ 

This error used to be caught by a way to broad broad contextlib.suppress(Exception) suppression which @jaraco's recent PR removed. I'm not that familiar with pythons package/importing mechanisms but could we add back a specific suppress(ValueError) to _read_files_egginfo_installed? We could also completely avoid the error by manually loop through and checking the path before finding its relative. I'm happy to open a PR with either change

jaraco commented 1 year ago

could we add back a specific suppress(ValueError) to _read_files_egginfo_installed

Yes, maybe. I'd first want to concoct a test that captures the expectation that's being missed by not trapping it. That is, I want to have a better understanding of the circumstances that lead to the failure, confirm that those circumstances are legitimate and supported, and then replicate them in a test to ensure that the library continues to honor that circumstance.

Can you investigate what it is about the interactions between ansible and poetry that lead to the failure and create a (ideally minimal) reproducer that replicates that state without using poetry or ansible?

dan-blanchard commented 1 year ago

I just encountered a similar instance of this issue, and the root cause is that pathlib.Path.relative_to() is not smart enough to add leading .. to make a path relative to site-packages even when it should. (See https://bugs.python.org/issue23082#msg366433)

If you switch to using os.path.relpath, it will add the leading .. instead of crashing.

I subclassed Distribution locally to work around this and changed Distribution._read_files_egginfo_installed to:

    def _read_files_egginfo_installed(self):
        """
        Read installed-files.txt and return lines in a similar
        CSV-parsable format as RECORD: each file should be placed
        relative to the site-packages directory and must be
        quoted (since file names can contain literal commas).

        This file is written when the package is installed by pip,
        but it might not be written for other installation methods.
        Assume the file is accurate if it exists.
        """
        text = self.read_text("installed-files.txt")
        # Prepend the .egg-info/ subdir to the lines in this file.
        # But this subdir is only available from PathDistribution's
        # self._path.
        subdir = getattr(self, "_path", None)
        if not text or not subdir:
            return

        site_pkgs_path = self.locate_file("").resolve()
        for name in text.splitlines():
            path = (subdir / name).resolve()
            try:
                path = path.relative_to(site_pkgs_path)
            except ValueError:
                # relpath will add .. to a path to make it relative to site-packages
                path = Path(os.path.relpath(path, site_pkgs_path))
            yield f'"{path.as_posix()}"'

I may change it to just always use os.path.relpath instead of handling the exception. Happy to make a PR if this seems reasonable.

jaraco commented 7 months ago

Thanks dan-blanchard for the additional explanation. It does sound as if there may be a bug here. What I still don't understand is what are the conditions under which this failure occurs? Is there a way to replicate the conditions that trigger the failure? I'd like to avoid rewriting that function in a way that introduces extra branching and looping and mutated variables. If there is a deficiency in Path.relative_to, I'd like to see that deficiency documented (in a bug if a bug else in the docs) and a wrapper function replacing the .relative_to() call that encapsulates the workaround... something like:

diff --git a/importlib_metadata/__init__.py b/importlib_metadata/__init__.py
index 312d6966..0cfd1f47 100644
--- a/importlib_metadata/__init__.py
+++ b/importlib_metadata/__init__.py
@@ -524,14 +524,23 @@ class Distribution(DeprecatedNonAbstract):
             return

         paths = (
-            (subdir / name)
-            .resolve()
-            .relative_to(self.locate_file('').resolve())
-            .as_posix()
+            self._relative_to(
+                (subdir / name).resolve(),
+                self.locate_file('').resolve(),
+            ).as_posix()
             for name in text.splitlines()
         )
         return map('"{}"'.format, paths)

+    def _relative_to(self, path, root):
+        """
+        Workaround for ... where ".." isn't added.
+        """
+        try:
+            return path.relative_to(root)
+        except ValueError:
+            return pathlib.Path(os.path.relpath(path, root))
+
     def _read_files_egginfo_sources(self):
         """
         Read SOURCES.txt and return lines in a similar CSV-parsable

Such an approach would be minimally invasive to the current behavior and provide context for a future reader to understand why that method exists (and possibly when it can be removed).

But more important is to find an example case that fails and write a test from it that captures the missed expectation.

If you'd like to start on a PR with that test, that would be fantastic. Or if you're only able to devise a set of steps to recreate the failure, that would be useful for someone else to create such a test.

dan-blanchard commented 6 months ago

I just put up a PR that I think should fix (and test) this.

P403n1x87 commented 6 months ago

Based on a report that we had, and that is now linked in this issue, I don't think a missing .. is all there is to it. From the following traceback

Traceback (most recent call last):
  File "/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages/ddtrace/internal/packages.py", line 101, in _package_file_mapping
    if ilmd_d is not None and ilmd_d.files is not None:
  File "/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages/importlib_metadata/__init__.py", line 520, in files
    return skip_missing_files(
  File "/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages/importlib_metadata/_functools.py", line 102, in wrapper
    return func(param, *args, **kwargs)
  File "/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages/importlib_metadata/__init__.py", line 518, in skip_missing_files
    return list(filter(lambda path: path.locate().exists(), package_paths))
  File "/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages/importlib_metadata/__init__.py", line 555, in <genexpr>
    (subdir / name)
  File "/opt/homebrew/bin/Cellar/pyenv/2.3.33/versions/3.9.7/lib/python3.9/pathlib.py", line 939, in relative_to
    raise ValueError("{!r} is not in the subpath of {!r}"
ValueError: '/Users/econnolly/source/repos/rivt/personalization/.venv/bin/futurize' is not in the subpath of '/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages' OR one path is relative and the other is absolute.

we can see a path in the bin area of a venv, and the other in the lib area. My hunch is that the distribution is shipping entry points as part of its file contents, which would be located under the bin area. Trying to match that under the site-customize area will fail.

dan-blanchard commented 6 months ago

Based on a report that we had, and that is now linked in this issue, I don't think a missing .. is all there is to it. From the following traceback

I think we're both saying the same thing in different ways. Some packages include absolute paths in installed-files.txt that are outside of site-packages. When I was looking into the issue, this seems to happen pretty commonly with jupyterlab extension packages. pathlib.Path.relative_to() does not support trying to figure out how two absolute paths are relative to each other (unless one is a subpath of the other), whereas os.path.relpath does, so using that instead seems to solve the issue.

For one of Coiled's projects where we kept running into this issue, I already worked around it by subclassing Distribution as mentioned in https://github.com/python/importlib_metadata/issues/455#issuecomment-1611905214, and things generally worked as expected. #482 should fix it as well.