python / cpython

The Python programming language
https://www.python.org
Other
62.14k stars 29.86k forks source link

With #116680, `test_importlib` started leaking on refleaks #116731

Closed jaraco closed 5 months ago

jaraco commented 5 months ago

Ironically, since this was merged, test_importlib started leaking on refleaks buildbots.

Originally posted by @encukou in https://github.com/python/cpython/issues/116680#issuecomment-1994115142

Linked PRs

jaraco commented 5 months ago

Example of leak:

$ ./python -m test test_importlib -R 3:3 -m test_requires_egg_info_empty
Using random seed: 1080788092
0:00:00 load avg: 1.15 Run 1 test sequentially
0:00:00 load avg: 1.15 [1/1] test_importlib
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XXX XXX
test_importlib leaked [116, 116, 116] references, sum=348
test_importlib leaked [78, 78, 78] memory blocks, sum=234
test_importlib failed (reference leak)

== Tests result: FAILURE ==

1 test failed:
    test_importlib

Total duration: 2.0 sec
Total tests: run=1 (filtered)
Total test files: run=1/1 (filtered) failed=1
Result: FAILURE

Originally posted by @vstinner in https://github.com/python/cpython/issues/116680#issuecomment-1994152426

jaraco commented 5 months ago

That's interesting - the ref leak failure is in a importlib.metadata test, whereas the PR only made changes to the importlib.resources tests. My guess right now is that the importlib.resources module leak was masking the importlib.metadata ref leaks. If that's the case, it should be possible to replicate the failure prior to the PR merge.

jaraco commented 5 months ago

Indeed, the ref leak existed prior to the PR:

 cpython main @ git switch --detach a2548077614f81f25a2c3465dabb7a0a3885c40c~1
HEAD is now at bb66600558 CI: Process stale issues twice per day (#116636)
 cpython bb66600558 @ ./python.exe -m test test_importlib -R 3:3 -m test_requires_egg_info_empty
Using random seed: 1648965432
0:00:00 load avg: 2.87 Run 1 test sequentially
0:00:00 load avg: 2.87 [1/1] test_importlib
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XXX XXX
test_importlib leaked [116, 116, 116] references, sum=348
test_importlib leaked [78, 78, 78] memory blocks, sum=234
test_importlib failed (reference leak)

== Tests result: FAILURE ==

1 test failed:
    test_importlib

Total duration: 449 ms
Total tests: run=1 (filtered)
Total test files: run=1/1 (filtered) failed=1
Result: FAILURE
jaraco commented 5 months ago

Disabling the logic under test, the leaks go away, so it appears the leak is in the call to requires().

 cpython main @ git diff
diff --git a/Lib/test/test_importlib/test_metadata_api.py b/Lib/test/test_importlib/test_metadata_api.py
index 33c6e85ee9..8e388b89cd 100644
--- a/Lib/test/test_importlib/test_metadata_api.py
+++ b/Lib/test/test_importlib/test_metadata_api.py
@@ -219,7 +219,7 @@ def test_requires_egg_info_empty(self):
             },
             self.site_dir.joinpath('egginfo_pkg.egg-info'),
         )
-        deps = requires('egginfo-pkg')
+        deps = []
         assert deps == []

     def test_requires_dist_info(self):
 cpython main @ ./python.exe -m test test_importlib -R 3:3 -m test_requires_egg_info_empty
Using random seed: 3436765869
0:00:00 load avg: 1.12 Run 1 test sequentially
0:00:00 load avg: 1.12 [1/1] test_importlib
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XX. ...

== Tests result: SUCCESS ==

1 test OK.

Total duration: 439 ms
Total tests: run=1 (filtered)
Total test files: run=1/1 (filtered)
Result: SUCCESS
jaraco commented 5 months ago

The leaks don't emerge when calling Distribution.discover() but do emerge when assigning the result to a variable:

 cpython main @ git diff
diff --git a/Lib/test/test_importlib/test_metadata_api.py b/Lib/test/test_importlib/test_metadata_api.py
index 33c6e85ee9..643f17726e 100644
--- a/Lib/test/test_importlib/test_metadata_api.py
+++ b/Lib/test/test_importlib/test_metadata_api.py
@@ -219,8 +219,8 @@ def test_requires_egg_info_empty(self):
             },
             self.site_dir.joinpath('egginfo_pkg.egg-info'),
         )
-        deps = requires('egginfo-pkg')
-        assert deps == []
+        Distribution.discover(name='egginfo-pkg')
+        dist, = Distribution.discover(name='egginfo-pkg')

     def test_requires_dist_info(self):
         deps = requires('distinfo-pkg')

So it's the evaluating of the generator returned by Distribution.discover that's triggering the leaks.

jaraco commented 5 months ago

Digging deeper, I can trigger the leak with:

        from importlib.metadata import DistributionFinder
        context = DistributionFinder.Context(name='egginfo-pkg')
        path_resolver, = Distribution._discover_resolvers()
        tuple(path_resolver(context))
jaraco commented 5 months ago

Just to confirm, this bug exists at least as far back as Python 3.11, and likely has been around for even longer.

 cpython 3.11 @ ./python.exe -m test test_importlib -R 3:3 -m test_requires_egg_info_empty
Using random seed: 1497279497
0:00:00 load avg: 9.25 Run 1 test sequentially
0:00:00 load avg: 9.25 [1/1] test_importlib
beginning 6 repetitions
123456
......
test_importlib leaked [150, 150, 150] references, sum=450
test_importlib leaked [56, 56, 56] memory blocks, sum=168
test_importlib failed (reference leak)

== Tests result: FAILURE ==

1 test failed:
    test_importlib

Total duration: 609 ms
Total tests: run=1 (filtered)
Total test files: run=1/1 (filtered) failed=1
Result: FAILURE
jaraco commented 5 months ago

The issue goes away when removing the lru_cache on FastPath:

diff --git a/Lib/importlib/metadata/__init__.py b/Lib/importlib/metadata/__init__.py
index c612fbefee..7d3b8a7733 100644
--- a/Lib/importlib/metadata/__init__.py
+++ b/Lib/importlib/metadata/__init__.py
@@ -665,7 +665,6 @@ class FastPath:
     ['...']
     """

-    @functools.lru_cache()  # type: ignore
     def __new__(cls, root):
         return super().__new__(cls)
jaraco commented 5 months ago

It seems that because each iteration of the test creates a new temporary site dir and because those are cached in FastPath(), that's creating the reference leaks.

 cpython main @ git diff
diff --git a/Lib/importlib/metadata/__init__.py b/Lib/importlib/metadata/__init__.py
index c612fbefee..13c42e6572 100644
--- a/Lib/importlib/metadata/__init__.py
+++ b/Lib/importlib/metadata/__init__.py
@@ -665,8 +665,8 @@ class FastPath:
     ['...']
     """

-    @functools.lru_cache()  # type: ignore
     def __new__(cls, root):
+        print(cls, root)
         return super().__new__(cls)

     def __init__(self, root):
 cpython main @ ./python.exe -m test test_importlib -R 3:3 -m test_requires_egg_info_empty
Using random seed: 595199935
0:00:00 load avg: 1.56 Run 1 test sequentially
0:00:00 load avg: 1.56 [1/1] test_importlib
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmpouk96igu
X<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmp_mkqfkjq
X<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmpfybimqfz
.<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmps1cnufs4
 .<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmp2qgor2va
.<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmplqwy62rr
.

== Tests result: SUCCESS ==

1 test OK.

Total duration: 458 ms
Total tests: run=1 (filtered)
Total test files: run=1/1 (filtered)
Result: SUCCESS
jaraco commented 5 months ago

This patch also works around the ref leak, at least for tests in APITests:

 cpython main @ git diff
diff --git a/Lib/test/test_importlib/test_metadata_api.py b/Lib/test/test_importlib/test_metadata_api.py
index 33c6e85ee9..b30240e291 100644
--- a/Lib/test/test_importlib/test_metadata_api.py
+++ b/Lib/test/test_importlib/test_metadata_api.py
@@ -8,6 +8,7 @@
 from . import fixtures
 from importlib.metadata import (
     Distribution,
+    FastPath,
     PackageNotFoundError,
     distribution,
     entry_points,
@@ -37,6 +38,9 @@ class APITests(
 ):
     version_pattern = r'\d+\.\d+(\.\d)?'

+    def tearDown(self):
+        FastPath.__new__.cache_clear()
+
     def test_retrieves_version_of_self(self):
         pkg_version = version('egginfo-pkg')
         assert isinstance(pkg_version, str)
jaraco commented 5 months ago

These ref leaks are by design - this cache is there to improve performance.

I'm not quite sure how to handle this situation robustly. I know there's the importlib.invalidate_caches and it's even used and tested in the MetaPathFinder... but that MetadataPathFinder is only installed to the sys.meta_path on the importlib_metadata backport. In CPython, it's called explicitly, so no invalidate_caches method will be discovered.

It does look like the following patch also addresses the issue:

 cpython main @ git diff
diff --git a/Lib/importlib/_bootstrap_external.py b/Lib/importlib/_bootstrap_external.py
index 46ddceed07..b26be8583d 100644
--- a/Lib/importlib/_bootstrap_external.py
+++ b/Lib/importlib/_bootstrap_external.py
@@ -1470,6 +1470,9 @@ def invalidate_caches():
         # https://bugs.python.org/issue45703
         _NamespacePath._epoch += 1

+        from importlib.metadata import MetadataPathFinder
+        MetadataPathFinder.invalidate_caches()
+
     @staticmethod
     def _path_hooks(path):
         """Search sys.path_hooks for a finder for 'path'."""
diff --git a/Lib/importlib/metadata/__init__.py b/Lib/importlib/metadata/__init__.py
index c612fbefee..41c2a4a608 100644
--- a/Lib/importlib/metadata/__init__.py
+++ b/Lib/importlib/metadata/__init__.py
@@ -797,6 +797,7 @@ def _search_paths(cls, name, paths):
             path.search(prepared) for path in map(FastPath, paths)
         )

+    @classmethod
     def invalidate_caches(cls) -> None:
         FastPath.__new__.cache_clear()

diff --git a/Lib/test/test_importlib/test_metadata_api.py b/Lib/test/test_importlib/test_metadata_api.py
index 33c6e85ee9..e2936303ac 100644
--- a/Lib/test/test_importlib/test_metadata_api.py
+++ b/Lib/test/test_importlib/test_metadata_api.py
@@ -37,6 +37,9 @@ class APITests(
 ):
     version_pattern = r'\d+\.\d+(\.\d)?'

+    def tearDown(self):
+        importlib.invalidate_caches()
+
     def test_retrieves_version_of_self(self):
         pkg_version = version('egginfo-pkg')
         assert isinstance(pkg_version, str)

That patch feels like it's closer to the right thing to do. It first exposes the cache invalidation in the PathFinder, so the caches will be invalidated as a matter of course. That seems worthwhile no matter what.

I'm less confident about using the tearDown on the APITests, because I'm far from confident that's the right place to have this hook. It may be too broad but may also be too narrow. I guess I'll run the whole refleaks test and see what else is failing and make sure the cache invalidation happens for any implicated test.

jaraco commented 5 months ago

There's another ref leak in test_implicit_files, which can be suppressed by disabling the behavior:

diff --git a/Lib/test/test_importlib/resources/test_files.py b/Lib/test/test_importlib/resources/test_files.py
index 26c8b04e44..90834fcab2 100644
--- a/Lib/test/test_importlib/resources/test_files.py
+++ b/Lib/test/test_importlib/resources/test_files.py
@@ -99,7 +99,7 @@ def test_implicit_files(self):
                 '__init__.py': textwrap.dedent(
                     """
                     import importlib.resources as res
-                    val = res.files().joinpath('res.txt').read_text(encoding='utf-8')
+                    val = 'resources are the best'
                     """
                 ),
                 'res.txt': 'resources are the best',
jaraco commented 5 months ago

Distilling that issue, it looks like it boils down to calling inspect.stack() inside of an imported module:

diff --git a/Lib/test/test_importlib/resources/test_files.py b/Lib/test/test_importlib/resources/test_files.py
index 26c8b04e44..335f0b6718 100644
--- a/Lib/test/test_importlib/resources/test_files.py
+++ b/Lib/test/test_importlib/resources/test_files.py
@@ -95,18 +95,15 @@ def test_implicit_files(self):
         Without any parameter, files() will infer the location as the caller.
         """
         spec = {
-            'somepkg': {
-                '__init__.py': textwrap.dedent(
-                    """
-                    import importlib.resources as res
-                    val = res.files().joinpath('res.txt').read_text(encoding='utf-8')
-                    """
-                ),
-                'res.txt': 'resources are the best',
-            },
+            'somepkg.py': textwrap.dedent(
+                """
+                import inspect
+                inspect.stack()
+                """
+            ),
         }
         _path.build(spec, self.site_dir)
-        assert importlib.import_module('somepkg').val == 'resources are the best'
+        importlib.import_module('somepkg')

 if __name__ == '__main__':
jaraco commented 5 months ago

In c6d794f760, I devised a workaround, clearing some global state after running the test. This change and the other allows the refleaks test to pass for test_importlib. None of these refleaks are particularly meaningful. They're pointing out that the test suite consumes unique resources on each run.

encukou commented 5 months ago

Thank you for hunting this down!

There's a place in regrtest for clearing caches. I'd add it there, and be very narrow -- when calling all of invalidate_caches it'd be easy to miss a cache that could grow without bounds, but shouldn't. I filed https://github.com/python/cpython/issues/116804 for the inspect ones. IMO those should be bounded. But that's for later, today let's clear the caches.