pypy / pypy

PyPy is a very fast and compliant implementation of the Python language.
https://pypy.org
Other
828 stars 42 forks source link

pypy3: macos: threading is preimported early preventing gpython and gevent to work correctly #3269

Closed gitlab-importer closed 6 months ago

gitlab-importer commented 3 years ago

In Heptapod by @kirr on Jul 15, 2020, 13:54

Hello up there. In the process of fixing virtualenv support for gpython and gevent it was discovered that pypy3 (and only pypy3, not pypy2 nor cpython) on macos (and only on macos, not on linux/windows) imports threading module very early - before any user code is run.

Gevent-based applications need to be able to import threading, socket, ssl, etc... modules very early to be able to monkey-patch them to use lightweight greenlets instead of OS threads. This patching has to be done as early as possible in the lifetime of the process

http://www.gevent.org/intro.html#monkey-patching

However pypy3/darwin preimports threading during startup via the following chain of imports:

site -> sysconfig -> _sysconfigdata -> platform -> subprocess -> threading.

(details)

Could you please fix this?

Without the fix gpython fails to start with error like below:

(py38.virtualenv) kirr@deco:~/tmp/trashme$ gpython 
Traceback (most recent call last):
  File "/home/kirr/tmp/trashme/py38.virtualenv/bin/gpython", line 8, in <module>
    sys.exit(main())
  File "/home/kirr/tmp/trashme/py38.virtualenv/lib/python3.8/site-packages/gpython/__init__.py", line 151, in main
    raise RuntimeError('gpython: internal error: the following modules are pre-imported, but must be not:'
RuntimeError: gpython: internal error: the following modules are pre-imported, but must be not:

        ['threading']

sys.modules:

        ['__future__', '__main__', '_abc', '_bootlocale', '_codecs', '_collections', '_collections_abc', '_frozen_importlib', '_frozen_importlib_external', '_functools', '_heapq', '_imp', '_io', '_locale', '_operator', '_signal', '_sitebuiltins', '_sre', '_stat', '_thread', '_virtualenv', '_warnings', '_weakref', '_weakrefset', 'abc', 'builtins', 'codecs', 'collections', 'contextlib', 'copyreg', 'encodings', 'encodings.aliases', 'encodings.latin_1', 'encodings.utf_8', 'enum', 'functools', 'genericpath', 'gpython', 'heapq', 'importlib', 'importlib._bootstrap', 'importlib._bootstrap_external', 'importlib.abc', 'importlib.machinery', 'importlib.util', 'io', 'itertools', 'keyword', 'marshal', 'operator', 'os', 'os.path', 'posix', 'posixpath', 're', 'reprlib', 'site', 'sitecustomize', 'sre_compile', 'sre_constants', 'sre_parse', 'stat', 'sys', 'threading', 'time', 'types', 'warnings', 'zipimport', 'zope']

Please see https://github.com/pypa/virtualenv/issues/1895 and https://github.com/pypa/virtualenv/pull/1897 to get more context about this issue.

I see several potential ways for the fix:

  1. import subprocess in platform only as needed:
--- a/lib-python/3/platform.py
+++ b/lib-python/3/platform.py
@@ -113,7 +113,7 @@
 __version__ = '1.0.8'

 import collections
-import sys, os, re, subprocess
+import sys, os, re

 import warnings

@@ -804,6 +804,8 @@ def _syscmd_file(target, default=''):
         default in case the command should fail.

     """
+    import subprocess
+
     if sys.platform in ('dos', 'win32', 'win16'):
         # XXX Others too ?
         return default

This is already done this way on pypy2 https://foss.heptapod.net/pypy/pypy/-/blob/13582da4d60b6ce78dffd0916c11d2f3aa8cc880/lib-python/2.7/platform.py#L984-1009

However if I understand correctly, lib-python is copied as-is from upstream python, so it could be undesirable to modify it.

  1. patch _sysconfigdata.py to avoid importing platform:
--- a/lib_pypy/_sysconfigdata.py
+++ b/lib_pypy/_sysconfigdata.py
@@ -34,15 +34,18 @@
         build_time_vars["CXX"] = "g++ -pthread"

 if sys.platform[:6] == "darwin":
-    import platform
-    if platform.machine() == 'i386':
+    # don't import platform - it imports subprocess -> threading
+    # gevent-based projects need to be first to import threading and
+    # monkey-patch as early as possible in the lifetime of their process.
+    _, _, _, _, machine = os.uname()
+    if machine == 'i386':
         if platform.architecture()[0] == '32bit':
             arch = 'i386'
         else:
             arch = 'x86_64'
     else:
         # just a guess
-        arch = platform.machine()
+        arch = machine
     build_time_vars['LDSHARED'] += ' -undefined dynamic_lookup'
     build_time_vars['CC'] += ' -arch %s' % (arch,)
     if "CXX" in build_time_vars:

I had not tested the patches as I don't have mac and my mac VM somehow stopped working.

Thanks beforehand,
Kirill

gitlab-importer commented 3 years ago

In Heptapod by @mattip on Jul 15, 2020, 14:46

Unfortunately our macOS buildbot is offline. If someone with a mac could verify that solution 2 works, that seems like the more desireable one. I think it would be more performant as well: importing platform is quite slow.

gitlab-importer commented 3 years ago

In Heptapod by @jamadden on Jul 15, 2020, 15:05

To clarify, importing threading before monkey-patching is not a problem for gevent itself, gevent deals gracefully with that and even with native threading.Lock objects — importing logging, for example, imports threading and creates locks, and gevent is tested specifically to deal with that. Creating native threads might cause issues, though, depending. gpython is raising its error out of an abundance of caution, I suspect.

gitlab-importer commented 3 years ago

In Heptapod by @jamadden on Jul 15, 2020, 15:10

macOS hasn't supported 32-bit architectures since macOS Catalina (2019), and 32-bit support was deprecated in 2018. Every release of macOS from macOS Lion in 2011 has required 64-bit hardware and has run in 64-bit mode. Is it really necessary to support i386 at this point, or could pypy just assume that darwin means 64-bit?

gitlab-importer commented 3 years ago

In Heptapod by @kirr on Jul 15, 2020, 18:43

@mattip, @jamadden, thanks for feedback. Yes, I'm following official gevent advice to monkey-patch as early as possible. patch_thread(existing_locks=True) is documented to be best-effort only, and that "It is important to monkey-patch extremely early in the startup process", isn't it?

http://www.gevent.org/api/gevent.monkey.html#gevent.monkey.patch_thread

That's why to work 100% reliably gpython takes the cautious side and asserts on startup that modules that will be monkey-patched are not preimported at all.

Anyway, I've fixed my macos VM (it was Debian bug 964247) and here is updated patch for solution "2" that is verified to work (I did not build pypy from scratch - only modified .py files from pypy3.6-v7.3.1-osx64.tar.bz2 download):

--- lib_pypy/_sysconfigdata.py.kirr 2020-04-07 05:59:04.000000000 +0300
+++ lib_pypy/_sysconfigdata.py  2020-07-15 20:34:43.000000000 +0300
@@ -39,15 +39,21 @@
         build_time_vars["CXX"] = "g++ -pthread"

 if sys.platform[:6] == "darwin":
-    import platform
-    if platform.machine() == 'i386':
-        if platform.architecture()[0] == '32bit':
+    # don't import platform - it imports subprocess -> threading.
+    # gevent-based projects need to be first to import threading and
+    # monkey-patch as early as possible in the lifetime of their process.
+    # https://foss.heptapod.net/pypy/pypy/-/issues/3269
+    _, _, _, _, machine = os.uname()
+    import struct
+    wordsize = struct.calcsize('P') # void* : 4 on 32bit, 8 on 64bit
+    if machine == 'i386':
+        if wordsize == 4:
             arch = 'i386'
         else:
             arch = 'x86_64'
     else:
         # just a guess
-        arch = platform.machine()
+        arch = machine
     build_time_vars['LDSHARED'] += ' -undefined dynamic_lookup'
     build_time_vars['CC'] += ' -arch %s' % (arch,)
     if "CXX" in build_time_vars:

Could you please apply it? If yes, I would appreciate if a test is added that upon interpreter startup "threading" is not in sys.modules. To me it would be also a good idea to add such test to 2.7 branch.

Thanks beforehand,
Kirill

gitlab-importer commented 3 years ago

In Heptapod by @mattip on Jul 16, 2020, 10:46

Applied in b150be387687 together with a test. It will be in tomorrow's nightly build, but unfortunately our macOS build slave is still off-line. Thanks for tracking this down.

gitlab-importer commented 3 years ago

In Heptapod by @mattip on Jul 16, 2020, 10:46

closed

gitlab-importer commented 3 years ago

In Heptapod by @kirr on Jul 16, 2020, 11:17

Thanks @mattip. Could you please amend the test to verify that threading - not only platform is not preimported?

diff --git a/extra_tests/test_startup.py b/extra_tests/test_startup.py
index bc5295a9a7e..30279ecfbf6 100644
--- a/extra_tests/test_startup.py
+++ b/extra_tests/test_startup.py
@@ -1,8 +1,9 @@

-def test_platform_not_imported():
+def test_platform_and_threading_not_imported():
     import subprocess
     import sys
     out = subprocess.check_output([sys.executable, '-c',
          'import sys; print(list(sys.modules.keys()))'], universal_newlines=True)
     modules = [x.strip(' "\'') for x in out.strip().strip('[]').split(',')]
     assert 'platform' not in modules
+    assert 'threading' not in modules

It would be also good to have this test at default tip (2.7) branch, not to occassionally regress there too.

gitlab-importer commented 3 years ago

In Heptapod by @kirr on Jul 16, 2020, 11:31

BTW, I'm using https://github.com/kholia/OSX-KVM to run macOS on linux via qemu. This way a macOS buildbot could be setup on any linux host.

gitlab-importer commented 3 years ago

In Heptapod by @mattip on Jul 16, 2020, 13:05

Extended the test and backported to default.

I tried the OSX-KVM solution a while back but it was not stable on my machine: it kept hanging.

gitlab-importer commented 3 years ago

In Heptapod by @kirr on Jul 16, 2020, 13:20

@mattip, I see. Thanks for including threading and 2.7 into test coverage.