python / cpython

The Python programming language
https://www.python.org
Other
63.51k stars 30.42k forks source link

Use vfork() in subprocess on Linux #80004

Closed 9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb closed 4 years ago

9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 5 years ago
BPO 35823
Nosy @gpshead, @ronaldoussoren, @socketpair, @koobs, @izbyshev, @pablogsal, @Jongy
PRs
  • python/cpython#11671
  • python/cpython#11671
  • python/cpython#11671
  • python/cpython#11668
  • python/cpython#11668
  • python/cpython#22944
  • python/cpython#22945
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/izbyshev' closed_at = created_at = labels = ['extension-modules', 'type-feature', '3.10'] title = 'Use vfork() in subprocess on Linux' updated_at = user = 'https://github.com/izbyshev' ``` bugs.python.org fields: ```python activity = actor = 'socketpair' assignee = 'izbyshev' closed = True closed_date = closer = 'gregory.p.smith' components = ['Extension Modules'] creation = creator = 'izbyshev' dependencies = [] files = [] hgrepos = [] issue_num = 35823 keywords = ['patch', 'patch', 'patch'] message_count = 30.0 messages = ['334336', '334345', '334353', '334363', '334402', '334404', '334407', '334534', '334559', '372426', '372429', '372444', '378664', '378703', '379503', '379504', '379507', '379511', '379527', '379529', '379530', '379531', '379535', '379536', '379537', '379538', '379568', '416901', '416970', '416975'] nosy_count = 7.0 nosy_names = ['gregory.p.smith', 'ronaldoussoren', 'socketpair', 'koobs', 'izbyshev', 'pablogsal', 'Yonatan Goldschmidt'] pr_nums = ['11671', '11671', '11671', '11668', '11668', '22944', '22945'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue35823' versions = ['Python 3.10'] ```

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 5 years ago

    This issue is to propose a (complementary) alternative to the usage of posix_spawn() in subprocess (see bpo-35537).

    As mentioned by Victor Stinner in msg332236, posix_spawn() has the potential of being faster and safer than fork()/exec() approach. However, some of the currently available implementations of posix_spawn() have technical problems (this mostly summarizes discussions in bpo-35537):

    Regardless of implementation, posix_spawn() is also unsuitable for some subprocess use cases:

    May be even more importantly, fundamentally, posix_spawn() will always be less flexible than fork()/exec() approach. Any additions will have to go through POSIX standardization or be unportable. Even if approved, a change will take years to get to actual users because of the requirement to update the C library, which may be more than a decade behind in enterprise Linux distros. This is in contrast to having an addition implemented in CPython. For example, a setrlimit() action for posix_spawn() is currently rejected in POSIX[2], despite being trivial to add.

    I'm interested in avoiding posix_spawn() problems on Linux while still delivering comparable performance and safety. To that end I've studied implementations of posix_spawn() in glibc[3] and musl[4], which use vfork()/execve()-like approach, and investigated challenges of using vfork() safely on Linux (e.g. [5]) -- all of that for the purpose of using vfork()/exec() instead of fork()/exec() or posix_spawn() in subprocess where possible.

    The unique property of vfork() is that the child shares the address space (including heap and stack) as well as thread-local storage with the parent, which means that the child must be very careful not to surprise the parent by changing the shared resources under its feet. The parent is suspended until the child performs execve(), _exit() or dies in any other way.

    The most safe way to use vfork() is if one has access to the C library internals and can do the the following:

    1) Disable thread cancellation before vfork() to ensure that the parent thread is not suddenly cancelled by another thread with pthread_cancel() while being in the middle of child creation.

    2) Block all signals before vfork(). This ensures that no signal handlers are run in the child. But the signal mask is preserved by execve(), so the child must restore the original signal mask. To do that safely, it must reset dispositions of all non-ignored signals to the default, ensuring that no signal handlers are executed in the window between restoring the mask and execve().

    Note that libc-internal signals should be blocked too, in particular, to avoid "setxid problem"[5].

    3) Use a separate stack for the child via clone(CLONE_VM|CLONE_VFORK), which has exactly the same semantics as vfork(), but allows the caller to provide a separate stack. This way potential compiler bugs arising from the fact that vfork() returns twice to the same stack frame are avoided.

    4) Call only async-signal-safe functions in the child.

    In an application, only (1) and (4) can be done easily.

    One can't disable internal libc signals for (2) without using syscall(), which requires knowledge of the kernel ABI for the particular architecture.

    clone(CLONE_VM) can't be used at least before glibc 2.24 because it corrupts the glibc pid/tid cache in the parent process[6,7]. (As may be guessed, this problem was solved by glibc developers when they implemented posix_spawn() via clone()). Even now, the overall message seems to be that clone() is a low-level function not intended to be used by applications.

    Even with the above, I still think that in context of subprocess/CPython the sufficient vfork()-safety requirements are provided by the following.

    Despite being easy, (1) seems to be not necessary: CPython never uses pthread_cancel() internally, so Python code can't do that. A non-Python thread in an embedding app could try, but cancellation, in my knowledge, is not supported by CPython in any case (there is no way for an app to cleanup after the cancelled thread), so subprocess has no reason to care.

    For (2), we don't have to worry about the internal signal used for thread cancellation because of the above. The only other internal signal is used for setxid syncronization[5]. The "setxid problem" is mitigated in Python because the spawning thread holds GIL, so Python code can't call os.setuid() concurrently. Again, a non-Python thread could, but I argue that an application that spawns a child and calls setuid() in non-synchronized manner is not worth supporting: a child will have "random" privileges depending on who wins the race, so this is hardly a good security practice. Even if such apps are considered worthy to support, we may limit vfork()/exec() path only to the non-embedded use case.

    For (3), with production-quality compilers, using vfork() should be OK. Both GCC and Clang recognize it and handle in a special way (similar to setjmp(), which also has "returning twice" semantics). The supporting evidence is that Java has been using vfork() for ages, Go has migrated to vfork(), and, coincidentally, dotnet is doing it right now[8].

    (4) is already done in _posixsubprocess on Linux.

    I've implemented a simple proof-of-concept that uses vfork() in subprocess on Linux by default in all cases except if preexec_fn is not None. It passes all tests on OpenSUSE (Linux 4.15, glibc 2.27) and Ubuntu 14.04 (Linux 4.4, glibc 2.19), but triggers spurious GCC warnings, probably due to a long-standing GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21161

    I've also run a variant of subprocess_bench.py (by Victor Stinner from bpo-35537) with close_fds=False and restore_signals=False removed on OpenSUSE:

    $ env/bin/python -m perf compare_to fork.json vfork.json
    Mean +- std dev: [fork] 154 ms +- 18 ms -> [vfork] 1.23 ms +- 0.04 ms: 125.52x faster (-99%)

    Compared to posix_spawn, the results on the same machine are similar:

    $ env/bin/python -m perf compare_to posix_spawn.json vfork.json
    Mean +- std dev: [posix_spawn] 1.24 ms +- 0.04 ms -> [vfork] 1.22 ms +- 0.05 ms: 1.02x faster (-2%)

    Note that my implementation should work even for QEMU user-mode (and probably WSL) because it doesn't rely on address space sharing.

    Things to do:

    [1] https://svnweb.freebsd.org/base/head/lib/libc/gen/posix_spawn.c?view=markup&pathrev=326193 [2] http://austingroupbugs.net/view.php?id=603 [3] https://sourceware.org/git/?p=glibc.git;a=history;f=sysdeps/unix/sysv/linux/spawni.c;h=353bcf5b333457d191320e358d35775a2e9b319b;hb=HEAD [4] http://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c [5] https://ewontfix.com/7 [6] https://sourceware.org/bugzilla/show_bug.cgi?id=10311 [7] https://sourceware.org/bugzilla/show_bug.cgi?id=18862 [8] https://github.com/dotnet/corefx/pull/33289

    ronaldoussoren commented 5 years ago

    Issue bpo-34663 contains some earlier discussion about vfork, in the context of supporting a specific posix_spawn flag on Linux.

    W.r.t. closing all file descriptors > 2: posix_spawn_file_actions_addclose can do this when using posix_spawn. That would have a performance cost, you'd basically have to resort to closing all possible file descriptors and cannot use the smarter logic used in _posixsubprocess. However, the smarter closing code in _posixsubprocess is not safe w.r.t. vfork according to the comment above _close_open_fds_maybe_unsafe: that function uses some functions that aren't async-safe and one of those calls malloc.

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 5 years ago

    W.r.t. closing all file descriptors > 2: posix_spawn_file_actions_addclose can do this when using posix_spawn. That would have a performance cost, you'd basically have to resort to closing all possible file descriptors and cannot use the smarter logic used in _posixsubprocess.

    This is costly to the point of people reporting bugs: bpo-35757. If that really causes 0.1s delay like the reporter said, it would defeat the purpose of using posix_spawn() in the first place.

    However, the smarter closing code in _posixsubprocess is not safe w.r.t. vfork according to the comment above _close_open_fds_maybe_unsafe: that function uses some functions that aren't async-safe and one of those calls malloc.

    No, it's not so on Linux: https://github.com/python/cpython/blob/62c35a8a8ff5854ed470b1c16a7a14f3bb80368c/Modules/_posixsubprocess.c#L314

    Moreover, as I already argued in msg332235, using malloc() after vfork() should be *safer* than after fork() for a simple reason: all memory-based locks will still work as though the child is merely a thread in the parent process. This is true even for things like futex(FUTEX_WAKE_PRIVATE), despite that this operation is mistakenly documented as "process-private" in man pages. It's actually more like "private to tasks sharing the same address space".

    This is in contrast with fork(): if it's called while another thread is holding some mutex in malloc(), nobody would unlock it in the child unless the C library has precautions against that.

    ronaldoussoren commented 5 years ago

    BTW. I hadn't noticed _close_open_fds_safe, that should be safe when using vfork().

    Finally: I have no strong opinion on this patch, your explanation looks fine and I'm not up-to-speed w.r.t. implementation details of vfork and the like to feel comfortable about giving a proper review without spending a lot more time on it.

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 5 years ago

    I've checked subprocess.Popen() error reporting in QEMU user-mode and WSL and confirm that it works both with my patch (vfork/exec) and the traditional fork/exec, but doesn't work with glibc's posix_spawn.

    The first command below uses posix_spawn() internally because close_fds=False. Under QEMU posix_spawn() from glibc doesn't report errors because it relies on address-space sharing of clone(CLONE_VM|CLONE_VFORK), which is not emulated by QEMU. The second command uses vfork()/exec() in _posixsubprocess, but lack of address-space sharing doesn't matter because the error data is transferred via a pipe.

    $ qemu-x86_64 --version
    qemu-x86_64 version 2.11.1
    Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
    $ ldd --version
    ldd (GNU libc) 2.27
    Copyright (C) 2018 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Written by Roland McGrath and Ulrich Drepper.
    $ qemu-x86_64 ./python3 -c 'import subprocess; subprocess.call("/xxx", close_fds=False)'
    $ qemu-x86_64 ./python3 -c 'import subprocess; subprocess.call("/xxx")'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/izbyshev/install/python-3.8a/lib/python3.8/subprocess.py", line 324, in call
        with Popen(*popenargs, **kwargs) as p:
      File "/home/izbyshev/install/python-3.8a/lib/python3.8/subprocess.py", line 830, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/home/izbyshev/install/python-3.8a/lib/python3.8/subprocess.py", line 1648, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'xxx'

    For WSL, I've tested Ubuntu 18.04 (glibc 2.27) on Windows 10 1709 (Fall Creators Update) and 1803.

    In 1709, the result is the same as for QEMU (i.e. the first command silently succeeds). In 1803, both commands raise the exception because address-space sharing was fixed for clone(CLONE_VM|CLONE_VFORK) and vfork() in WSL: https://github.com/Microsoft/WSL/issues/1878

    I've also run all subprocess tests under QEMU user-mode (by making sys.executable to point to a wrapper that runs python under QEMU) for the following code bases:

    Out of curiosity I also did the same on WSL in 1803. Only 3 groups of tests fail, all because of WSL bugs:


    \====================================================================== FAIL: test_close_fds_when_max_fd_is_lowered (test.test_subprocess.POSIXProcessTestCase) Confirm that bpo-21618 is fixed (may fail under valgrind). ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/test/cpython/Lib/test/test_subprocess.py", line 2467, in test_close_fds_when_max_fd_is_lowered
        self.assertFalse(remaining_fds & opened_fds,
    AssertionError: {34, 35, 36, 37, 38, 39, 40, 41, 42} is not false : Some fds were left open.

    \====================================================================== FAIL: test_restore_signals (test.test_subprocess.POSIXProcessTestCase) ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/test/cpython/Lib/test/test_subprocess.py", line 1643, in test_restore_signals
        self.assertNotEqual(default_sig_ign_mask, restored_sig_ign_mask,
    AssertionError: b'SigIgn:\t0000000000000000' == b'SigIgn:\t0000000000000000' : restore_signals=True should've unblocked SIGPIPE and friends.

    None of the above is related to fork/vfork/posix_spawn -- the set of failing tests is the same in all four flavors.

    gpshead commented 5 years ago

    Thanks for your _extremely detailed_ analysii of the (often sad) state of posix_spawn() on platforms in the world today.

    My first reaction to this was "but then we'll be owning our own custom posix_spawn-like implementation as if we'll do better at it than every individual libc variant."

    After reading this through and looking at your PR... I now consider that a good thing. =) We clearly can do better. vfork() is pretty simple and allows us to keep our semantics; providing benefits to existing users at no cost.

    The plethora of libc bugs surrounding posix_spawn() seem likely to persist within various environments in the world for years to come. No sense in us waiting for that to settle.

    As for your PR... a configure check for vfork, a news entry, and whatever other documentation updates seem appropriate.

    With this in place we may want to make the _use_posix_spawn() logic in subprocess.py stricter? That could be its own followup PR.

    gpshead commented 5 years ago

    This is a scary issue. But I think a reasonable approach could be to never use vfork when running as whatever we choose to define a "privileged user" to be.

    getuid() or geteuid() return 0? don't use vfork.

    the concept of "privileged user" can obviously mean a lot more than that and likely goes beyond what we should attempt to ascertain ourselves.

    How about also providing a disable-only global setting so that someone writing code they consider to have elevated privileges can prevent its use entirely. subprocess.disable_use_of_vfork() and subprocess.is_vfork_enabled() calls perhaps (just setting/reading a static int vfork_disallowed = 0 flag within _posixsubprocess.c).

    If we did that, on systems where posixspawn() _might be implemented using vfork() we'd want to avoid using it based on is_vfork_enabled().

    True setuid vs vfork attack security would suggest code needs to opt-in to vfork() or posix_spawn() rather than opt-out. Which would destroy the benefit for most users (who won't bother) for the sake of an issue that just is not what most code ever does (setuid/setgid/etc. calls are very uncommon for most software).

    I think documenting "HEY, if you are running as with elevated privileges, here's a reason why you might want to disable vfork, and how to do it." should be enough. Hopefully not famous last words.

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 5 years ago

    Thank you for the review and your thoughts, Gregory.

    With this in place we may want to make the _use_posix_spawn() logic in subprocess.py stricter? That could be its own followup PR.

    Yes, I think we can always use vfork() on Linux unless we find some issues. (Victor Stinner doesn't seem to be against: https://github.com/python/cpython/pull/11668#issuecomment-457637207). Though C libraries are in a better position to provide safe vfork/exec, by using our implementation we will:

    This is a scary issue. But I think a reasonable approach could be to never use vfork when running as whatever we choose to define a "privileged user" to be.

    getuid() or geteuid() return 0? don't use vfork.

    I thought about something like this, but initially dismissed it because I (mistakenly) decided that an application may concurrently switch between arbitrary uids back and forth, so that checking getuid() becomes useless (if we're already talking about an exotic app that calls setuid() concurrently with spawning children, why stop there?). But now I realize that an app may use *arbitrary uids only if one of its real, effective or saved uids is zero (that is, if we don't take capabilities into account), so at least we could call getresuid() to get all 3 uids in a race-free way and check whether the app is *definitely privileged...

    the concept of "privileged user" can obviously mean a lot more than that and likely goes beyond what we should attempt to ascertain ourselves.

    ...but you're making a good point here. So I'm not sure that we want such checks, but if we do, we'd probably need to allow users to disable them -- what if some heavy privileged daemon wants a faster subprocess?

    How about also providing a disable-only global setting so that someone writing code they consider to have elevated privileges can prevent its use entirely. subprocess.disable_use_of_vfork() and subprocess.is_vfork_enabled() calls perhaps (just setting/reading a static int vfork_disallowed = 0 flag within _posixsubprocess.c).

    I think it's reasonable. I'll look into it when I'm done with current issues.

    True setuid vs vfork attack security would suggest code needs to opt-in to vfork() or posix_spawn() rather than opt-out. Which would destroy the benefit for most users (who won't bother) for the sake of an issue that just is not what most code ever does (setuid/setgid/etc. calls are very uncommon for most software).

    Agree, especially since we're not talking about "just" using setuid() but rather about using setuid() *in a multithreaded process in a racy manner while spawning children*. Honestly, I'm not sure such apps can blame Python if they get a security hole :)

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 5 years ago

    I've been struggling with fixing spurious -Wclobbered GCC warnings. Originally, I've got the following:

    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c: In function ‘subprocess_fork_exec’:
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:612:15: warning: variable ‘gc_module’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
         PyObject *gc_module = NULL;
                   ^~~~~~~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:616:15: warning: variable ‘preexec_fn_args_tuple’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
         PyObject *preexec_fn_args_tuple = NULL;
                   ^~~~~~~~~~~~~~~~~~~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:621:17: warning: variable ‘cwd’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
         const char *cwd;
                     ^~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:623:9: warning: variable ‘need_to_reenable_gc’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
         int need_to_reenable_gc = 0;
             ^~~~~~~~~~~~~~~~~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:624:38: warning: variable ‘argv’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
         char *const *exec_array, *const *argv = NULL, *const *envp = NULL;
                                          ^~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:624:59: warning: variable ‘envp’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
         char *const *exec_array, *const *argv = NULL, *const *envp = NULL;
                                                               ^~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:626:9: warning: variable ‘need_after_fork’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
         int need_after_fork = 0;
             ^~~~~~~~~~~~~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:627:9: warning: variable ‘saved_errno’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
         int saved_errno = 0;
             ^~~~~~~~~~~

    I've checked that all warnings are spurious: all flagged variables are either not modified in the child or modified and used only by the child. A simple way to suppress the warnings would be "volatile", but I don't want to spray "volatile" over the huge declaration block of subprocess_fork_exec().

    Another way is to move vfork() to a separate function and ensure that this function does as little as possible with its local variables. I've implemented two versions of this approach, both are ugly in some sense. I've pushed the first into the PR branch and the second into a separate branch https://github.com/izbyshev/cpython/tree/single-do-fork-exec.

    Yet another way would be to simply disable this diagnostic for _posixsubprocess (e.g. via #pragma GCC diagnostic), but I'm not sure we want to do that -- may be it'll be fixed in the future or a real defect will be introduced into our code.

    20032698-9ec0-4fe7-bf1f-f5432ad88e76 commented 4 years ago

    With bpo-35537 merged, do we have any intention to get on with this? From what I can tell, it provides roughly the same performance benefit - but works also with close_fds=True, which is the default of Popen, so IMO it's a worthy addition.

    I tested a rebase on current master, test_subprocess passes on my Linux box and there are no more -Wclobbered warnings after Alexey's latest change of the do_fork_exec() helper.

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 4 years ago

    I'd really like to get this merged eventually because vfork()-based solution is fundamentally more generic than posix_spawn(). Apart from having no issue with close_fds=True, it will also continue to allow subprocess to add any process context tweaks without waiting for availability of corresponding features in posix_spawn().

    If there is no objection from Gregory or other devs, I can pick up from where I left in two weeks.

    gpshead commented 4 years ago

    No objections, it would be great to see this finished up and land.

    I've encountered a minority of users who are using a wrapped vfork-based C/C++ library for process spawning as fork+exec isn't fast enough for them.

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 4 years ago

    Well, much later than promised, but I'm picking it up. Since in the meantime support for setting uid/gid/groups was merged, and I'm aware about potential issues with calling corresponding C library functions in a vfork()-child, I asked a question on musl mailing list: https://www.openwall.com/lists/musl/2020/10/12/1

    So, it seems we'll need to fallback to fork() if set*id() is needed, which is in line with our previous discussion about avoidance of vfork() in privileged processes anyway.

    I'm also discussing -Wclobbered warnings with a GCC developer. I wouldn't like to restructure code just to avoid GCC false positives, so currently I'm leaning towards disabling this warning entirely for subprocess_fork_exec() and documenting that arbitrary stores to local variables between vfork() and child_exec() are not allowed due to stack sharing, but we'll see if a better solution emerges.

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 4 years ago

    I've updated my PR.

    Potential improvements:

    Potential future directions:

    gpshead commented 4 years ago

    New changeset 976da903a746a5455998e9ca45fbc4d3ad3479d8 by Alexey Izbyshev in branch 'master': bpo-35823: subprocess: Use vfork() instead of fork() on Linux when safe (GH-11671) https://github.com/python/cpython/commit/976da903a746a5455998e9ca45fbc4d3ad3479d8

    gpshead commented 4 years ago

    now waiting to see how happy all of the buildbots are...

    We currently have a __linux__ check in the code deciding VFORK_USABLE.

    From what I can tell, vfork probably also works on macOS (darwin).

    Lets let this run for a bit on Linux and it can be a separate issue to open vfork usage up to other platforms.

    gpshead commented 4 years ago
    • To avoid repeating long parameter lists in several functions, we can move them to a struct. The downside is that we'd need to convert child_exec() to use that struct all over the place. I don't have a strong preference here.

    Agreed that the long parameter lists are annoying. I don't like shoving that much on the stack and by adding an additional call with so many params we've increase stack usage. I consider this refactoring work that can be done on its own outside of this issue though.

    • Are any documentation changes needed? We don't change any interfaces, so I'm not sure.

    I don't think so. I looked things over and I think all that is needed is the existing Misc/NEWS entry.

    Potential future directions:

    • If desired, a vfork() opt-out may be implemented. But it'd need to disable posix_spawn() on Linux as well, since it might be implemented via vfork(), and we have no good way to check that.

    We have such an opt-out capabilty for posix_spawn via subprocess._USE_POSIX_SPAWN https://github.com/python/cpython/blob/master/Lib/subprocess.py#L651 along with code in there that determines the default value based on the detected runtime environment.

    I'm not sure we'll need that for vfork() as it seems to pretty much be a low level raw system call wrapper rather than something to be implemented via libc that can have its own bugs. If we do wind up wanting one, I'd structure it the same way.

    gpshead commented 4 years ago

    Thank you for taking this on! I'm calling it fixed for now as the buildbots are looking happy with it. If issues with it arise we can address them.

    ronaldoussoren commented 4 years ago

    From what I can tell, vfork probably also works on macOS (darwin).

    Lets let this run for a bit on Linux and it can be a separate issue to open vfork usage up to other platforms.

    I'd prefer to not use vfork on macOS. For one I don't particularly trust that vfork would work reliably when using higher level APIs, but more importantly posix_spawn on macOS has some options that are hard to achieve otherwise and might be useful to expose in subprocess. An example of this is selecting the CPU architecture to use for the launched process when using fat binaries and a system that supports CPU emulation, such as the intel emulation on the upcoming Apple Silicon systems.

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 4 years ago

    Thank you for taking this on! I'm calling it fixed for now as the buildbots are looking happy with it. If issues with it arise we can address them.

    Thank you for reviewing and merging!

    Using POSIX_CALL for pthread_sigmask() is incorrect, however, since it *returns* the error instead of setting errno. I've opened a PR with a fix and additional handling of the first pthread_sigmask() call in the parent (which can be done easily).

    I've also noticed a memory leak why doing the above: cwd_obj2 is not released on the error path. It probably slipped with patches adding user/groups support. I'll file a separate issue.

    Would you also clarify why you disallowed vfork() in case setsid() is needed? Have you discovered any potential issues with setsid()? Note that setsid() doesn't suffer from thread sync issues like setuid()/setgid()/etc.

    gpshead commented 4 years ago

    New changeset 473db47747bb8bc986d88ad81799bcbd88153ac5 by Alexey Izbyshev in branch 'master': bpo-35823: subprocess: Fix handling of pthread_sigmask() errors (GH-22944) https://github.com/python/cpython/commit/473db47747bb8bc986d88ad81799bcbd88153ac5

    gpshead commented 4 years ago

    regarding excluding the setsid() case: I was being conservative as I couldn't find a reference of what was and wasn't allowed after vfork.

    I found one thing suggesting that on macOS setsid() was not safe after vfork(). But that appeared to be a Darwin-ism. I expect that is not true on Linux as it should just be a syscall updating a couple of fields in the process info. Confirming, in glibc is appears to be a shim for the setsid syscall (based on not finding any code implementing anything special for it) and in uclibc (much easier to read) it is clearly just a setsid syscall shim.

    I'll make a PR to undo the setsid restriction given we're Linux only.

    gpshead commented 4 years ago

    New changeset be3c3a0e468237430ad7d19a33c60d306199a7f2 by Gregory P. Smith in branch 'master': bpo-35823: Allow setsid() after vfork() on Linux. (GH-22945) https://github.com/python/cpython/commit/be3c3a0e468237430ad7d19a33c60d306199a7f2

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 4 years ago

    regarding excluding the setsid() case: I was being conservative as I couldn't find a reference of what was and wasn't allowed after vfork.

    Yes, there is no list of functions allowed after vfork(), except for the conservative POSIX.1 list consisting only of _exit() and execve(), so we can only take async-signal-safe functions as a first approximation and work from there. Thankfully, on Linux, C libraries don't do anything fancy in most cases. But, for example, it appears that on FreeBSD we wouldn't be able to use sigprocmask()/sigaction()[1]. BTW, commit[2] and the linked review are an interesting reading for anybody who would like to support posix_spawn() and/or vfork() in subprocess on FreeBSD.

    Confirming, in glibc is appears to be a shim for the setsid syscall (based on not finding any code implementing anything special for it) and in uclibc (much easier to read) it is clearly just a setsid syscall shim.

    I also recommend musl[3] when simplicity (and correctness) is required :)

    [1] https://svnweb.freebsd.org/base/head/lib/libc/gen/posix_spawn.c?view=markup&pathrev=362111#l126 [2] https://svnweb.freebsd.org/base?view=revision&revision=352712 [3] https://git.musl-libc.org/cgit/musl/tree/src/unistd/setsid.c?id=a5aff1972c9e3981566414b09a28e331ccd2be5d

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 4 years ago

    @ronaldoussoren

    I'd prefer to not use vfork on macOS. For one I don't particularly trust that vfork would work reliably when using higher level APIs, but more importantly posix_spawn on macOS has some options that are hard to achieve otherwise and might be useful to expose in subprocess.

    I can't comment on vfork() usability on macOS myself, but given the number of issues/considerations described here, I expect that significant research would be needed to check that.

    Regarding your second point about extra posix_spawn() options, I actually don't see why it would be an argument against vfork(). Even on Linux, subprocess prefers posix_spawn() to vfork()/fork() when possible, so vfork() support doesn't hinder posix_spawn().

    gpshead commented 4 years ago

    Nice links. LOL, yes, musl source was going to be my next stop if the larger libc sources proved impossible for a mere mortal to reason about. :)

    regarding macOS, agreed. If someone needs vfork() to work there, I believe it could be made to.

    Options like selecting the architecture of the child process could be higher level options to the subprocess API. Hiding the platform specific details of how that happens and deciding which underlying approach to use based on the flags.

    multi-arch systems are a thing. It is conceivable that we may even see non-esoteric multi-arch hardware at some point.

    gpshead commented 4 years ago

    Performance improvement measured on a 1.4Ghz quad aarch64 A57 (nvidia jetson nano):

    #define VFORK_USABLE 1
     test_subprocess: 36.5 seconds

    undef VFORK_USABLE

    test_subprocess: 45 seconds

    Nice. I really didn't expect a 20% speedup on its testsuite alone!

    Lets dive into that with a microbenchmark:

    $ ./build-novfork/python ten_seconds_of_truth.py
    Launching /bin/true for 10 seconds in a loop.
    Launched 2713 subprocesses in 10.00194378796732 seconds.
    271.247275281014 subprocesses/second.
    Increased our mapped pages by 778240KiB.
    Launching /bin/true for 10 seconds in a loop.
    Launched 212 subprocesses in 10.006392606999725 seconds.
    21.186456331095847 subprocesses/second.
    
    $ ./build/python ten_seconds_of_truth.py
    Launching /bin/true for 10 seconds in a loop.
    Launched 3310 subprocesses in 10.001623224001378 seconds.
    330.94628000551285 subprocesses/second.
    Increased our mapped pages by 778240KiB.
    Launching /bin/true for 10 seconds in a loop.
    Launched 3312 subprocesses in 10.001519071985967 seconds.
    331.1496959773679 subprocesses/second.

    Demonstrating perfectly the benefit of vfork(). The more mapped pages, the slower regular fork() becomes. With vfork() the size of the process's memory map does not matter.

    ten_seconds_of_truth.py:

    from time import monotonic as now
    import subprocess
    
    def benchmark_for_ten():
        print('Launching /bin/true for 10 seconds in a loop.')
    
        count = 0
        start = now()
        while now() - start < 10.:
            subprocess.run(['/bin/true'])
            count += 1
        end = now()
        duration = end-start
    
        print(f'Launched {count} subprocesses in {duration} seconds.')
        print(f'{count/duration} subprocesses/second.')
    
    benchmark_for_ten()
    
    map_a_bunch_of_pages = '4agkahglahaa^#\0ag3\3'*1024*1024*40
    
    print(f'Increased our mapped pages by {len(map_a_bunch_of_pages)//1024}KiB.')
    
    benchmark_for_ten()
    01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 2 years ago

    See bpo-47245.

    https://github.com/bminor/glibc/blob/master/sysdeps/unix/sysv/linux/spawni.c#L309

    In short - do not use vfork(). Use clone(CLONE_VM | CLONE_VFORK). And build separate stack.

    Current implementation is heavily broken.

    Another guy has failed: https://bugzilla.kernel.org/show_bug.cgi?id=215813.

    Please comment calling vfork() as soon as possible in order to fallback to fork() implementation. Until you (or me?) write good vfork children code.

    9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 2 years ago

    The preceding comment is wrong, see discussion in bpo-47245 and https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14 for explanation of why that bug report is irrelevant for CPython.

    01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 2 years ago

    Yes, you are almost right. Error-path is not so clear (it is discussed in another issue), but in general, yes, my previous comment is wrong. So, finally, there are no bugs around the stack at all.