python / cpython

The Python programming language
https://www.python.org
Other
63.16k stars 30.24k forks source link

Add support for making Linux prctl(...) calls to subprocess #86902

Open gpshead opened 3 years ago

gpshead commented 3 years ago
BPO 42736
Nosy @gpshead, @vstinner, @benjaminp, @izbyshev

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['extension-modules', 'type-feature', 'library', '3.10'] title = 'Add support for making Linux prctl(...) calls to subprocess' updated_at = user = 'https://github.com/gpshead' ``` bugs.python.org fields: ```python activity = actor = 'gregory.p.smith' assignee = 'none' closed = False closed_date = None closer = None components = ['Extension Modules', 'Library (Lib)'] creation = creator = 'gregory.p.smith' dependencies = [] files = [] hgrepos = [] issue_num = 42736 keywords = [] message_count = 6.0 messages = ['383723', '383757', '383866', '383869', '383872', '383884'] nosy_count = 4.0 nosy_names = ['gregory.p.smith', 'vstinner', 'benjamin.peterson', 'izbyshev'] pr_nums = [] priority = 'normal' resolution = None stage = 'needs patch' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue42736' versions = ['Python 3.10'] ```

gpshead commented 3 years ago

Another use of subprocess preexec_fn= that I've come across has been to call Linux's prctl in the child process before the exec.

_libc.prctl(_PR_SET_PDEATHSIG, value) for example.

Adding a linux_prctl_calls= parameter listing information about which prctl call(s) to make in the child before exec would satisfy this needed.

No need to go overboard here, this is a _very_ low level syscall. users need to know what they're doing and will likely abstract use away from actual users via a wrapper library. i.e: Lets not attempt to reinvent the https://pythonhosted.org/python-prctl/ interface.

Proposal:

linux_prctl_calls: Sequence[tuple]

Where every tuple indicates one prctl call. the tuple [0] contains a bool indicating if an error returned by that prctl call should be ignored or not. the tuple[1:6] contain the five int arguments to be passed to prctl. If the tuple is shorter than 2 elements, the remaining values are assumed 0.

At most, a namedtuple type could be created for this purpose to allow the user to use the https://man7.org/linux/man-pages/man2/prctl.2.html argument names rather than just blindly listing a tuple.

namedtuple('LinuxPrctlDescription',
            field_names='check_error, option, arg2, arg3, arg4, arg5',
            defaults=(0,0,0,0))

This existing helps https://bugs.python.org/issue38435 deprecate preexec_fn.

benjaminp commented 3 years ago

I wonder if a dedicated datatype should be created for all os-specific parameters like https://golang.org/pkg/syscall/#SysProcAttr. Popen already has way too many parameters. And prctl is a very general interface; probably 98% of prctls would never need to be called pre-exec.

(Separately, os.prctl should be created to expose prctl in all its multiplexed glory?)

(Also, but PDEATHSIG has an infamous footgun where the the signal is sent on exit of the forking thread, which is not necessarily the exit of the invoking process.)

gpshead commented 3 years ago

Thanks for the golang SysProcAttr reference. The internals of _posixsubprocess have already becoming unwieldy with the abundance of args. Such a struct/object would also fit in well there and avoid excessive C stack use. (as izbyshev noted during the vfork work)

Most prctl uses I noticed were PDEATHSIG but I'd need to explicitly audit those. Users don't seem to care about it's documented main thread caveat (which matches what I've seen; most programs don't use non-daemon threads and exit the main thread).

I want what we do for this to be futureproof for the syscall so that we don't wind up merely picking one feature such as PDEATHSIG to pass a flags through to and needing to add logic to support others later on, delaying the ability to use new system features.

vstinner commented 3 years ago

I agree that it would be nice to pack these parameters, similar to the Windows STARTUPINFO. The subprocess.Popen constructor has more and more parameter for functions executed between fork and exec:

    def __init__(self, args, bufsize=-1, executable=None,
                 stdin=None, stdout=None, stderr=None,
                 preexec_fn=None, close_fds=True,
                 shell=False, cwd=None, env=None, universal_newlines=None,
                 startupinfo=None, creationflags=0,
                 restore_signals=True, start_new_session=False,
                 pass_fds=(), *, user=None, group=None, extra_groups=None,
                 encoding=None, errors=None, text=None, umask=-1, pipesize=-1):

Parameters:

Idea of API: -------------

preexec = subprocess.Preexec()
preexec.setsid()
preexec.chdir(path)

popen = subprocess.Popen(cmd, preexec=preexec)
popen.wait()

It would make error reporting more helpful. For example, if the path type is not bytes or str, preexec.chdir(path) call would raise an error, rather than getting an error in the complex Popen constructor.

benjaminp commented 3 years ago

On Sun, Dec 27, 2020, at 14:53, Gregory P. Smith wrote:

Most prctl uses I noticed were PDEATHSIG but I'd need to explicitly audit those. Users don't seem to care about it's documented main thread caveat (which matches what I've seen; most programs don't use non-daemon threads and exit the main thread).

It works great until someone refactors their process-launching code to be asynchronous. Anyway, I don't mean to bog this discussion down in the advisability and utility of PDEATHSIG. Clearly, it needs to be supported to remove even less advisable functionality.

I want what we do for this to be futureproof for the syscall so that we don't wind up merely picking one feature such as PDEATHSIG to pass a flags through to and needing to add logic to support others later on, delaying the ability to use new system features.

The proposal right now feels like overgeneralization leading to an icky interface. It seems in spirit no different form providing a similar interface to syscall(3). At some point the interface will become so general it defeats the initial purpose of introduction, to disallow arbitrary code execution before execve. There will always be new syscalls, multiplexed into prctl/ioctl or not, that people want to make before execution. The universal workaround of a wrapper program can satisfy those on the vanguard.

gpshead commented 3 years ago

Felt and understood.

The plethora of things to do between (v)fork+exec really makes me wish for a "little" eBPF interpreter rather than needing so much specific plumbing. But that'd have the same problem as preexec_fn: in absence of something that declares an operation safe or not, it'd open the door to unsafe and leave us in no better state than preexec_fn.

gpshead commented 1 month ago

I'm encountering more uses of preexec_fn= for libc.prctl(PR_SET_PDEATHSIG, signal_number); I plan to go ahead and add something to the argument soup for this one very specific case given I've seen it in multiple places to at least allow this specific use case to stop bleeding. Coming up with a general prctl anything interface would be a rabbit hole as @benjaminp rightly points out. More complicated needs are better addressed by an interposing process wrapper that does the calls and re-execs to the real target. We already have process_group= and start_new_session=, this one goes along with those in terms of repeated utility.

hauntsaninja commented 1 month ago

It's definitely the thing we use preexec_fn for the most at work.