python / cpython

The Python programming language
https://www.python.org
Other
63.8k stars 30.55k forks source link

subprocess terminate() "invalid handle" error when process is gone #80248

Open giampaolo opened 5 years ago

giampaolo commented 5 years ago
BPO 36067
Nosy @pfmoore, @vstinner, @giampaolo, @tjguk, @zware, @eryksun, @zooba, @izbyshev

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.8', '3.9', '3.10', 'type-feature', 'library', 'OS-windows'] title = 'subprocess terminate() "invalid handle" error when process is gone' updated_at = user = 'https://github.com/giampaolo' ``` bugs.python.org fields: ```python activity = actor = 'eryksun' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)', 'Windows'] creation = creator = 'giampaolo.rodola' dependencies = [] files = [] hgrepos = [] issue_num = 36067 keywords = [] message_count = 9.0 messages = ['336231', '336235', '336236', '336241', '336243', '336262', '336272', '336339', '388070'] nosy_count = 8.0 nosy_names = ['paul.moore', 'vstinner', 'giampaolo.rodola', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'izbyshev'] pr_nums = [] priority = 'normal' resolution = None stage = None status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue36067' versions = ['Python 3.8', 'Python 3.9', 'Python 3.10'] ```

giampaolo commented 5 years ago

Happened in psutil: https://ci.appveyor.com/project/giampaolo/psutil/builds/22546914/job/rlp112gffyf2o30i

\====================================================================== ERROR: psutil.tests.test_process.TestProcess.test_halfway_terminated_process ----------------------------------------------------------------------

Traceback (most recent call last):
  File "c:\projects\psutil\psutil\tests\test_process.py", line 85, in tearDown
    reap_children()
  File "c:\projects\psutil\psutil\tests\__init__.py", line 493, in reap_children
    subp.terminate()
  File "C:\Python35-x64\lib\subprocess.py", line 1092, in terminate
    _winapi.TerminateProcess(self._handle, 1)
OSError: [WinError 6] The handle is invalid

During the test case, the process was already gone (no PID).

vstinner commented 5 years ago

I'm not sure of the purpose of this issue.

It's expected to get an error if you try to send a signal to process which is already terminated.

vstinner@apu$ python3
Python 3.7.2 (default, Jan 16 2019, 19:49:22) 
>>> import subprocess
>>> proc=subprocess.Popen("/bin/true")
>>> import os
>>> os.waitpid(proc.pid, 0)
(8171, 0)
>>> proc.kill()
ProcessLookupError: [Errno 3] No such process
>>> proc.terminate()
ProcessLookupError: [Errno 3] No such process

Ignoring these errors would be very risky: if another process gets the same pid, you would send a signal to the wrong process. Ooops.

If you only use the subprocess API, you don't have this issue:

vstinner@apu$ python3
Python 3.7.2 (default, Jan 16 2019, 19:49:22) 
>>> import subprocess
>>> proc=subprocess.Popen("/bin/true")
>>> proc.wait()
0
>>> proc.kill() # do nothing
>>> proc.terminate() # do nothing
giampaolo commented 5 years ago

I think this is somewhat similar to bpo-14252. The problem I see is that we should either raise ProcessLookupError or ignore the error (better). This concept of suppressing errors if process is gone is currently already established in 2 places: https://github.com/python/cpython/blob/bafa8487f77fa076de3a06755399daf81cb75598/Lib/subprocess.py#L1389 Basically what I propose is to extend the existing logic to also include ERROR_INVALID_HANDLE other than ERROR_ACCESS_DENIED. Not tested:

def terminate(self):
    """Terminates the process."""
    # Don't terminate a process that we know has already died.
    if self.returncode is not None:
        return
    try:
        _winapi.TerminateProcess(self._handle, 1)
    except WindowsError as err:
        if err.errno in (ERROR_ACCESS_DENIED, ERROR_INVALID_HANDLE):
            rc = _winapi.GetExitCodeProcess(self._handle)
            if rc == _winapi.STILL_ACTIVE:
                raise
            self.returncode = rc
        else:
            raise
vstinner commented 5 years ago

_winapi.TerminateProcess(self._handle, 1) OSError: [WinError 6] The handle is invalid

Silently ignoring that would be dangerous.

There is a risk that the handle is reused by another process and so that you terminate an unrelated process, no?

giampaolo commented 5 years ago

On POSIX there is that risk, yes. As for Windows, the termination is based on the process handle instead of the PID, but I am not sure if that makes a difference. The risk of reusing the PID/handle is not related to this issue though. The solution I propose just ignores ERROR_INVALID_HANDLE and makes it an alias for "process is already gone". It does not involve preventing the termination of other process handles/PIDs (FWIW in order to do that in psutil I use PID + process' creation time to identify a process uniquely).

eryksun commented 5 years ago

The subprocess Handle object is never finalized explicitly, so the Process handle should always be valid as long as we have a reference to the Popen instance. We can call TerminateProcess as many times as we want, and the handle will still be valid. If it's already terminated, NtTerminateProcess fails with STATUS_PROCESS_IS_TERMINATING, which maps to Windows ERROR_ACCESS_DENIED.

If some other code mistakenly called CloseHandle on our handle. This is a serious bug that should never be silenced and always raise an exception if we can detect it.

If the handle value hasn't been reused, NtTerminateProcess fails with STATUS_INVALID_HANDLE. If it now references a non-Process object, it fails with STATUS_OBJECT_TYPE_MISMATCH. Both of these map to Windows ERROR_INVALID_HANDLE. If the handle value was reused by a Process object (either via CreateProcess[AsUser] or OpenProcess) that lacks the PROCESS_TERMINATE (1) right (cannot be our original handle, since ours had all access), then it fails with STATUS_ACCESS_DENIED, which maps to Windows ERROR_ACCESS_DENIED. Otherwise if it has the PROCESS_TERMINATE right, then currently we'll end up terminating an unrelated process. As mentioned by Giampaolo, we could improve our chances of catching this bug by first verifying the PID via GetProcessId and the creation time from GetProcessTimes. We'd also have to store the creation time in _execute_child. Both functions would have to be added to _winapi.

The solution I propose just ignores ERROR_INVALID_HANDLE and makes it an alias for "process is already gone".

If we get ERROR_INVALID_HANDLE, we should not try to call GetExitCodeProcess. All we know is that it either wasn't a valid handle or was reused to reference a non-Process object. Maybe by the time we call GetExitCodeProcess it has since been reused again to reference a Process. That would silence the error and propagate a bug by setting an unrelated exit status. Otherwise, GetExitCodeProcess will just fail again with ERROR_INVALID_HANDLE. There's no point to this, and it's potentially making the problem worse.

---

Also, unrelated but something I noticed. Using _active list in Windows shouldn't be necessary. Unlike Unix, a process in Windows doesn't have to be waited on by its parent to avoid a zombie. Keeping the handle open will actually create a zombie until the next _cleanup() call, which may be never if Popen() isn't called again.

giampaolo commented 5 years ago

Interesting. Because both errors/conditions are mapped to ERROR_INVALID_HANDLE we need the creation time. I can work on a patch for that. Potentially I can also include OSX, Linux and BSD* implementations for methods involving os.kill(pid). That would be a broader task though. That also raises the question if there are other methods other than kill()/terminate()/send_signal() that we want to make "safe" from the reused PID scenario.

Also, unrelated but something I noticed. Using _active list in Windows shouldn't be necessary. Unlike Unix, a process in Windows doesn't have to be waited on by its parent to avoid a zombie. Keeping the handle open will actually create a zombie until the next _cleanup() call, which may be never if Popen() isn't called again.

Good catch. Looks like it deserves a ticket.

9b2dd46a-d3b9-42c4-96ed-ebdd39741ebb commented 5 years ago

Interesting. Because both errors/conditions are mapped to ERROR_INVALID_HANDLE we need the creation time. I can work on a patch for that.

I don't understand why any patch for CPython is needed at all. Using invalid handles is a serious programming bug (it's similar to using freed memory), so CPython doesn't really have to attempt to detect the programmer's error, at least not if this attempt significantly complicates the existing code.

In my opinion, the CI failure linked in the first comment simply indicates a bug in psutil and is unrelated to CPython at all.

eryksun commented 3 years ago

I don't understand why any patch for CPython is needed at all.

If and operation on self._handle fails as an invalid handle (due to an underlying STATUS_INVALID_HANDLE or STATUS_OBJECT_TYPE_MISMATCH), then call self._handle.Detach() and re-raise the exception.

It wouldn't hurt to also address the case in which coincidentally the handle value currently references another process. When the process is spawned, save the (process_id, creation_time) via GetProcessId() and GetProcessTimes(). Implement a function that validates these values before calling WaitForSingleObject(), GetExitCodeProcess(), or TerminateProcess(). If validation fails, call self._handle.Detach() and raise OSError.