[BUG] stderr duplication failed

rayzchen commented 3 years ago

setuptools version

setuptools==56.0.0

Python version

Python 3.9

OS

Windows

Additional environment information

No response

Description

I use gui_scripts with a blank function. If i use pip install -e . or py setup.py install nothing happens, as expected. If i do pip install . or use setup.py to make a binary dist then install it, i get stderr duplication failed in a prompt. This happens before any code gets run.

Expected behavior

Nothing

How to Reproduce

# setup.py
from setuptools import setup

setup(
    name="test",
    packages=["test"],
    entry_points={
        "gui_scripts": [
            "test-script=test:main"
        ]
    }
)

# test/__init__.py
def main():
    pass

Run above commands

Output

Code of Conduct

[x] I agree to follow the PSF Code of Conduct

jaraco commented 3 years ago

The reason you're seeing this error only in the selected environments (non-editable install) is because only editable installs use Setuptools' launcher scripts. The error you're getting is from the launcher scripts installed by pip and distlib. You'll want to report the issue to the distlib project and see that pip inherits the fix. For that reason, I'll move this issue to pip and I recommend you to file an issue with distlib.

DiddiLeija commented 3 years ago

@rayzchen Is this still happening? Could you provide some information of the pip version you are using?

rayzchen commented 3 years ago

It still happens with pip version 21.2.4

uranusjr commented 3 years ago

cc @vsajip since the launchers are from distlib.

vsajip commented 3 years ago

I can't reproduce this - it might be something specific in @rayzchen 's environment. I'm using Python 3.9.7 on Windows 10.

C:\Users\Vinay\Projects\scratch\distlib\pip_10444> type setup.py                                                                   
from setuptools import setup                                                                                                       

setup(                                                                                                                             
    name="test",                                                                                                                   
    packages=["test"],                                                                                                             
    entry_points={                                                                                                                 
        "gui_scripts": [                                                                                                           
            "test-script=test:main"                                                                                                
        ]                                                                                                                          
    }                                                                                                                              
)                                                                                                                                  
C:\Users\Vinay\Projects\scratch\distlib\pip_10444> type test\__init__.py                                                           
def main():                                                                                                                        
    pass                                                                                                                           
C:\Users\Vinay\Projects\scratch\distlib\pip_10444> python3 -m venv env                                                             

C:\Users\Vinay\Projects\scratch\distlib\pip_10444> env\Scripts\python -m pip install -U pip wheel                                  
Requirement already satisfied: pip in c:\users\vinay\projects\scratch\distlib\pip_10444\env\lib\site-packages (21.2.3)             
Collecting pip                                                                                                                     
  Using cached pip-21.2.4-py3-none-any.whl (1.6 MB)                                                                                
Collecting wheel                                                                                                                   
  Using cached wheel-0.37.0-py2.py3-none-any.whl (35 kB)                                                                           
Installing collected packages: wheel, pip                                                                                          
  Attempting uninstall: pip                                                                                                        
    Found existing installation: pip 21.2.3                                                                                        
    Uninstalling pip-21.2.3:                                                                                                       
      Successfully uninstalled pip-21.2.3                                                                                          
Successfully installed pip-21.2.4 wheel-0.37.0                                                                                     

C:\Users\Vinay\Projects\scratch\distlib\pip_10444> env\Scripts\pip install .                                                       
Processing c:\users\vinay\projects\scratch\distlib\pip_10444                                                                       
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory.
 We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.      
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issue
s/7555.                                                                                                                            
Building wheels for collected packages: test                                                                                       
  Building wheel for test (setup.py) ... done                                                                                      
  Created wheel for test: filename=test-0.0.0-py3-none-any.whl size=1357 sha256=59ca865995d18af94cae5d4bf40bc7643ffa08ff8fdcd29e10b
9a8c940967239                                                                                                                      
  Stored in directory: C:\Users\Vinay\AppData\Local\Temp\pip-ephem-wheel-cache-fir7qy3v\wheels\6f\a7\45\f9fb5d415c64f56b92837eabf4b
edd182318b9abca73407cd5                                                                                                            
Successfully built test                                                                                                            
Installing collected packages: test                                                                                                
Successfully installed test-0.0.0                                                                                                  

C:\Users\Vinay\Projects\scratch\distlib\pip_10444> dir env\Scripts\                                                                
 Volume in drive C has no label.                                                                                                   
 Volume Serial Number is 26B4-06F0                                                                                                 

 Directory of C:\Users\Vinay\Projects\scratch\distlib\pip_10444\env\Scripts                                                        

25/09/2021  06:49    <DIR>          .                                                                                              
25/09/2021  06:49    <DIR>          ..                                                                                             
25/09/2021  06:48             1,997 activate                                                                                       
25/09/2021  06:48               989 activate.bat                                                                                   
25/09/2021  06:48            19,408 Activate.ps1                                                                                   
25/09/2021  06:48               368 deactivate.bat                                                                                 
25/09/2021  06:49           106,383 pip.exe                                                                                        
25/09/2021  06:49           106,383 pip3.9.exe                                                                                     
25/09/2021  06:49           106,383 pip3.exe                                                                                       
25/09/2021  06:48           543,464 python.exe                                                                                     
25/09/2021  06:48           542,440 pythonw.exe                                                                                    
25/09/2021  06:49           100,222 test-script.exe                                                                                
25/09/2021  06:48           106,370 wheel.exe                                                                                      
              11 File(s)      1,634,407 bytes                                                                                      
               2 Dir(s)  11,182,489,600 bytes free                                                                                 

C:\Users\Vinay\Projects\scratch\distlib\pip_10444> env\Scripts\test-script.exe                                                     

C:\Users\Vinay\Projects\scratch\distlib\pip_10444>

And no error dialog box comes up. If I additionally install pywin32, change the test application to the following:

# setup.py
from setuptools import setup

setup(
    name="test",
    packages=["test"],
    entry_points={
        "gui_scripts": [
            "test-script=test:main",
            "hello=test:hello"
        ]
    }
)

# test\__init__.py
def main():
    pass

def hello():
    import win32api, win32con
    win32api.MessageBox (0, "Test application says 'Hello, world!'", 'Hello, world!', win32con.MB_OK)
    pass

and reinstall using env\Scripts\pip install . followed by running env\Scripts\hello, I get the expected message box: ss02

rayzchen commented 3 years ago

I tried the exact same thing that you did and I still got the error. What does the error actually mean? I'm using python 3.9.4, and I upgraded setuptools to 57.0.0.

rayzchen commented 3 years ago

sys.version is 3.9.4 (tags/v3.9.4:1f2e308, Apr 6 2021, 13:22:44) [MSC v.1928 32 bit (Intel)]

vsajip commented 3 years ago

What does the error actually mean?

It means that the Windows DuplicateHandle API failed (returned an error) for the stderr process stream, and the launcher bails out because that happened. The launcher calls code to duplicate handles for stdin, stdout and stderr in that order; I don't know why the duplication of stderr is failing for you where stdin and stdout apparently succeed just before that. According to the linked Windows documentation,

The handle must have the PROCESS_DUP_HANDLE access right.

I have no way of telling if that's the case in your environment, of course. What Windows version is it where the failure occurs? That might be relevant.

rayzchen commented 3 years ago

if this helps

vsajip commented 3 years ago

Can you please confirm that it's running on a 64-bit processor? Can you please try with a 64-bit Python? Your version string seems to indicate a 32-bit Python.

vsajip commented 3 years ago

@rayzchen I used the script in this Gist to see if I could duplicate handles on my system using ctypes. When I ran it, I got

1
1
1

Indicating that stdin, stdout and stderr were successfully duplicated. You might want to try it, but make sure you're OK with what it does first.

no-response[bot] commented 3 years ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

jeremyd2019 commented 3 years ago

This was reported at msys2/MSYS2-packages#2694. Your handles.py script returns 1 1 1 when run via python, but if I modify the script to use MessageBoxW instead of print, and run via pythonw instead, I got 0 0 0.

Outputting the handle values, input and output are 0, and only error is an actual handle.

def MessageBox(message):
    return windll.user32.MessageBoxW(0, str(message), "Your title", 1)

The last error from DuplicateHandle is 6 (ERROR_INVALID_HANDLE) for in and out, and 50 (ERROR_NOT_SUPPORTED) for error.

jeremyd2019 commented 3 years ago

When run from cmd.exe, all 3 handles are 0, and all 3 DuplicateHandle calls error with ERROR_INVALID_HANDLE)

vsajip commented 3 years ago

Not sure exactly what you mean by "run from cmd.exe" - I originally ran the handles.py from a cmd.exe window to get the 1 1 1 result. It's possible that a windowed application like pythonw doesn't have these handles because it doesn't do terminal I/O. It's not clear why only stderr would be affected.

jeremyd2019 commented 3 years ago

Sorry. The report to MSYS2 was that they got this error from a GUI launcher when launched from Git for Windows Bash but not when launched from cmd or powershell. I was testing using MSYS2 (Git for Windows uses a fork of MSYS2, and in turn MSYS2 is a fork of Cygwin).

rayzchen commented 3 years ago

So you're saying it only happens on MSYS and consequently on git bash?

jeremyd2019 commented 3 years ago

It probably would also happen on Cygwin, but I haven't tried that. @cbrnr the reporter from MSYS2-packages might be able to answer better as to the real-world implications. I'm just looking into the technical side to see if something needs to be fixed in MSYS2.

cbrnr commented 3 years ago

Thanks for reviving this issue!

It is correct that everything works as expected on cmd.exe and powershell.exe, but not on Git for Windows Bash. I haven't tested MSYS2 or Cygwin, but it is likely that it doesn't work either.

The real-world implications are quite severe in my opinion. Every Python package that provides a command line shortcut (located in the Scripts folder of the Python installation) which uses the gui_script entry point cannot be started when using Git for Windows Bash (for example my mnelab package is affected, but I'm sure there are many more popular ones). I'm assuming that at least a few people will be using that shell on Windows, because it is the default selection when installing Git for Windows.

jeremyd2019 commented 3 years ago

My testing has revealed that it works fine (all 3 handles are actually valid) when run in MSYS2 via MinTTY, but not via "Windows default console"

cbrnr commented 3 years ago

But that's not what I'm seeing. I have set it to the first option (MinTTY), which does not work. I haven't tried the second option though, but according to the messages in that dialog it should work with cmd.exe, which is what I'm seeing. Note that I'm using Windows Terminal with the shell set to "%PROGRAMFILES%\git\usr\bin\bash.exe" -i -l in the profile.

cbrnr commented 3 years ago

And according to the message, the command actually works when started with winpty, e.g. winpty mnelab for my mnelab package.

pradyunsg commented 3 years ago

Can someone provide step-by-step instructions to reproduce this error?

cbrnr commented 3 years ago

I've mentioned that in https://github.com/msys2/MSYS2-packages/issues/2694, but here's an ever simpler version:

Install Python (the example below requires 3.9 because some packages are not available for 3.10 yet) and make sure to add it to PATH
Install Git for Windows (with the default settings, specifically the MinTTY option in the dialog mentioned before)
Open Git Bash for Windows
pip install PySide2 mnelab
mnelab

The last step results in the error. If you run mnelab from cmd.exe or powershell.exe (or even winpty mnelab in Git Bash), it works.

uranusjr commented 3 years ago

From the above description, particularly winpty’s involvement, it seems like a problem in MinTTY’s Windows-POSIX stream API interop. A seamless solution is impossible if I understand correctly (which is why winpty exists; if a universal fix is possible, MSYS2 would have just done it directly); the best we can do is to have the launcher detect the situation and point users to winpty.

jeremyd2019 commented 3 years ago

But that's not what I'm seeing. I have set it to the first option (MinTTY), which does not work. I haven't tried the second option though, but according to the messages in that dialog it should work with cmd.exe, which is what I'm seeing. Note that I'm using Windows Terminal with the shell set to "%PROGRAMFILES%\git\usr\bin\bash.exe" -i -l in the profile.

The option in Git for Windows installer controls what the launcher does. Running it in Windows Terminal is effectively the same as the second option. If you instead launched Git for Windows from the start menu option it would run in MinTTY and I think it would work there.

uranusjr commented 3 years ago

Wait, @cbrnr and @jeremyd2019 you two are contradicting each other. We are obviously missing a big piece of the puzzle here; otherwise both of you can’t be right at the same time.

cbrnr commented 3 years ago

I just tried to reproduce this on another Windows machine. I forgot one step in my list (pip install PySide2).

The mnelab command works on cmd.exe and powershell.exe; it doesn't matter if I start these shells with their own executable or whether I use appropriate profiles in Windows Terminal
The mnelab command does not work in Bash when I use Windows Terminal (I get the stderr dialog)
The mnelab command does work in Git Bash directly (which opens a window with /usr/bin/bash --login -i in the title)
In Windows Terminal using the Git Bash profile, the mnelab command works with winpty mnelab

So it looks like this might be a Windows Terminal issue then. Which is strange because I have the command line option set to "%PROGRAMFILES%\git\usr\bin\bash.exe" -i -l - isn't that the same thing? Or do I need to start it differently?

jeremyd2019 commented 3 years ago

When run in Windows Terminal it is using the same mechanisms as the "Default Windows console", while run from mintty it uses different console mechanisms.

uranusjr commented 3 years ago

I’m going to summon @eryksun for expertise on this.

eryksun commented 3 years ago

If bash is executed via "%ProgramFiles%/Git/git-bash.exe", then it uses mintty with standard I/O that's based on named-pipe files (e.g. "//./pipe/msys-<...>-pty0-from-master" and "//./pipe/msys-<...>-pty0-to-master").

OTOH, if bash is run directly via "%ProgramFiles%/Git/bin/bash.exe" or "%ProgramFiles%/Git/usr/bin/bash.exe", then its standard I/O uses ConDrv files, including a "Connect" file for a console connection (used by API functions such as GetConsoleCP), generic "Input" and "Output" files, and the "CurrentIn" (i.e. "CONIN$") and "CurrentOut" (i.e. "CONOUT$) files that attach to the current console. ConDrv files provide the IPC channel to a system console host process (i.e. conhost.exe or openconsole.exe) by way of I/O read/write and IOCTL requests. For a classic console session, the connection looks like <bash>-<condrv>-<conhost>. For a pseudoconsole session, there's a back-end channel over pipes that talks to the real terminal, e.g. <bash>-<condrv>-<openconsole>-<namedpipe>-<windowsterminal>.

What fails for me is the case that uses console I/O. This also fails with the "pyw.exe" launcher, e.g. when running IDLE via pyw -m idlelib. Depending on your point of view, it's a bug in Windows, or Git bash, or the launcher. One can blame the launcher for not being resilient. Currently it ignores only ERROR_INVALID_HANDLE (6). But if one of the standard handles gets closed during process initialization (see details below), the handle value may get reused to reference an object type that doesn't allow handle duplication, such as an "EtwRegistration" object. The error in this case could be ERROR_NOT_SUPPORTED (50), or who knows what else. It isn't documented.

What's happening with bash is that it respawns a child process to run the shell that the user interacts with. This child instance uses the same handle value for its StandardOutput and StandardError, which in this case is a ConDrv "CurrentOut" file. (The parent bash must have opened "CONOUT$" to use as stdout/stderr in the child bash.) When spawning the launcher (e.g. pyw.exe), bash enables handle inheritance, so initially the launcher's standard handle values and console reference/connection handle value (i.e. ConsoleHandle in the process parameters) are copied from bash, including those StandardOutput and StandardError values that happen to be the same handle value.

At process startup, when the process is initialized by the Windows base API (i.e. when kernelbase.dll loads), if it's a GUI executable, its ConsoleHandle value gets cleared, and any standard handles for ConDrv files are closed and cleared. Handles for files on other devices and invalid handle values are left in place. Given the StandardOutput handle is closed and cleared first, the StandardError value will be invalid when checked. Windows leaves this invalid StandardError handle value in the process parameters. Since this occurs early during process initialization, if the handle value is low enough, it will likely get reused by something else long before the launcher tries to duplicate it.

Here are the details in a debugger session:

$ cdb.exe -xe "ld ntdll" "env39\Scripts\mnelab.exe"
[...]
ntdll!RtlUserThreadStart:
00007fff`dd5e2630 4883ec78        sub     rsp,78h
0:000> ?? @$peb->ProcessParameters->ConsoleHandle
void * 0x00000000`00000004
0:000> ?? @$peb->ProcessParameters->StandardInput
void * 0x00000000`000001bc
0:000> ?? @$peb->ProcessParameters->StandardOutput
void * 0x00000000`000001c0
0:000> ?? @$peb->ProcessParameters->StandardError
void * 0x00000000`000001c0

0:000> !handle 0x1c0 3
Handle 1c0
  Type          File
  Attributes    0
  GrantedAccess 0x12019f:
         ReadControl,Synch
         Read/List,Write/Add,Append/SubDir/CreatePipe,
         ReadEA,WriteEA,ReadAttr,WriteAttr
  HandleCount   6
  PointerCount  125154

0:000> bp kernelbase!ConsoleInitialize
Bp expression 'kernelbase!ConsoleInitialize' could not be resolved,
adding deferred bp
0:000> g
ModLoad: 00007fff`dbca0000 00007fff`dbd5e000
            C:\Windows\System32\KERNEL32.DLL
ModLoad: 00007fff`db150000 00007fff`db418000
            C:\Windows\System32\KERNELBASE.dll
Breakpoint 0 hit
KERNELBASE!ConsoleInitialize:
00007fff`db16183c 48895c2418      mov     qword ptr [rsp+18h],rbx
            ss:00000000`008fe800=00007fffdb314201
0:000> pt
KERNELBASE!ConsoleInitialize+0x129:
00007fff`db161965 c3              ret

0:000> ?? @$peb->ProcessParameters->ConsoleHandle
void * 0x00000000`00000000
0:000> ?? @$peb->ProcessParameters->StandardInput
void * 0x00000000`00000000
0:000> ?? @$peb->ProcessParameters->StandardOutput
void * 0x00000000`00000000
0:000> ?? @$peb->ProcessParameters->StandardError
void * 0x00000000`000001c0

0:000> !handle 0x1c0 3
Could not duplicate handle 1c0, error 6

Later the handle value gets reused for a different object type, so the launcher's DuplicateHandle() call for the StandardError handle fails with ERROR_NOT_SUPPORTED (50):

KERNELBASE!DuplicateHandle+0x5e:
00007fff`db1a01ae c3              ret
0:000> ?? @rax == 0
bool true
0:000> ?? @$teb->LastErrorValue == 50
bool true

In a local kernel debugging session, I see that it's now a handle for an "EtwRegistration" object.

lkd> !handle 0x1C0
[...]
01c0: Object: ffffd18f0e80a750  GrantedAccess: 00000804 (Inherit) Entry: ffffaa0d56efd660
Object: ffffd18f0e80a750  Type: (ffffd18f059f2900) EtwRegistration
    ObjectHeader: ffffd18f0e80a720 (new version)
        HandleCount: 1 PointerCount: 32766

uranusjr commented 3 years ago

What fails for me is the case that uses console I/O. This also fails with the "pyw.exe" launcher, e.g. when running IDLE via pyw -m idlelib. Depending on your point of view, it's a bug in Windows, or Git bash, or the launcher. One can blame the launcher for not being resilient. Currently it ignores only ERROR_INVALID_HANDLE (6). But if one of the standard handles gets closed during process initialization (see details below), the handle value may get reused to reference an object type that doesn't allow handle duplication, such as an "EtwRegistration" object. The error in this case could be ERROR_NOT_SUPPORTED (50), or who knows what else. It isn't documented.

From pip’s perspective, the best “fix” is perhaps for the GUI launcher to simply ignore most (all?) stream duplication errors instead of crashing. But pip is likely not the best party to make a decision here and I’ll leave this to @vsajip.

vsajip commented 3 years ago

@eryksun Thanks for the great analysis and explanation. I had an 'aha' moment when I read

Given the StandardOutput handle is closed and cleared first, the StandardError value will be invalid when checked.

I’ll leave this to @vsajip.

OK, what are opinions as to what the best "fix" would be? Try to duplicate the handles, but ignore any and all errors? Avoid duplicating handles altogether for the GUI launcher, since they wouldn't normally be used for console I/O in the GUI case? What would be the legitimate use cases for these handles in the child process spawned from the launcher in the GUI case?

uranusjr commented 3 years ago

The “safer” fix would be to duplicate and ignore the errors. But

What would be the legitimate use cases for these handles in the child process spawned from the launcher in the GUI case?

this is a good question. Personally I don’t see how a GUI app would want those handles, but pip developers are (famously?) bad at understanding users’ use cases 😛

vsajip commented 3 years ago

I've just pushed an update to distlib with updated launchers. Please can interested parties try them out and feedback whether or not they fix the reported problems?

eryksun commented 3 years ago

@vsajip, it looks like the scaled-down launcher is a bit behind the full version. It's missing the GetStartupInfoW() call to copy the launcher's STARTUPINFO to the child. The STARTUPINFO record has the intended window station and desktop (e.g. "WinSta0\Default"), initial window title (which determines console settings), default window position and size, the initial show-window state (minimized, hidden, etc), and the console's screen buffer dimensions and fill attributes. It also has a reserved field (corresponding to ShellInfo in the process parameters) that the shell uses to send data such as the window icon to use, and a second reserved field (corresponding to RuntimeData in the process parameters) that the C runtime uses to pass inherited file descriptors.

Personally I don’t see how a GUI app would want those handles

I suppose there can be legitimate uses of implicitly inherited or explicitly inherited standard handles even in GUI applications. But it's not like the process parameters lack for better and extensible ways to communicate inherited handle values, such as via command-line arguments or environment variables. OTOH, there's something to said for standard values that every process understands.

Given GetStartupInfoW() is called to get the startup flags of the launcher, duplicating standard input should be skipped if the flag STARTF_USEHOTKEY (0x200) is set, and standard output should be skipped if the undocumented flag 0x400 is set. STARTF_USEHOTKEY means the standard input handle is a hotkey value, which the system will send in a WM_SETHOTKEY message to the application's first top-level window. The undocumented flag 0x400 means that the standard output handle references the default monitor to use. This is a UI object handle, which is global in a session. For some reason the API documents the overloaded use of standard output, but not the associated flag (0x400). Anyway, these values don't refer to kernel handles, so trying to duplicate them might succeed or fail at random depending on the values in the process handle table at the time.

vsajip commented 3 years ago

Thanks for the info. I'll look at the differences between the launchers and reconcile differences as best I can.

vsajip commented 3 years ago

@eryksun Based on your input, I have the following proposed changes - do you have any comments? Thanks.

diff -r 75037a62c582 launcher.c
--- a/launcher.c    Wed Nov 17 16:39:48 2021 +0000
+++ b/launcher.c    Thu Nov 18 12:53:57 2021 +0000
@@ -443,6 +443,11 @@
     return TRUE;
 }

+/*
+ * See https://github.com/pypa/pip/issues/10444#issuecomment-971921420
+ */
+#define STARTF_UNDOC 0x400
+
 static void
 run_child(wchar_t * cmdline)
 {
@@ -453,7 +458,25 @@
     STARTUPINFOW si;
     PROCESS_INFORMATION pi;

+#if !defined(_CONSOLE)
+/*
+ * When explorer launches a Windows (GUI) application, it displays
+ * the "app starting" (the "pointer + hourglass") cursor for a number
+ * of seconds, or until the app does something UI-ish (eg, creating a
+ * window, or fetching a message).  As this launcher doesn't do this
+ * directly, that cursor remains even after the child process does these
+ * things.  We avoid that by doing a simple post+get message.
+ * See http://bugs.python.org/issue17290 and
+ * https://bitbucket.org/vinay.sajip/pylauncher/issue/20/busy-cursor-for-a-long-time-when-running
+ */
+    MSG msg;
+
+    PostMessage(0, 0, 0, 0);
+    GetMessage(&msg, 0, 0, 0);
+#endif
+
     job = CreateJobObject(NULL, NULL);
+    assert(job != NULL, "Job creation failed");
     ok = QueryInformationJobObject(job, JobObjectExtendedLimitInformation,
                                   &info, sizeof(info), &rc);
     assert(ok && (rc == sizeof(info)), "Job information querying failed");
@@ -463,15 +486,23 @@
                                  sizeof(info));
     assert(ok, "Job information setting failed");
     memset(&si, 0, sizeof(si));
-    si.cb = sizeof(si);
-    ok = safe_duplicate_handle(GetStdHandle(STD_INPUT_HANDLE), &si.hStdInput);
-    assert(ok, "stdin duplication failed");
-    ok = safe_duplicate_handle(GetStdHandle(STD_OUTPUT_HANDLE), &si.hStdOutput);
-    assert(ok, "stdout duplication failed");
+    GetStartupInfoW(&si);
+/*
+ * See https://github.com/pypa/pip/issues/10444#issuecomment-971921420
+ */
+    if ((si.dwFlags & STARTF_USEHOTKEY) == 0) {
+        ok = safe_duplicate_handle(GetStdHandle(STD_INPUT_HANDLE), &si.hStdInput);
+        assert(ok, "stdin duplication failed");
+    }
+    if ((si.dwFlags & STARTF_UNDOC) == 0) {
+        ok = safe_duplicate_handle(GetStdHandle(STD_OUTPUT_HANDLE), &si.hStdOutput);
+        assert(ok, "stdout duplication failed");
+    }
     ok = safe_duplicate_handle(GetStdHandle(STD_ERROR_HANDLE), &si.hStdError);
     assert(ok, "stderr duplication failed");
-    si.dwFlags = STARTF_USESTDHANDLES;
-    SetConsoleCtrlHandler((PHANDLER_ROUTINE) control_key_handler, TRUE);
+    si.dwFlags |= STARTF_USESTDHANDLES;
+    ok = SetConsoleCtrlHandler((PHANDLER_ROUTINE) control_key_handler, TRUE);
+    assert(ok, "control handler setting failed");
     ok = CreateProcessW(NULL, cmdline, NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi);
     if (!ok) {
         // Failed to create process. See if we can find out why.
@@ -484,7 +515,7 @@
     pid = pi.dwProcessId;
     AssignProcessToJobObject(job, pi.hProcess);
     CloseHandle(pi.hThread);
-    WaitForSingleObject(pi.hProcess, INFINITE);
+    WaitForSingleObjectEx(pi.hProcess, INFINITE, FALSE);
     ok = GetExitCodeProcess(pi.hProcess, &rc);
     assert(ok, "Failed to get exit code of process");
     ExitProcess(rc);

eryksun commented 3 years ago

@vsajip, actually it should skip duplicating all of the handles if either STARTF_USEHOTKEY or STARTF_UNDOC is set. STARTF_USESTDHANDLES shouldn't be set in these cases.

I have some added concerns about restricting handle inheritance and closing standard handles in the launcher. Most users probably want general handle inheritance, but it can also be a problem when both the launcher and the child have a handle to an object (e.g. a pipe). Before CreateProcessW() is called, the launcher should try to close its standard handles after duplicating them. If standard error is needed, ensure that the launcher's handle is non-inheritable via SetHandleInformation(handle, HANDLE_FLAG_INHERIT, 0).

eryksun commented 3 years ago

@vsajip, the following comments are unrelated to this issue, but related to the posted code.

I'd prefer to move the PostMessage() / GetMessage() calls, which clear the "app starting" state, down to the end after the process is created. Also, it should proxy the child's input idle state by calling WaitForInputIdle(pi.hProcess, INFINITE). After the wait returns, signal the launcher's input idle event. I'm not enough of a GUI expert to know the minimal operation for the latter, but my testing shows that a simple post and get from the message queue isn't enough. I think the thread has to actually create a window, but maybe someone else knows a simpler way. For example:

#if !defined(_CONSOLE)
    MSG msg;
    // End the launcher's "app starting" cursor state.
    PostMessageW(0, 0, 0, 0);
    GetMessageW(&msg, 0, 0, 0);
    // Proxy the child's input idle event.
    WaitForInputIdle(pi.hProcess, INFINITE);
    // Signal the process input idle event by creating a window and pumping
    // sent messages. The window class isn't important, so just use the
    // system "STATIC" class.
    HWND hwnd = CreateWindowExW(0, L"STATIC", L"PyLauncher", 0, 0, 0, 0, 0, 
                    HWND_MESSAGE, NULL, NULL, NULL);
    // Process all sent messages and signal input idle.
    PeekMessageW(&msg, hwnd, 0, 0, 0);
    DestroyWindow(hwnd);
#endif

What's going on with the global pid value?

static BOOL
control_key_handler(DWORD type)
{
    if ((type == CTRL_C_EVENT) && pid)
        GenerateConsoleCtrlEvent(pid, 0);
    return TRUE;
}

The call signature is GenerateConsoleCtrlEvent(dwCtrlEvent, dwProcessGroupId), so the launcher has the wrong argument order. Besides, pid is not a process group ID (i.e. it's not created with CREATE_NEW_PROCESS_GROUP). Using a PID that's not also a PGID leads to buggy behavior in the console. At its most benign, if the process is attached to the console, it acts like sending the event to group 0 (i.e. all processes in the console session). The console's internal state will be corrupted however, if the event is sent after the child process has terminated or detached, or if it was never attached to the console in the first place. With the console state corrupted, no control event is sent to any pre-existing process in the console session, which includes the user pressing Ctrl+C, Ctrl+Break, or even closing the window (i.e. CTRL_CLOSE_EVENT).

As is, this call with pid as the event to send is failing as an invalid parameter. Thankfully that's the case, else with the buggy behavior discussed above, it would endlessly create new control events (each with a new thread) in the launcher.

If a parent process wants to send its child a CTRL_C_EVENT or CTRL_BREAK_EVENT, it has to either spawn the child as a new process group or send the event to process group 0 (i.e. all processes in the console session) and ignore the event for itself. Unfortunately CTRL_C_EVENT is initially disabled in a new process group. The group leader can manually enable Ctrl+C via SetConsoleCtrlHandler(NULL, FALSE), which will be inherited by child processes. Or individual processes in the group can enable it. (Maybe the parent could try to remotely enable Ctrl+C in the process. Start the group leader suspended. Inject a thread that loads a DLL that enables Ctrl+C and then unloads itself. Finally, resume pi.hThread.)

I'd recommend that the launcher enable Ctrl+C without reservation, but there's no way to know what the parent process wants. That said, explicitly disabling Ctrl+C in a child isn't common, nor, from what I've seen, is it a generally desired side effect of CREATE_NEW_PROCESS_GROUP. It may be less annoying in the grand scheme of things to always enable Ctrl+C in the launcher, which will be inherited by the child.

The other console control events (i.e. CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT, and CTRL_SHUTDOWN_EVENT) cannot be ignored. Returning TRUE (i.e. handled) makes the Windows session server (csrss.exe) immediately terminate the process, and returning FALSE (i.e. not handled) chains to the default handler that calls ExitProcess(). For the launcher this means the child process will also be terminated due to the kill-on-close job object. In case the child handles the event and has cleanup work to do, the launcher should wait a reasonable duration before returning TRUE. The session server waits on the control thread with one of two timeout settings, "HungAppTimeout" for the window close event and "WaitToKillAppTimeout" for the logoff and shutdown events. Both default to 5000 milliseconds. Query the current "HungAppTimeout" value in milliseconds via SystemParametersInfoW(SPI_GETHUNGAPPTIMEOUT, 0, &timeout, 0). For "WaitToKillAppTimeout", use SPI_GETWAITTOKILLTIMEOUT. Since this call requires loading "user32.dll", which converts the process to a GUI process, the launcher will actually never be sent the logoff and shutdown events. (A GUI process is expected to watch for the WM_QUERYENDSESSION and WM_ENDSESSION window messages.) Not getting a console control event is not a problem for the launcher, however.

jeremyd2019 commented 3 years ago

I noticed the pid-to-GenerateConsoleCtrlEvent issue, that existed in the setuptools launcher too. I don't really see why it is trying to deal with that anyway. If the launcher and python are console, they'd be on the same console, so they'd both get the CtrlEvent. If they're GUI, they aren't on a console so wouldn't get a CtrlEvent (they'd be expected to get WM_ messages instead). It kind of feels like it was done by somebody thinking about UNIX signals and trying to forward SIGINT to the child.

vsajip commented 3 years ago

It kind of feels like it was done by somebody thinking about UNIX signals and trying to forward SIGINT to the child.

Sort of - not specifically with regard to UNIX signals - the intention would have been to make sure the child responsed to Ctrl-C - but probably muffed due to misunderstanding of how the APIs are supposed to work. The fact that it was left in for GUI apps is probably just an oversight (it's there from the original commit, and my memory is a bit hazy).

actually it should skip duplicating all of the handles if either STARTF_USEHOTKEY or STARTF_UNDOC is set. STARTF_USESTDHANDLES shouldn't be set in these cases.

I thought that might be the case, but wasn't sure.

Regarding the Ctrl-C handling, is the following (my understanding of how things should be, keeping things as simple as possible) correct?

There's no need to call SetConsoleCtrlHandler for the GUI case.
In the console case, the launcher can either:
1. Call SetConsoleCtrlHandler(NULL, FALSE) to ensure Ctrl-C isn't ignored, to be inherited by the child, and don't create the child with a new process group, because that would disable Ctrl-C being sent to the child. In this case hitting Ctrl-C would send an event to both launcher and child, causing both to terminate, possibly without cleanup, though the child could implement its own Ctrl-C handling if desired to do cleanup.
2. Call SetConsoleCtrolHandler(control_key_handler, TRUE) which would install a Ctrl-C handler for the launcher, and create the child with CREATE_NEW_PROCESS_GROUP, which would make the child pid a process group pid sharing the launcher's console. But the child would start with Ctrl-C handling disabled. In control_key_handler, the launcher would have to call GenerateConsoleCtrlEvent(). But the documentation for this says both
  
  Sends a specified signal to a console process group that shares the console associated with the calling process.
  
  and
  
  If dwProcessGroupId is nonzero, this function will succeed, but the CTRL+C signal will not be received by processes within the specified process group.
  
  (emphasis on not)
  
  If this parameter [dwProcessGroupId] is zero, the signal is generated in all processes that share the console of the calling process.
  
  So what is the point of creating a new process group then? This reads as if you don't need to create a new process group, but just call (GenerateConsoleCtrlEvent(CTRL_C_EVENT , 0) to send Ctrl-C to all processes sharing the console, which includes launcher and child. But this will have been called from the Ctrl-C handler in the launcher - does that mean another Ctrl-C is sent to the launcher? That doesn't seem right. In addition, the handler then has to wait some indeterminate time before returning TRUE to the caller (indeterminate as the launcher doesn't know how long the child needs for cleanup).
Doesn't the first of these options make more sense? Will it not provide the desired behaviour - child has Ctrl-C enabled, child can install its own Ctrl-C handler for cleanup, and launcher stays simple?

vsajip commented 3 years ago

I implemented as a test, the following for the console case only:

#define ARBITRARY_DELAY_FOR_CHILD 5000

static BOOL
control_key_handler(DWORD type)
{
    if (type == CTRL_C_EVENT) {
        GenerateConsoleCtrlEvent(CTRL_C_EVENT, 0);
    }
    /*
     * See https://github.com/pypa/pip/issues/10444#issuecomment-973408601
     */
    Sleep(ARBITRARY_DELAY_FOR_CHILD);
    return TRUE;
}

/* ... in run_child */
SetConsoleCtrlHandler((PHANDLER_ROUTINE) control_key_handler, TRUE);

and then a simple script to encapsulate in the launcher:

import sys
import time
print(sys.version)
print(sys.argv)
print(sys.executable)
print('\nPress Ctrl-C to exit:')
try:
    while True:
        pass
except KeyboardInterrupt:
    print('Ctrl-C seen, cleaning up (should take 3 secs) ...')
    time.sleep(3)
    print('Cleanup done.')

which, when run, produces

3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
['C:\\Users\\Vinay\\Projects\\simple_launcher\\test\\test.exe']
c:\python38\python.exe

Press Ctrl-C to exit:
Ctrl-C seen, cleaning up (should take 3 secs) ...
Cleanup done.

Having a hard-coded delay of 5 seconds of course won't cover some cases, but should be good enough for the vast majority - it's the default timeout set by Windows, anyway, as per the comment by @eryksun.

vsajip commented 3 years ago

OTOH, if I use SetConsoleCtrlHandler(NULL, FALSE);, then I get this:

3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
['C:\\Users\\Vinay\\Projects\\simple_launcher\\test\\test.exe']
c:\python38\python.exe

Press Ctrl-C to exit:
Ctrl-C seen, cleaning up (should take 3 secs) ...
^C
C:\Users\Vinay\Projects\simple_launcher> Cleanup done.

So with no special Ctrl-C handling code in the launcher, it looks like the child still gets time to do its cleanup. Even when I increase the cleanup time to 10 seconds, when Ctrl-C is hit the program seems to terminate (^C is printed on the screen), but after 10 seconds Cleanup done. is still printed on the console, so the child apparently wasn't terminated.

jeremyd2019 commented 3 years ago

Can you wait on the child process handle in the handler (probably with a timeout), or is that a no-no?

vsajip commented 3 years ago

If the child does indeed appear to complete, independent of the launcher, what would be the need to do that? I need to do some more investigation, but the launcher had apparently terminated (I could type other commands at the prompt), and yet the "Cleanup done." appeared after the expected delay, indicating that the child had apparently not been prematurely terminated. This is on Windows 7, mind. I haven't yet tested on other Windows versions yet.

vsajip commented 3 years ago

Updated test script:

import sys
import time
print(sys.version)
print(sys.argv)
print(sys.executable)
print('\nPress Ctrl-C to exit:')
DELAY = 10
try:
    while True:
        pass
except KeyboardInterrupt:
    print('Ctrl-C seen, cleaning up (should take %d secs) ...' % DELAY)
    for i in range(DELAY):
        print('%d steps to go ...' % (DELAY - i))
        time.sleep(1)
    print('Cleanup done.')

Screencap of it running, showing how the launcher seems to have exited (I could type dir launcher.c at the prompt) whereas the child carries on running to completion in the background, still printing output to the console:

https://user-images.githubusercontent.com/130553/142778715-a36610ea-11d0-4671-b081-83b396ca5ee7.mp4

vsajip commented 3 years ago

Wait, this behaviour may be related to my use of Cmder as a shell. I'll need to play with it using Powershell/cmd.exe.

vsajip commented 3 years ago

Cmder was behaving differently - with cmd.exe and Powershell, the child gets killed. I changed it around to enable the handler and to wait for the child as @jeremyd2019 suggested (with an infinite timeout, just to see) and it does seem to wait for the child to complete and then exits - under both Powershell & cmd.exe. I can of course set an actual timeout of 5 seconds, say.

eryksun commented 3 years ago

1. There's no need to call `SetConsoleCtrlHandler` for the GUI case.

The GUI launcher should also have a console control handler. It's useful when running a background application. Also, even if an application has a GUI, its console control handler will be called when it's running in session 0 as a batch task.

The session server sends CTRL_C_EVENT, CTRL_BREAK_EVENT, and CTRL_CLOSE_EVENT on behalf of a console host. But the server itself is the source of CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT. They get sent in the second phase of ExitWindowsEx(), after the top-level windows in an interactive session have all voted to allow the logoff/shutdown via their response to WM_QUERYENDSESSION. In an interactive session (i.e. not session 0), the server sends these two events to every process that isn't connected to a window station (i.e. the process hasn't loaded "user32.dll"). In session 0, the server sends CTRL_SHUTDOWN_EVENT to every process.

The server waits for up to "HungAppTimeout" (default 5000 ms) for CTRL_CLOSE_EVENT and "WaitToKillTimeout" (default 5000 ms) for CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT. After the wait times out, the process is forcefully terminated. For session 0, "WaitToKillServiceTimeout" (default 5000) is used instead of "WaitToKillTimeout". Of course, CTRL_LOGOFF_EVENT is never sent in session 0. Note that a batch task in session 0 usually won't get anywhere near "WaitToKillServiceTimeout" before shutdown because it gets killed by the task scheduler service. The only way out of that is for the task to respawn itself after starting. Then it won't be killed by the task scheduler service during shutdown.

(By my tests, it seems that the server DLL that handles console control events has a bug in Windows 10/11. It uses the "HungAppTimeout" and "WaitToKillTimeout" values from the ".DEFAULT" user profile instead of the current user's values. The server runs as SYSTEM, which uses the ".DEFAULT" profile, so it looks like either the code is neglecting to impersonate the client, or it mistakenly uses HKCU while impersonating. The current user values are returned when querying SPI_GETHUNGAPPTIMEOUT and SPI_GETWAITTOKILLTIMEOUT, so one would have to directly read the values from the ".DEFAULT" profile to know what timeout the server will actually use.)

Call SetConsoleCtrlHandler(NULL, FALSE) to ensure Ctrl-C isn't ignored, to be inherited by the child, and don't create the child with a new process group, because that would disable Ctrl-C being sent to the child. In this case hitting Ctrl-C would send an event to both launcher and child, causing both to terminate, possibly without cleanup, though the child could implement its own Ctrl-C handling if desired to do cleanup.

Why would you not set a control handler for the launcher in this case? SetConsoleCtrlHandler(NULL, TRUE) sets the CONSOLE_IGNORE_CTRL_C flag in the process parameters, which locally disables the Ctrl+C event. SetConsoleCtrlHandler(NULL, FALSE) clears the flag to enable the Ctrl+C event. Setting or clearing this flag has nothing to do with setting control handler functions. Setting it just means that CtrlRoutine() won't call the registered handler functions for CTRL_C_EVENT.

If dwProcessGroupId is nonzero, this function will succeed, but the CTRL+C signal will not be received by processes within the specified process group.

The sentence quoted above is misleading. It's even contradicted in the remarks section in the documentation of GenerateConsoleCtrlEvent(). Initially Ctrl+C will be ignored. But it can be toggled at will via SetConsoleCtrlHandler(NULL, ...). Unfortunately, this requires foreknowledge that the process will manually enable Ctrl+C -- or trickery (e.g. injecting a DLL). CTRL_BREAK_EVENT is never ignored, but unfortunately many console applications just let the default control handler get called, which calls ExitProcess().

So what is the point of creating a new process group then? This reads as if you don't need to create a new process group, but just call (GenerateConsoleCtrlEvent(CTRL_C_EVENT , 0) to send Ctrl-C to all processes sharing the console

Sending the event to all processes could kill your own parent process in the console session, which maybe has your process in a kill-on-close job object, and so on and so forth, up and down the process tree. It's cleaner to target the event at a process group of your own creation.

pypa / pip