skeeto / w64devkit

Portable C and C++ Development Kit for x64 (and x86) Windows
The Unlicense

Aliases sometimes wait for an inordinate amount of time after exit #82

Closed (grable0 closed this 1 year ago)

grable0 commented 1 year ago

This mostly happens in ConEmu, but it also happens in busybox sh and in cmd.exe. Sometimes even outside executables (like msys64 man.exe) wait after exit when called from within busybox sh. It's especially noticeable when calling aliases from other aliases, or even just doing 'system("ls")'.

I solved this by calling ExitProcess() in alias.c and recompiling all the aliases, since no matter what I do, anything compiled in tiny mode with a mainCRTStartup that just returns exhibits the same problem. Can one really return safely from mainCRTStartup?

skeeto commented 1 year ago

Hmm, which Windows is this (10, 11, etc.) and is it 32-bit or 64-bit? A delay on exit is typical of a memory access violation, which for console applications is usually silent. That would happen if there was nowhere to return, or a trashed return pointer. Could you run the alias under GDB to see if it catches a crash on exit? If so, it would also be great if you could reproduce on a debug build (compile with -g3) to understand why. If no crash, at least it may show where the delay occurs. Perhaps there's a security program interfering (e.g. injecting a thread)?

I have never seen official documentation for the mainCRTStartup prototype, except that it's supposed to be stdcall. But Microsoft doesn't use stdcall in their own definition (see exe_min.cpp in Visual Studio 2022). It's usually written with no arguments (e.g. in Mingw-w64 and older Visual Studio):

int mainCRTStartup(void);

Other times it accepts a pointer to the PEB (Visual Studio 2022):

int mainCRTStartup(void *);

That would matter for stdcall, except Windows seems to arrange such that it won't, i.e. the caller does not use the stack. The Visual Studio source potentially returns the application exit status, so it seems they're not afraid of returning. Indeed, it works fine in a quick, minimal test:

$ echo 'mov $123,%eax;ret' | cc -xassembler -nostartfiles -
$ ./a ; echo $?

Reports the "123" exit status at least as far back as Windows XP, and on both 32-bit and 64-bit. Give ret a small immediate for a stdcall return and it still works fine.

Returning from the main thread does not immediately exit the process, and it will wait until all other threads have exited. Calling ExitProcess() is a good way to avoid this behavior. Though, of course, the alias program doesn't start threads.
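The difference between returning and calling ExitProcess() can be sketched with a deliberately lingering thread. This is only an illustration, not code from alias.c: the `linger` function, the 3-second sleep, the `USE_EXITPROCESS` macro, and the file name in the build comments are all made up for the example, and the sleeping thread is merely a stand-in for a thread that some other component injects or creates.

```c
// Hypothetical demo; build in w64devkit, for example:
//   cc -nostartfiles -o linger1.exe linger.c
//   cc -nostartfiles -DUSE_EXITPROCESS -o linger2.exe linger.c
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

// Stand-in for an injected or runtime-created thread that outlives
// the main thread. The 3000 ms delay is arbitrary.
static DWORD WINAPI linger(LPVOID arg)
{
    (void)arg;
    Sleep(3000);
    return 0;
}

#if __i386
__attribute((force_align_arg_pointer))
#endif
int mainCRTStartup(void)
{
    HANDLE h = CreateThread(0, 0, linger, 0, 0, 0);
    if (h) {
        CloseHandle(h);  // the thread keeps running without the handle
    }
#ifdef USE_EXITPROCESS
    ExitProcess(123);    // terminates the whole process immediately
#else
    return 123;          // only ends this thread; the process lives on
#endif
}
```

Running each build followed by `echo $?` should show the first one pausing until the extra thread finishes, while the second exits right away with status 123.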

I'm fine with calling ExitProcess for a "cleaner" exit (08ff830), but I'd like to better understand the nature of what's happening here rather than guess. Thank you for reporting this, grable!

grable0 commented 1 year ago

I'm running Windows 10 x64 1511, a rather old version.

Running your assembler example in w64devkit.exe alone works fine. Running the example in w64devkit.exe inside ConEmu waits for a while, then prints 0.

Here's a test case I used to test further:

//   $ gcc -Os -fno-asynchronous-unwind-tables -s -nostartfiles -Wl,--gc-sections -o test_crtstart1.exe test_crtstart.c
//   $ gcc -DEXITPROCESS -Os -fno-asynchronous-unwind-tables -s -nostartfiles -Wl,--gc-sections -o test_crtstart2.exe test_crtstart.c

void *GetStdHandle(int) __attribute((dllimport,stdcall));
int WriteFile(void *, const void *, int, int *, void *) __attribute((dllimport,stdcall));
int ExitProcess(int) __attribute((dllimport,stdcall));

#if __i386
__attribute((force_align_arg_pointer))
#endif
__attribute((externally_visible))
int mainCRTStartup(void)
{
    const char msg[] = "this is a test\n";
    int dummy;
    WriteFile( GetStdHandle(-11), msg, sizeof(msg) - 1, &dummy, 0);
#ifdef EXITPROCESS // test_crtstart2
    ExitProcess(123);
#else
    return 123; // test_crtstart1
#endif
}

Running in w64devkit.exe alone:

$ ./test_crtstart1 ; echo $?
this is a test
123
$ ./test_crtstart2 ; echo $?
this is a test
123
$ gdb ./test_crtstart1.exe
Reading symbols from ./test_crtstart1.exe...
(No debugging symbols found in ./test_crtstart1.exe)
(gdb) run
Starting program: C:\Projects\C\tinylib\test_crtstart1.exe
this is a test
[Inferior 1 (process 4792) exited with code 0173]
(gdb) exit
$ gdb ./test_crtstart2.exe
Reading symbols from ./test_crtstart2.exe...
(No debugging symbols found in ./test_crtstart2.exe)
(gdb) run
Starting program: C:\Projects\C\tinylib\test_crtstart2.exe
this is a test
[Inferior 1 (process 8052) exited with code 0173]
(gdb) exit

Running w64devkit.exe inside ConEmu:

$ ./test_crtstart1.exe ; echo $?
this is a test
<.... waiting for a while ...>
0
$ ./test_crtstart2.exe ; echo $?
this is a test
123
$ gdb ./test_crtstart1.exe
<.... waiting for a while ...>
Reading symbols from ./test_crtstart1.exe...
(No debugging symbols found in ./test_crtstart1.exe)
(gdb) run
Starting program: C:\Projects\C\tinylib\test_crtstart1.exe
[New Thread 7740.0x1d98]
[New Thread 7740.0x1c50]
this is a test
[Thread 7740.0xf18 exited with code 123]
<.... waiting for a while ...>
[Thread 7740.0x1d98 exited with code 0]
[Inferior 1 (process 7740) exited normally]
(gdb) exit
$ gdb ./test_crtstart2.exe
<.... waiting for a while ...>
Reading symbols from ./test_crtstart2.exe...
(No debugging symbols found in ./test_crtstart2.exe)
(gdb) run
Starting program: C:\Projects\C\tinylib\test_crtstart2.exe
[New Thread 3940.0x70]
[New Thread 3940.0x7e4]
this is a test
[Thread 3940.0x70 exited with code 123]
[Thread 3940.0x7e4 exited with code 123]
[Inferior 1 (process 3940) exited with code 0173]
(gdb) exit

This does look more like a ConEmu issue than anything else, really. That extra spawned thread is probably what ConEmu injects for output capture and other things, and that is where the waiting seems to happen. So your initial assumption was right ;) I'm also having trouble reproducing the waiting without ConEmu, so I might be misremembering where I've seen it... It is weird, though, that gdb waits at startup too when run inside ConEmu.

grable0 commented 1 year ago

After further testing, I can't reproduce any waiting outside ConEmu, unless I call _sprintf or _snprintf, which automatically links the executable to msvcrt.dll. gdb even waits at the same spot, right after the main thread exits...

So it seems msvcrt.dll also does something when attached to a process compiled in "tiny mode", and thus an ExitProcess is required for a clean exit. I'm guessing this is what skewed my earlier findings, as I use snprintf quite often even in "tiny mode" programs.

skeeto commented 1 year ago

Thanks for all this detail! That's been very helpful. The "New Thread" log is the smoking gun, enough to figure out almost everything. When I run my assembler example under ConEmu with GDB, I also see three extra threads, though I do not observe a delayed exit nor a zero status. When I link msvcrt.dll (outside ConEmu) I also see one extra thread, which is normal.

In all cases threads are blocked on ZwWaitForWorkViaWorkerFactory any time I look at them. Perhaps your particular, local system configuration delays threads blocked on that function? Or maybe they're blocked on something else? (If you'd like to check, use "info threads" in GDB.) I was unable to reproduce the delay, only these thread spawns.

As for ConEmu, I learned that it explicitly hooks every process in its tree. Even without the hook it injects a thread via CreateRemoteThread. By my measurements, this triples process startup time. That sounds worse than it really is, and the difference is just barely perceptible.

https://conemu.github.io/en/ConEmuHk.html

I went down the rabbit hole reading the ConEmu sources and investigating. It obfuscates its CreateRemoteThread lookup to evade security scanners (InjectRemote.cpp), and embeds a kind of shell code (Infiltrate.cpp) for doing its work. Fascinating stuff.

If your situation is common, perhaps it's gone unnoticed because it's unusual for programs to return on their main thread. Every runtime I've seen makes a non-returning exit system call instead of returning. I always do it in Windows Subsystem applications because interacting with windowing implicitly spawns threads that don't return. Considering environments that inject threads, I suppose I should always ExitProcess or similar even in trivial console applications. I'll do so from now on.

I did all testing on Windows 10 22H2 with ConEmu 230724 (newest stable). If you can reproduce the assembly example on the latest ConEmu, consider reporting there. At only two instructions, it's the most trivial Windows program possible, and does not involve runtimes nor compiler shenanigans.

grable0 commented 1 year ago

Thanks, did not expect you to trawl through ConEmu, it is quite the rabbit hole indeed.

Just for completeness, I ran this to get various timeouts:

#include <stdio.h>
#include <windows.h>

int main() {
    int value;
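    // Query one SPI_GET* timeout and print it; puts() already appends a newline.
    #define PRINT(x) \
        do { \
            if (SystemParametersInfoA(SPI_GET ## x, 0, &value, 0) == 0) \
                puts(#x " = unknown"); \
            else \
                printf(#x " = %d\n", value); \
        } while (0)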
    PRINT(ACTIVEWNDTRKTIMEOUT);
    PRINT(FOREGROUNDLOCKTIMEOUT);
    PRINT(HUNGAPPTIMEOUT);
    PRINT(WAITTOKILLTIMEOUT);
    PRINT(WAITTOKILLSERVICETIMEOUT);
    return 0;
}

My output:

ACTIVEWNDTRKTIMEOUT = 500
FOREGROUNDLOCKTIMEOUT = 25363236
HUNGAPPTIMEOUT = 5000
WAITTOKILLTIMEOUT = 5000
WAITTOKILLSERVICETIMEOUT = 5000

Nothing looks out of the ordinary, so I'll close this.