leecher1337 / ntvdmx64

Run Microsoft Windows NTVDM (DOS) on 64bit Editions

Huge memory leak with Msys2 Cygwin when ntvdmx64 is installed #100

Open revelator opened 4 years ago

revelator commented 4 years ago

This happened before, but I was unsure where the problem stemmed from. It now turns out to be due to some incompatibility between ntvdmx64 and Cygwin-based shells.

Example: when building a package with Msys2, memory use grows until a BSOD. I have 32 GB of RAM and it ends up using double that, filling the page file until everything grinds to a halt. After uninstalling ntvdmx64, Msys2 / Cygwin works again.

leecher1337 commented 3 years ago

Just to be sure: Are you using the latest loaders from this repository?

For example, https://github.com/leecher1337/ntvdmx64/commit/9ad02af10aaaff50ba65a3d1acfb798d0da6d526 added hotpatching support that reduces the need to allocate extra memory for inline hooks. Although memory should be freed by the operating system on application exit, maybe memory pages didn't get freed properly?

Which Windows version are you using? Which process eats up all the memory? conhost.exe?
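
To illustrate why hotpatching can avoid extra allocations, here is a minimal sketch of the classic 32-bit x86 hotpatch layout, simulated on a byte buffer. The loader's real implementation is not shown in this thread, so this is an assumption about the general technique only: the names `is_hotpatchable` and `hotpatch` are hypothetical, the x64 layout differs, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True if the function at `entry` follows the classic 32-bit hotpatch layout:
// five padding bytes (NOP 0x90 or INT3 0xCC) right before the entry point,
// and a two-byte no-op `mov edi, edi` (8B FF) at the entry point itself.
bool is_hotpatchable(const std::vector<std::uint8_t> &image, std::size_t entry) {
    if (entry < 5 || entry + 2 > image.size()) return false;
    for (std::size_t i = entry - 5; i < entry; ++i)
        if (image[i] != 0x90 && image[i] != 0xCC) return false;
    return image[entry] == 0x8B && image[entry + 1] == 0xFF;
}

// Turns the padding into `jmp rel32` to `target` and the two-byte no-op into
// a short jump back onto that long jump. Only bytes that already belong to
// the function are rewritten, so no trampoline memory is needed for the
// overwritten prologue.
bool hotpatch(std::vector<std::uint8_t> &image, std::size_t entry, std::size_t target) {
    if (!is_hotpatchable(image, entry)) return false;
    // rel32 is relative to the end of the 5-byte jmp, which is `entry`.
    const std::int32_t rel = static_cast<std::int32_t>(
        static_cast<std::int64_t>(target) - static_cast<std::int64_t>(entry));
    image[entry - 5] = 0xE9;                           // jmp rel32
    std::memcpy(&image[entry - 4], &rel, sizeof rel);  // little-endian host assumed
    image[entry] = 0xEB;                               // jmp short
    image[entry + 1] = 0xF9;                           // -7: lands on entry - 5
    return true;
}
```

A plain inline hook, by contrast, overwrites real prologue instructions and has to copy them into freshly allocated executable memory, which is the extra allocation the commit message refers to.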

I installed MSYS2 and tried to compile a few files on a Win10 VM with 4GB RAM, but didn't notice any increase in memory consumption. Thus, please provide clear step-by-step instructions to reproduce the problem.

revelator commented 3 years ago

Aye, latest loaders. It is the standby list that grows until exhaustion. I cannot see with Process Hacker which process in particular is affected, but it grows exponentially the more time is spent using the Msys2 bash shell. For instance, if I do a batch build of several packages in Msys2, it will at some point grind to a halt; if I monitor it, the standby list starts to grow and never releases the memory allocated.

Uninstalling ntvdmx64 returns everything to normal.

revelator commented 3 years ago

Windows is Win10 2004, but as said I also had this problem earlier on a Win7 SP1 machine with the latest patches, so it does not seem to be specific to one Windows version.

leecher1337 commented 3 years ago

The Win7 loader, for instance, contains different Windows bugfixes than the Win10 loader; that's why OS version information is useful.

Stock Windows without e.g. proprietary (buggy) antivirus programs, I guess?

Is there some simple project that compiles in msys2 where the behaviour can be tested? I tried compiling OpenTTD in msys2, but failed due to its dependencies and I have absolutely no idea how to satisfy them. So it would be good if you could find a big project that can be compiled with msys2 where the mentioned behaviour occurs and document the build step by step, so that I can reproduce it on my test machine by just carrying out the same commands.

I wonder how the loader can exhaust memory, as the amount that it reserves in target processes is very minimal, just a few bytes to execute the loader code or place some inline hooks. Even assuming that this memory doesn't get freed when the target process terminates, eating up all memory would still take millions of invocations. So I need to have a way to reproduce your scenario.
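
As a rough check of the "millions of invocations" estimate, here is a back-of-the-envelope sketch. It assumes roughly 64 GiB in total (the reported 32 GB of RAM, doubled by the page file) and that each process launch permanently leaks exactly one unit of memory, either a 4 KiB page or a 64 KiB allocation-granularity block. The helper `launches_to_exhaust` and both leak sizes are illustrative assumptions, not measurements from this issue.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t GiB = 1024 * KiB * KiB;

// Hypothetical helper: how many process launches it takes to use up
// `total_bytes` if every launch leaks `leak_per_launch` bytes for good.
std::uint64_t launches_to_exhaust(std::uint64_t total_bytes,
                                  std::uint64_t leak_per_launch) {
    assert(leak_per_launch > 0);
    return total_bytes / leak_per_launch;
}
```

Under these assumptions, `launches_to_exhaust(64 * GiB, 4 * KiB)` gives 16,777,216 launches and `launches_to_exhaust(64 * GiB, 64 * KiB)` gives 1,048,576, so a leak of this size alone would need on the order of a million or more process starts to fill memory.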

revelator commented 3 years ago

Hmm, a good way is getting the mingw-packages repo and then running ci-build.sh, which will recompile all the packages; that should do the trick. Antivirus is Windows Defender, but it is turned off when building (that was my first guess, but no cigar).

I also have winevdm installed, if that makes any difference.

leecher1337 commented 3 years ago

ci-library.sh cannot find "repman" command when I try that. repman is not a package that can be found by pacman. I'm confused, is there some tutorial how this all works?

winevdm isn't a problem, ntvdmx64 and winevdm should play together nicely.

leecher1337 commented 3 years ago

Aha, repman seems to be in the pactoys repository.

revelator commented 3 years ago

Yep 👍

Pacman had some changes recently.

OK, so we can rule out winevdm, which btw does not exhibit this problem. Might it be a problem with ldntvdm.dll hooking into bash.exe? Though bash should in no way have any 16-bit code to execute, so I'm not sure what happens there.

revelator commented 3 years ago

Guess I'll have to go deep with a kernel debugger (if only it was that easy on Win10, urgh).

revelator commented 3 years ago

Oo that was fast.

revelator commented 3 years ago

Seems to have done the trick no problems so far. 👍

revelator commented 3 years ago

One small gripe still: ntvdmx64 prevents msys/cygwin bash from exiting, so I have to force-kill the process.

leecher1337 commented 3 years ago

Do you know where to find .pdb symbols for msys-2.0.dll and get a version of the DLL that is aware of debug symbols? If I install devel packages via pacman, I only get a .dbg file which isn't usable in the debugger. The crash in msys2 occurs here, but without debug symbols, I'm not really able to find the cause:

[image not preserved]

revelator commented 3 years ago

Hmm, the debug DLL should be enough, but you probably need gdb to debug it. There is a gdb plugin for MSVC, with directions here: https://gpac.wp.imt.fr/2015/06/11/using-gdb-in-visual-studio/. This should help, but it is not the easiest thing to set up.

There was another non-freeware plugin for Visual Studio named WinGDB, which was somewhat easier, but you had to pay for it.

revelator commented 3 years ago

If you can live with a more modest debugger, I have a port of the old Insight gdbtk debugger frontend that I can upload. It uses Tcl/Tk wish for the GUI, and while not as flashy as MSVC's, it can do pretty much the same, e.g. stack frames and breakpoints, and it even shows where in the code the crash happens.

revelator commented 3 years ago

Ah, WinGDB has a fully functional 30-day trial here: http://wingdb.com/

revelator commented 3 years ago

Also, you need to rename the msys-2.0.dbg file to msys-2.0.dll and replace the release version, or move it out of the way, before debugging. Then start bash and attach the debugger to it. When exiting the bash shell, the debugger should then show where in the code the hang occurs. Here it does not crash, it just hangs indefinitely, so I need to attach manually. I'll also have a look myself ;-) many hands make light work.

leecher1337 commented 3 years ago

Hi, renaming the .dbg file to .dll results in an error that the .dll is not a valid executable. Are you sure that the .dbg file contains usable code?

From what I can estimate, I guess that it crashes in sigproc.cc in:

    while ((qnext = q->next))
      {
        if (qnext->si.si_signo && qnext->process () <= 0)
          q = qnext;
        else
          {
            q->next = qnext->next;
            qnext->si.si_signo = 0;
          }
      }
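
To make the shape of that loop easier to reason about, here is a self-contained sketch of the same unlink-while-traversing pattern. `Packet` and `drain` are hypothetical stand-ins, not the Cygwin types, and the `deliverable` flag only simulates what `process ()` would return.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a queued signal packet; not the Cygwin type.
struct Packet {
    int signo;         // 0 means the slot is free
    bool deliverable;  // simulates whether process() would succeed right now
    Packet *next;

    // > 0 when the signal was delivered, <= 0 when it has to stay queued.
    int process() const { return deliverable ? 1 : 0; }
};

// Same shape as the quoted loop: `head` is a sentinel node, delivered or
// already-free packets are unlinked, pending ones stay in the list.
// Returns how many packets remain queued.
std::size_t drain(Packet *head) {
    assert(head != nullptr);
    std::size_t pending = 0;
    Packet *q = head;
    Packet *qnext;
    while ((qnext = q->next) != nullptr) {
        if (qnext->signo && qnext->process() <= 0) {
            q = qnext;              // keep it and advance
            ++pending;
        } else {
            q->next = qnext->next;  // unlink it
            qnext->signo = 0;       // mark the slot as free
        }
    }
    return pending;
}
```

The loop dereferences `q->next` and `qnext->next` on every step, so it is only safe while each `next` pointer is either valid or null. A single stale or overwritten pointer in the list would fault exactly here, which fits the symptom described.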

It's not related to the direct hooking of ldntvdm.dll in mintty.exe. I think it possibly has something to do with conhost.exe hooking, because the error also occurs if I hook only conhost and not mintty.exe, but I'm not sure which function would corrupt the list entries in the msys wait_sig() function.

revelator commented 3 years ago

Sorry, mistake on my part: the .dbg file is actually the file containing the debug symbols in gdb format. It is what gdb uses instead of a pdb. I mistook it for a renamed DLL since older Cygwin also had one of those.

Hmm, that's a good question. Guess we need to dig even deeper to find out what is going on there.

revelator commented 3 years ago

Hmm, sig wait is a POSIX signaling function used to determine if a process is still running. It could be that the POSIX layer DLLs (cygwin1.dll, msys-2.0.dll) are waiting on ntvdmx64 to release a hooked process they rely on.

leecher1337 commented 3 years ago

But this still shouldn't lead to memory corruption in the list it traverses, right? The ldntvdm.dll shouldn't do any harm in the target mintty.exe process: I made a version that bails out on DLL process attach to ensure it's not loaded into mintty.exe, and the corruption in there still occurs. But why should some "bad" behaviour of conhost cause an invalid memory access in msys signal handling? That's also weird... Not sure what is going on here. I saw a change in the msys commit history that wanted to place the pdb instead of the .dbg file in the devel package, but this change was reverted. I also tried to convert the dbg to a pdb with cv2pdb, which actually spat out a pdb, but then the debugger says that the .dll doesn't contain any debug references to the pdb. Does msys have any tracing support, so that the run graphs with and without the ldntvdm loader could be compared?

revelator commented 3 years ago

Not sure, but I can ask them; I hang out there regularly.

Also, not sure if this could be related, but the POSIX layer DLLs don't use address space layout randomization (ASLR), which is also the reason why you sometimes have to rebase them (not as often as the old msys-1.0.dll though).

Do you use the 32-bit or 64-bit Msys2? The 64-bit version uses SEH while the 32-bit version uses DWARF, which might explain why the converted dbg file fails, as MSVC is wholly incapable of handling the DWARF format. Tbh it might be better to debug it with the native tools anyway.

If you want another go at it, try debugging it with these:

64-bit Insight gdb debugger: https://sourceforge.net/projects/cbadvanced/files/Tools/insight64.7z/download
32-bit Insight gdb debugger: https://sourceforge.net/projects/cbadvanced/files/Tools/insight32.7z/download

I'll also ask the Msys2 devs if they have any notion of where things might go bad.

leecher1337 commented 3 years ago

Looks like this is the function where it crashes, at least from googling:

+threadlist_t __reg2 *
+init_cygheap::find_tls (_cygtls *tls)
+{
+  tls_sentry here (INFINITE);
+
+  threadlist_t *t = NULL;
+  int ix = -1;
+  while (++ix < (int) nthreads)
+    {
+      if (!threadlist[ix].thread->tid
+          || !threadlist[ix].thread->initialized)
+        ;
+      if (threadlist[ix].thread == tls)
+        {
+          t = &threadlist[ix];
+          break;
+        }
+    }
+  /* Leave with locked mutex.  The calling function is responsible for
+     unlocking the mutex. */
+  if (t)
+    WaitForSingleObject (t->mutex, INFINITE);
+  return t;
+}
+

nthreads = 6; it crashes on the last thread in the list (ix = 5). Still no idea why the last pointer in the threadlist is corrupt.

revelator commented 3 years ago

Might be a case for the cygwin devs.

The DLL uses a variant of pthreads from newlib, and though I don't think it is related to winpthreads, it might suffer from some bug that recently turned up in the latter (causing a thread hang, though that should possibly have been fixed by now).

Looks like it uses async threads from what you dug up.

revelator commented 3 years ago

Hmm, seems the memory leak is not completely gone. Although it takes several days now instead of hours, ldntvdm will still slowly eat virtual memory until it hits exhaustion. I just discovered this today after I had my PC on for several days.

The Cygwin bug also still affects it, though I have no explanation as to why the threading code in Cygwin collides with it.

leecher1337 commented 3 years ago

As you were able to reproduce it: can you check with Process Explorer which type of resource it leaks? Thread handles? File handles? I checked handles multiple times but didn't find any forgotten handles that aren't closed.

revelator commented 3 years ago

Will do. It might take a little while before I have some results though, as it only seems to happen over a couple of days.

revelator commented 3 years ago

Mainboard gave out, so I haven't yet had time to test it further. One weird thing though: the msys2 shell works fine on Win7 64 Oo. The bug only seems to affect Win10.

revelator commented 3 years ago

Seems to be some weird incompatibility between the MSVC-based msys runtime and Win10's UCRT runtime. There has also been some other weird stuff happening (compiler flags that get ignored, or builds that suddenly need flags we did not need before) which did not happen on Win7.

leecher1337 commented 3 years ago

There are also very idiotic code changes in the Win32 subsystem that cause bad behaviour, see e.g. https://github.com/leecher1337/ntvdmx64/issues/93. These are Windows bugs, but I guess nothing can be done about them :-(

revelator commented 3 years ago

Ugh... indeed that sucks :S