itsmattkc / dotnet9x

Backport of .NET 2.0 - 3.5 to Windows 9x
2.31k stars 149 forks source link

Implement `InterlockedCompareExchange` using CAS #21

Open friendlyanon opened 7 months ago

friendlyanon commented 7 months ago

This implementation is only usable on 486 or newer CPUs, so there is also code that detects at runtime whether the AC bit of the EFLAGS register can be set. On 386, this register always returns the same value.

The CAS implementation uses the lock cmpxchg instruction, which is only executed if the above runtime check is satisfied. The overhead of calling this implementation is a cmp followed by a jnz instruction, which should have a negligible cost when execution already had to cross a DLL boundary to get here.

Using the command

find -name "*.dll" -o -name "*.vxd" | perl -nE "print if grep { $_ =~ /InterlockedCompareExchange/ } `dumpbin /imports $_`;"

Here is a list of .dll and .vxd files in the installer that import the "InterlockedCompareExchange" symbol and thus benefit from this:

List of .dll files ``` $WINDIR/Microsoft.NET/Framework/sbscmp10.dll $WINDIR/Microsoft.NET/Framework/sbscmp20_mscorwks.dll $WINDIR/Microsoft.NET/Framework/sbscmp20_perfcounter.dll $WINDIR/Microsoft.NET/Framework/SharedReg12.dll $WINDIR/Microsoft.NET/Framework/v1.0.3705/mscormmc.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/AdoNetDiag.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/alink.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/aspnet_filter.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/aspnet_isapi.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/Aspnet_perf.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/CORPerfMonExt.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/cscomp.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/Culture.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/dfdll.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/diasymreader.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/fusion.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/MmcAspExt.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordacwks.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordbc.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordbi.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorie.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorjit.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorld.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorpe.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsec.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsn.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsvc.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscortim.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorwks.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/normalization.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/PerfCounter.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/peverify.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/sbscmp20_mscorlib.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/shfusion.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/ShFusRes.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Data.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Data.OracleClient.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/System.EnterpriseServices.Wrapper.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Transactions.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/VsaVb7rt.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/webengine.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/WMINet_Utils.dll $WINDIR/RegisteredPackages/{D5D40355-5FB0-48fb-A231-CDC637FA16E0}/NETFXMigration.dll $WINDIR/System/dfshim.dll $WINDIR/System/mscoree.dll $WINDIR/System/mscories.dll $WINDIR/System/msvcm80.dll $WINDIR/System/msvcp80.dll $WINDIR/System/WBEM/Wmidcad.dll $WINDIR/winsxs/x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.42_none_db5f52fb98cb24ad/msvcm80.dll $WINDIR/winsxs/x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.42_none_db5f52fb98cb24ad/msvcp80.dll $WINDIR/winsxs/x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.42_x-ww_0de06acd/msvcm80.dll $WINDIR/winsxs/x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.42_x-ww_0de06acd/msvcp80.dll ```

Fixes #19

friendlyanon commented 7 months ago

I have also cleaned the CMakeLists.txt up a little. I don't know what toolchain you used to build the project, but now with an extra .lib file for the missing exports (EnumSystemLocalesA EnumSystemLocalesW GetThreadLocale SetThreadLocale from kernel32) the project also builds with the VS2022 toolchain and a modern Windows SDK. I am doing something similar in simcity-noinstall, which technically also targets Windows 95.

friendlyanon commented 7 months ago

I just noticed that you are using the MSVC420 (nice) toolchain, which is missing MASM. You would need to update CI to also grab MASM6.

friendlyanon commented 7 months ago

Alright, I have installed Win95 in a VM, then installed MASM 6.11 on it, then upgraded that to 6.11d then 6.14. I extracted the installation folder from the .vdi file and with the bin directory added to PATH, CMake will find ML.EXE and properly compile the assembly files when configuring with -D CMAKE_ASM_MASM_FLAGS=/coff. Full build commands with MSVC420 and MASM614:

:: call C:\MSVC420\bin\VCVARS32.BAT x86
:: set "PATH=%PATH%;C:\MASM611\BIN"
C:\cmake-3.13.5-win64-x64\bin\cmake.exe -B build -G "NMake Makefiles" -D CMAKE_BUILD_TYPE=Release -D CMAKE_ASM_MASM_FLAGS=/coff
C:\cmake-3.13.5-win64-x64\bin\cmake.exe --build build

And with a modern toolchain:

:: call "C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\VsDevCmd.bat" -arch=x86 -host_arch=amd64 -no_logo
cmake -B build-modern -G Ninja -D CMAKE_BUILD_TYPE=Release -D "CMAKE_ASM_MASM_FLAGS=/nologo /quiet" -D "CMAKE_C_FLAGS_RELEASE=/O1 /Ob1 /DNDEBUG=1 /Gy /Gw /GL /arch:IA32 /GS- /Gs1000000000" -D "CMAKE_MODULE_LINKER_FLAGS_RELEASE=/INCREMENTAL:NO /Brepro /emitpogophaseinfo /opt:icf /opt:ref /nocoffgrpinfo /manifest:no /emitvolatilemetadata:no /LTCG"
cmake --build build-modern
andresvettori commented 7 months ago

Would be possible to implement exactly this code but using XCHG for supporting 386 / x86 processors? XCHG would work for all cpus (slower than CMPXCHG, of course). If we want to optimize we can compile for 386/486 processors, or conditionally run either version at runtime. What do you think?

friendlyanon commented 7 months ago

That defeats the purpose. I don't see how an atomic unconditional exchange without a return value would help implementing an atomic conditional exchange with a return value. For a 386 target, the existing solution is "good enough", since it wouldn't be really atomic even with XCHG. There is a reason why Microsoft had a really heavyweight solution to this.

WamWooWam commented 6 months ago

Hi there! Been thinking about this and I get the impression it probably isn't worth the effort. It appears the .NET JIT itself will happily emit lock cmpxchg instructions when using APIs like System.Threading.Interlocked.CompareExchange, as the .NET 2.x JIT appears to only ever have targeted i586-class (Intel Pentium) CPUs and later.

I think a better option is to correctly implement InterlockedCompareExchange using lock cmpxchg and inline assembly (to avoid bringing in MASM) and to only support i486-class CPUs and above for this project, as patching all uses of cpmxchg out of the .NET JIT would likely be a massive undertaking that could only ever emit incorrect code, and would disproportionately cause threading/race condition issues on other platforms like NT4 and Windows 98, where these features are supported by minimum system requirements.

friendlyanon commented 6 months ago

Sounds good to me. I originally intended to put this PR up as a draft, mainly to gather feedback from Matt, but he hasn't commented on this topic yet. Explicitly excluding 386 would be a kind of compromise I wouldn't want to force, even though the JIT apparently does not care.

itsmattkc commented 6 months ago

Ahh sorry for not commenting, I was following the discussion but forgot to weigh in myself. I think the x86 asm solution is great (though inlined asm probably is preferable if that's doable).

As for 386s, that's a tough one. We can probably just fallback to the current implementation for the time being, but I'm currently of the opinion that this project should start angling towards becoming a more general kernel extender since there are limitations to what we can do (even with .NET apps) only patching the runtime. In that scenario, perhaps there would be a way to bring 98's driver-based solution over for 386?

friendlyanon commented 6 months ago

perhaps there would be a way to bring 98's driver-based solution over for 386?

Sure, but that's outside my current abilities or the scope of this PR.

friendlyanon commented 6 months ago

Actually, inline asm is causing values to be left on the stack:

Ghidra ```asm ************************************************************** * FUNCTION * ************************************************************** LONG __stdcall InterlockedCompareExchange(LONG * Destina LONG EAX:4 LONG * Stack[0x4]:4 Destination XREF[2]: 10001383(R), 10001393(R) LONG Stack[0x8]:4 Exchange XREF[1]: 100013b0(R) LONG Stack[0xc]:4 Comperand XREF[1]: 100013a5(R) undefined4 Stack[0x0]:4 local_res0 XREF[1]: 1000137f(R) undefined4 Stack[-0x4]:4 local_4 XREF[1]: 1000137b(R) 0x1370 223 InterlockedCompareExchange Ordinal_223 XREF[2]: Entry Point(*), 100023a0(*) InterlockedCompareExchange 10001370 83 3d 10 CMP dword ptr [DAT_10007010],0x0 70 00 10 00 10001377 56 PUSH ESI <-- These are 10001378 57 PUSH EDI <-- never popped. 10001379 74 18 JZ LAB_10001393 1000137b 8b 4c 24 04 MOV ECX,dword ptr [ESP + local_4] 1000137f 8b 54 24 08 MOV EDX,dword ptr [ESP + local_res0] 10001383 8b 44 24 0c MOV EAX,dword ptr [ESP + Destination] 10001387 f0 0f b1 11 CMPXCHG. dword ptr [ECX],EDX 1000138b c2 0c 00 RET 0xc ```

Pretty sure this is a compiler bug. I made sure I call another function to avoid the prologue/epilogue causing issues. Might just rewrite the whole function using __declspec(naked) and omit prologue/epilogue and logging altogether.