oneclick / rubyinstaller2

MSYS2 based RubyInstaller for Windows
https://rubyinstaller.org
BSD 3-Clause "New" or "Revised" License
645 stars 248 forks source link

unexpected ucrtbase.dll #308

Closed P3t3rU5 closed 1 year ago

P3t3rU5 commented 1 year ago

What problems are you experiencing?

unexpected ucrtbase.dll when running anything ruby

Steps to reproduce

any ruby executable execution like ruby -v, gem list, etc

What's the output from ridk version?

unexpected ucrtbase.dll

Furthermore, I'm on Windows 11 Pro Insider Preview build 25211.rs_prerelease.220923-1354 and I'm suspecting something changed in latest build

larskanis commented 1 year ago

OK, this is rather unexpected :smile:

Which RubyInstaller version did you use? Can you compare 3.0 and 3.1 versions? RubyInstaller-3.1 switched to more modern UCRT runtime, while 3.0 still uses the MSVCRT.

P3t3rU5 commented 1 year ago

Didn't even thought of trying other versions I had some previous versions installed:

s00ner commented 1 year ago

I'm on the same Windows 11 version and also recently started getting unexpected ucrtbase.dll when running anything ruby.

ArminG-MSFT commented 1 year ago

Hello this is Armin from Microsoft. We encountered the same or at least a very similar issue in our internal compatibility testing. What seems to be the issue is that Ruby is relying on undocumented/unsupported behaviour. It is trying to retrieve a pointer to an internal variable __pioinfo inside urtbase.dll. That itself is not supported, but from our debugging and looking at the code in win32.c it looks like it is doing it by walking back from the ret assembly instruction in _isatty(). But this assumes the ret instruction will be the last instruction in that _isatty().

The issue is that certain optimizations can move function code beyond a ret instruction, so the common/typical instruction path is shorter. And new optimizations we applied to recent ucrtbase.dll builds did exactly that in _isatty(). So it looks like Ruby is now looking at the wrong location and no longer finding __pioinfo.

Short term we'll be rolling out a fix that for this function that undoes this optimization, which should fix this specific error for the many Ruby users in the wild. But it would be great if Ruby can be fixed to not use undocumented behaviour, or perhaps at least be more robust in its function traversal. Note though that at some point we may break it in some way that is not so easily fixed.

The offending code is in win32\win32.c function set_pioinfo_extra.

Feel free to correct me though if our understanding is incorrect.

MSP-Greg commented 1 year ago

@ArminG-MSFT Thank you. Added your comments to https://bugs.ruby-lang.org/issues/18605#note-4

gonzaloriestra commented 1 year ago

@ArminG-MSFT thanks a lot for the detailed explanation. Could you be so kind to let us know once the fix is released?

ArminG-MSFT commented 1 year ago

Tentative will be in about two weeks. I'll provide a build number when available.

Some additional background to explain why the current Ruby behaviour is not very desirable. We at Microsoft use Profile Guided Optimizations, and hence every time we generate a new profile, we can shuffle generated assembly. Add to that shuffle due to internal toolchain behaviour. So the fact that it rarely broke before is remarkable.

We have solved it by making the _isatty entry point a jmp to the actual implementation, and below the jmp non-reachable code that satisfies the Ruby behaviour. If this passes compatibility testing, it will be flighted in about two weeks. That should make sure whatever actual code generation occurs, Ruby will find its sequence.

Of course, at some point, we'll remove this workaround as now every caller of _isatty will have some overhead to satisfy Ruby behaviour. So we should work to get Ruby transferred to something portable. That should also unblock porting ARM64/ARM64EC, so I think we'll have a joint interest here.

ArminG-MSFT commented 1 year ago

Update: The fix will be in build 25235.rs_prerelease or higher. I don't know yet when that will be out publicly.

ArminG-MSFT commented 1 year ago

Build 25236 has just been released. So all blocked can actively search for updates and will get it offered now.

olifink commented 1 year ago

@ArminG-MSFT - I just tried rubyinstaller-devkit-3.1.2-1-x64.exe on Window on Arm (Windows DevKit 2023) with Windows Pro 11 Insider Preview (dev channel) on 25236.rs_prerelease.221028-1618 and still get unexpected ucrtbase.dll trying to run ruby. Shouldn't it run emulated nontheless?

I can confirm same versions (rubyinstaller&windows insider dev) on x64 SurfaceBook works now

ArminG-MSFT commented 1 year ago

Running Ruby on Windows for ARM64 will indeed run emulated, but that has separated generated assembly code which does not contain my fix as it will load the ARM64 native ucrtbase.dll and look for the EC-export (#_isatty vs _isatty). We really need to get Ruby to use ARM64 natively, but let me see what I can do here too.

ArminG-MSFT commented 1 year ago

Update: The issue for ARM64 is of course that Ruby runs emulated, and hence scanning the actual assembly that is executed is never going to yield the expected results. (That is on top of the generic issue that ARM64 does not have a lea-instruction and hence typically will load addresses using a runtime register and the concept itself is tricky.) But when emulated and using GetProcAddress, one will get the ARM64EC forwarder instead of the actual #_isatty inside ucrtbase.dll. Hence currently Ruby is scanning that forwarder when running emulated. That is great, as that is AMD64 assembly. I managed to insert the dummy sequence there, which unblocks the startup check when Ruby runs on Windows on ARM.

I'll update again when this gets flighted.

ArminG-MSFT commented 1 year ago

Update: The patch for AMD64 emulation on ARM64 will be in 25250.1000.rs_prerelease or higher. I don't know the flight schedule yet.

ArminG-MSFT commented 1 year ago

Update: Build 25252 has just been released. If not offered automatically, actively search for updates will cause it to show.

olifink commented 1 year ago

thanks @ArminG-MSFT 25250 is now working for me on Windows on Arm Dev Kit 2023 device - thanks!

nurse commented 1 year ago

Hi, I'm a person who introduced the __pioinfo hack, and notice this thread just now. Why Ruby depends to _pioinfo is

I'm very sorry that Visual C++'s optimization is blocked by my hack. But unfortunately our decision to merge the change is not made as a tiny hack. After first I came up with the idea, we investigated other solution for a year. And also I believe many people including you struggled to find a smarter way, but no one proposed a solution.

I agree that Ruby should support ARM64 natively. I'm very welcome to contribute a patch to get __pioinfo on ARM64.

Of course I'm also welcome to propose a better solution. Thanks,

larskanis commented 1 year ago

@nurse Thank you for sharing these details! I think it's best to move the discussion to bugs.ruby-lang.org where I posted my answer. Maybe the title could be somewhat more generic there.

@ArminG-MSFT Thank you for your investigation and your short term solutions! IMHO this particular issue with Windows 11 Insider Preview is solved that way.

KJTsanaktsidis commented 1 year ago

It took me a bit of fiddling to get the compatibility hack in Windows build 25250 because I'm not really a Windows person; in the hope it will be useful to anybody else trying to build Ruby from source, the way to get this fix is:

Or.... if you don't want to do that to your gaming PC (that's me!)

If you install the SDK, and then run "x64 native tools command prompt for Visual Studio 2022", it seems that it picks up the new SDK version - e.g. you'll see it in %WindowsLibPath% env var et al.

ArminG-MSFT commented 1 year ago

It took me a bit of fiddling to get the compatibility hack in Windows build 25250 because I'm not really a Windows person; in the hope it will be useful to anybody else trying to build Ruby from source, the way to get this fix is:

Or.... if you don't want to do that to your gaming PC (that's me!)

If you install the SDK, and then run "x64 native tools command prompt for Visual Studio 2022", it seems that it picks up the new SDK version - e.g. you'll see it in %WindowsLibPath% env var et al.

Hello 'KJTsanaktsidis',

Just to clear any confusion. The fix I made is in ucrtbase.dll in the OS for both AMD64 and ARM64 (so it can run emulated x64 Ruby). There is hence no need for newer SDK's itself. The idea of the fix is to lock the assembly in place, by providing a dummy sequence for Ruby to find that satisfies its requirements. That way Ruby build with older SDK's will continue to work on newer OS even if we shuffle the assembly for better optimization again inside the OS.

Using the latest SDK will give you access to all new features, so is not a bad thing! But it is not needed to build a Ruby build compatible with Windows 11. I'm mentioning this, just to make clear the fix is in the OS itself, not SDK libraries.

The idea is hence that both regular production Windows as well as preview-Windows have the sequence Ruby needs. Are you saying that is not what you were experiencing?

KJTsanaktsidis commented 1 year ago

Yes - I ran into this issue while building ruby (by hand in x64 VS console) on what I thought was an up to date win11 2H22 system yesterday. Building against a newer canary SDK fixed it for me. I’m only concerned with running this ruby on the machine I built it on, so I haven’t really thought about distributing the build (I just want to run the test suite to check a patch I’m working on didn’t break anything on windows!)

I had assumed the patch just hadn’t made it out to windows update for me yet - are you saying it is in the out in the normal release stream?

it’s of course possible I did something else to resolve it in my flailing about. There was a while where I was accidentally doing a 32 bit build which definitely won’t work based on what you’re saying for example.

Also just as a side note, this is a Herculean effort at backward comparability here by Microsoft, and I’m in awe of it.

ArminG-MSFT commented 1 year ago

Thank you for clarifying. The 32bit builds recently broke indeed and we were already discussing next steps.

But for 64 bit, Windows 11 22H2 and before, we did not do anything to as the default code gen still remains unchanged. So no patch was needed. We do have compatibility testing in place to make sure 22H2 stays compatible. Also, I just checked its assembly and I believe it still is? But else we'll just have to make the same fix there.

The backstory is that for 25xxx builds we changed things internally that made the code gen different, setting all the above in motion. But 22xxx builds should be unaffected. If they are not for 64bit I would want to know!

KJTsanaktsidis commented 1 year ago

Ok I’m probably 90% sure I only got this error on the 32bit build on 2H22 then. Sorry for muddying the waters.

But 22xxx builds should be unaffected. If they are not for 64bit I would want to know!

I don’t have any way to verify this without reinstalling windows on my computer now that I upgraded to the Dev track right? Is there a way I can build and run Ruby against an older version of the SDK perhaps and ensure ruby runs against that version of the DLL?

I’m starting to think what I did by installing the SDK was pointless because ruby.exe will load the ucrtbase.DLL from C:\Windows\SysWOW64 at runtime anyway and ignore the DLL in the dev kit directory?

sorry for being a bit of a noob here.

ArminG-MSFT commented 1 year ago

No worries. On the 32bit issue, it seems Ruby gets lucky once in a while when it scans beyond _issatty and finds what it needs in some builds with not in others. So on some of our builds it works, while on others it doesn't depending on random shifts in unrelated assembly generation.

I'll just try and add a dummy sequence in Ruby 32bit for now too.

awhitenose commented 1 year ago

Probably this is useful. I created a patch to win32.c which is based on win32.c taken from Ruby 3.2.x but really it may be applied and was tested with Ruby 2.6.x up to 3.2.x. It fixes 'unexpected ucrtbase.dll' issue for WIn32 builds and lets run 32-bit Ruby on Win11 as well as older Windows.

Before making the patch, I have disassembled ucrtbase.dll from no-issues 32-bit windows, 64-bit, and problematic Win11 32-bit and 64-bit. Checked __pioinfo lookup with all the four by looking into asm and finding patterns. The only difference was using LEAVE in the newer 32-bit ucrtbase.dll vs POP EBP in the previous. The main MOV/LEA pattern was not changed and may be used as is. There are no changes in 64-bit asm that would prevent current patterns from working. I did not need 64-bit, but checked asm.

As a result this patch only adds the additional check for LEAVE instruction for 32-bit versions only. Win11 ucrtbase.dll _isatty is longer than 300 bytes, hence changed to 500 which helped.

--- win32.c.orig    2023-02-08 07:02:20.000000000 +0100
+++ win32.c 2023-06-24 00:58:44.000000000 +0100
@@ -2596,8 +2596,8 @@
      * * https://bugs.ruby-lang.org/issues/11118
      * * https://bugs.ruby-lang.org/issues/18605
      */
-    char *p = (char*)get_proc_address(UCRTBASE, "_isatty", NULL);
-    char *pend = p;
+    unsigned char *p = (char*)get_proc_address(UCRTBASE, "_isatty", NULL);
+    unsigned char *pend = p;
     /* _osfile(fh) & FDEV */

 # ifdef _WIN64
@@ -2622,10 +2622,16 @@
 #  define PIOINFO_MARK "\x8B\x04\x85"
 # endif
     if (p) {
-        for (pend += 10; pend < p + 300; pend++) {
+        for (pend += 10; pend < p + 500; pend++) {
             // find end of function
+#if _WIN64
             if (memcmp(pend, FUNCTION_BEFORE_RET_MARK, sizeof(FUNCTION_BEFORE_RET_MARK) - 1) == 0 &&
-                (*(pend + (sizeof(FUNCTION_BEFORE_RET_MARK) - 1) + FUNCTION_SKIP_BYTES) & FUNCTION_RET) == FUNCTION_RET) {
+                (*(pend + (sizeof(FUNCTION_BEFORE_RET_MARK) - 1) + FUNCTION_SKIP_BYTES) & FUNCTION_RET) == FUNCTION_RET) 
+#else
+           if ((*pend == 0x5d /*pop ebp*/ || *pend == 0xc9 /*leave*/) &&
+       *(pend+1) == 0xc3 /*ret*/)
+#endif     
+       {
                 // search backwards from end of function
                 for (pend -= (sizeof(PIOINFO_MARK) - 1); pend > p; pend--) {
                     if (memcmp(pend, PIOINFO_MARK, sizeof(PIOINFO_MARK) - 1) == 0) {
ArminG-MSFT commented 1 year ago

That will certainly help making it more robust indeed!

Two additional comments: 1) I also added a dummy sequence for 32bit ucrtbase.dll in the latest 258xxx builds. These are not flighting yet though. 2) When checking I noticed 32bit was on and off working on our internal builds, not so much because _isatty is longer (even though it is indeed on some builds), but because the end-sequence is changed as you noticed and it was hence overshooting _isatty when searching backwards. Depending on random build changes in other functions, it was then finding the sequence in the functions below _isatty. Just to indicate how hacky this is. :-)

So longer term we should still rewrite the code, which will also allow a native ARM64 port.

But once 258xxx flights, all three (64bit AMD64, 64bit AMD64 emulated on ARM64 as well as 32bit x86) will have a dummy sequence and should no longer be dependent on ever changing linker code generation.

awhitenose commented 1 year ago

Thank you @ArminG-MSFT for your comments. Do you mean that the recent ucrtbase.dll must not cause issues with Ruby builds and running? This is good news if so, and then it's probably me but also some of the clients are still on previous builds for any reasons and that was why I decided to create the patch myself and finally get a working Ruby for us and them.

Please note, the above patch is not applied to Ruby sources. I'm not one of Ruby developers or collaborators. However, if someone finds my fix useful, please feel free to apply it to Ruby. I think it still may be useful for someone who builds a custom Ruby from sources and finds it not working on some (yet not recent) versions of Windows.

ArminG-MSFT commented 1 year ago

AMD64 and AMD64 on ARM64 have no longer issues. 32bit on PC does still have issues depending on build (as the issue occurs on and off depending on code gen), as the ucrtbase.dll with fix is not yet flighted yet. Fixed builds are 258xxx builds, but these are not yet released. I'll report here when it does.

pbo-linaro commented 7 months ago

I implemented __pioinfo pointer detection for (native) windows-arm64 in https://github.com/ruby/ruby/pull/8995. You're welcome to join and review it.

pbo-linaro commented 7 months ago

@ArminG-MSFT If you could ensure _isatty in ucrtbase.dll for windows-arm64 is not modified too, that would be very welcome to help support this platform.

ArminG-MSFT commented 7 months ago

@ArminG-MSFT If you could ensure _isatty in ucrtbase.dll for windows-arm64 is not modified too, that would be very welcome to help support this platform.

I can add a dummy sequence, just like for AMD64, x86 and ARM64EC (x64 running on ARM64). I also think that is better as your patch assumes the adrp/add pair is before the past ret. However, that is up to the optimizer in our toolchain.

What I intend to do is make _isatty on ARM64 like this:

  00000001800C3E50: 17FF97EC  b           _isatty_proc
  00000001800C3E54: D43E0000  brk         #0xF000
  00000001800C3E58: D0000A08  adrp        x8,0000000180205000
  00000001800C3E5C: 913A8108  add         x8,x8,#0xEA0
  00000001800C3E60: D65F03C0  ret

That way the _isatty code itself will be free to be optimized/modified the way we want, but your patch will always find its 3 instructions.

Based on how I read your code, this will work, but please sanity check.

pbo-linaro commented 7 months ago

Only adrp/add pair is used for detection. If it gets broken, no worry, I'll implement a new monkey patch to support whatever code layout is generated.

To ensure it's the latest before ret, maybe you can simply introduce a compiler barrier there, so no other variable will be loaded instead of __pioinfo.

Out of curiosity, do you know why __pioinfo public visibility was removed? Today, it seems like MS has to ensure compatibility based on code layout (as you implemented), when it could be much easier by simply exporting the symbol, like it was before.

ArminG-MSFT commented 7 months ago

Hello Pierrick, my goal with the patch is to not be dependent on good people like you to have to fix it.

Also, that ties into the other question. The reality is __pioinfo is not a public structure and we do not want people using it, exactly as it means we now have to keep it stable. But now Ruby uses it, it will break not just Ruby but the countless of applications that use Ruby under the hood. You may be able to provide a quick patch, but the countless of Ruby based applications out there will not be patched quickly or at all, wreaking havoc on end-users. Windows cannot just break stuff and rely on well meaning developers like you to patch it, as we have a large binary compatibility expectation that is often expected for years to persist. It is the blessing and curse of Windows backwards compatibility at the same time. :-)

So __pioinfo is now in a sort of weird state where it is not public, but we have to keep it stable'ish. Hence, I want to make sure this sequence stays stable, while at the same time we can still freely optimize _isatty the way we want. We may for instance add security protections or so that changes assembly. In that sense this thread is great, as we did not realize this dependency before.

Last, note that the dummy assembly I'm adding is never executed. So there is no need for a barrier. Ruby is only passively scanning it.

pbo-linaro commented 7 months ago

Thank you for the great insight on this, it's interesting to read this from an internal developer.

Last, note that the dummy assembly I'm adding is never executed. So there is no need for a barrier. Ruby is only passively scanning it.

My point on using a _ReadWriteBarrier (or as documentation suggests, atomic_thread_fence) is that it will prevent compiler optimizer to reorder instructions before this. On x64, it won't emit any memory barrier (as the architecture is TSO).

So, in this kind of code (that should be similar to a portion of _isatty):

int* pint = &global_int;
atomic_thread_fence(std::memory_order_acq_rel);
ioinfo **pio = &__pioinfo;
*pint = 42; 
pio->field = SOME_VALUE;
return;

Introducing the barrier guarantees that pioinfo address is always loaded after global_int, with any optimization/compiler. So in this case, we'll keep the pattern as expected (latest adrp/add before ret). So, even if the code is never executed, we preserve the correct layout.

ArminG-MSFT commented 7 months ago

I see, that would work indeed. However, I've written the dummy sequence in manual assembly so the compiler/linker will not touch/alter it either. The actual C _isatty is now simply renamed to _isatty_proc and can remain as is, but is now free to be optimized and altered in any way. The manual assembly _isatty implementation just jumps to C _isatty_proc instead. Below this jump is however the unreachable code that will contain the sequence you need. The 5 line example I gave you is the literal code ucrtbase.dll will have for ARM64.

So we'll have best of both worlds.

pbo-linaro commented 7 months ago

Thanks for your collaboration on this :)

nurse commented 7 months ago

FYI, I summarized why CRuby needs __pioinfo and possible alternative.

Background

Ruby's I/O implementation doesn't use C stdio library. This is because stdio's buffering causes some problems for example

Especially about "Non blocking I/O and select(2) support", this is because stdio doesn't provide a way to check whether a buffer in a FILE struct has content or empty. When we want to call fread to read bytes from a file but we don't want to be blocked, we need to check the buffer in the FILE struct is empty or not. If it is empty, fread will be blocked, but if it has some bytes, it will be immediately returned.

Because of these problems, Ruby 1.9 or later avoid to use stdio and implement their own buffers on top of Unix low-level I/O functions. Most of them are also implemented in Visual C++.

Ruby's requirements for I/O

Ruby's I/O has various capability. Especially related to this context following features are hard to implement.

Requirement for Low-level I/O functions

Visual C++ has some Low-level I/O functions. But compared with Unix it doesn't have enough capability to implement above requirements. If following features we can implement the same feature without directly using __pioinfo.

The current implementation detail

__pioinfo[] is only referenced from _pioinfo(fd)

_pioinfo(fd) is only used from below 4 macros.

macumber commented 4 months ago

I'm on a new Windows ARM Dev Kit machine with Windows 11 23H2 22631.3085. I've installed rubyinstaller-3.2.3-1-x64.exe to this machine. I've downloaded and installed Windows Kits 10.0.25357.0 and 10.0.26020.0. I thought that by adding the redistributable path to the front of my path, e.g. $env:Path = 'C:\Program Files (x86)\Windows Kits\10\Redist\10.0.25357.0\ucrt\DLLs\x64\;' + $env:Path that ruby.exe would find and load these new dlls. However, I still get the following error :

PS C:\Ruby32-x64\bin> .\ruby.exe -e "puts 'hi'"
unexpected ucrtbase.dll

Is there some other step that I am missing? I've tried adding both the x64 and arm redistributables to the path, neither seem to work.

pbo-linaro commented 4 months ago

If it's ok for you, you can use ruby 3.0, which does not use UCRT. An alternative is to use native windows-arm64 ruby, available with msys2.

ArminG-MSFT commented 3 months ago

Hello all, Just a heads up, we've backported the two fixes for ARM64 (running x64 Ruby emulated as well as the new ARM64 native Ruby) to Windows 11 22H2. So Ruby will now also work on non-preview builds of ARM64 Windows. The exact KB that contains the fix is KB5035942. That went out to preview last week and general public as of 2pm today.