msys2 / msys2-installer

The one-click installer for MSYS2
BSD 3-Clause "New" or "Revised" License

Docker build hangs indefinitely after installing MSYS2 #59

Closed relapids closed 1 month ago

relapids commented 1 year ago

Related (?) issue: https://github.com/msys2/msys2-installer/issues/58

Thought I had the same issue but it turns out apparently not... Copying my message from there here below.

==========================

Same thing started happening to me yesterday with no changes on my end (been working for months up until now). Was definitely working two days ago since I have a script to force-rebuild my image daily. I see this on multiple machines and OS versions.

Tried both isolation modes (hyperv and process) and neither are currently working for me. One slight difference to the original reporter is that I'm using servercore:ltsc2022.

Still trying to diagnose further but unfortunately it's difficult to get diagnostic information from Windows containers for these sorts of issues. Trying to investigate using the information here: https://learn.microsoft.com/en-us/virtualization/windowscontainers/troubleshooting

EDIT:

Actually, I think I may have misunderstood the original issue in this thread and conflated it with mine. In my case the processes all appear to run successfully and terminate normally, but then the RUN step never finishes and just hangs indefinitely when finalizing the layer. I'm guessing this is probably a distinct issue and I misclassified it as the same as the one here due to the timing.

Strange though that it's happening on multiple machines and seems to coincide with the latest installer release. I tried rolling back to an earlier installer but that didn't help. Likely because whatever is causing the issue gets updated to the same version as in the latest installer anyway, but that's just a guess on my part so far - I need to do additional testing.

lazka commented 1 year ago

thanks

relapids commented 1 year ago

So I've done some more testing and if I blacklist msys2-runtime (via IgnorePkg in pacman.conf) then I'm able to get through the MSYS2 install and install other packages in subsequent layers, though obviously that's not really a viable workaround as it puts me in an unsupported state which I assume will cause other issues (and likely get worse as more time passes).

Is there some sort of flag I can enable to get verbose diagnostic information related to msys2-runtime to try and help narrow down the source of the problem?

lazka commented 1 year ago

Thanks for testing, yeah, looks like the latest msys2-runtime update broke something only in docker, which is why CI didn't catch it :(

relapids commented 1 year ago

Thanks. At least now I know it's not just me. Let me know if there's anything I can do to help investigate. I'm not really familiar with debugging MSYS2 internals but happy to try and learn to lend a hand.

lazka commented 1 year ago

Please try again now

relapids commented 1 year ago

@lazka Wow that was fast. It appears to be working now. Thank you!

Out of curiosity, what did you change?

lazka commented 1 year ago

> @lazka Wow that was fast. It appears to be working now. Thank you!

Thanks for testing.

> Out of curiosity, what did you change?

I wrote up my findings here: https://cygwin.com/pipermail/cygwin/2022-December/252711.html

relapids commented 1 year ago

@lazka It appears this has regressed again some time today. Any ideas? I see this on multiple machines again (both locally and my CI server) so I don't think it's a local issue. Was able to work around it with the same hack as last time (blocking update of msys2-runtime).

Biswa96 commented 1 year ago

Can you get the versions of the msys2-runtime package: which one fails and which one does not?

relapids commented 1 year ago

This is what I see when I block the update: warning: msys2-runtime: ignoring package upgrade (3.4.3-2 => 3.4.3-3)

So presumably 3.4.3-2 is the working one and 3.4.3-3 the broken one.
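For anyone digging through CI logs for the same information, here is a minimal sketch that pulls the package name and both versions out of pacman's "ignoring package upgrade" warning. The regular expression is written against the warning text quoted in this thread only; the function name is made up for illustration, and other pacman versions may word the message differently.

```python
import re

# Pattern based solely on the warning line quoted in this thread (assumption:
# the wording is stable across the pacman versions you care about).
IGNORED_UPGRADE = re.compile(
    r"warning: (?P<pkg>\S+): ignoring package upgrade "
    r"\((?P<old>\S+) => (?P<new>\S+)\)"
)


def ignored_upgrades(log_text):
    """Return (package, installed, available) for each ignored upgrade found."""
    return [
        (m.group("pkg"), m.group("old"), m.group("new"))
        for m in IGNORED_UPGRADE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = "warning: msys2-runtime: ignoring package upgrade (3.4.3-2 => 3.4.3-3)"
    for pkg, old, new in ignored_upgrades(sample):
        print(f"{pkg}: installed {old}, skipped {new}")
```

Running this over a saved build log lists every upgrade that IgnorePkg held back, which makes it easier to compare a working run with a hanging one.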

relapids commented 1 year ago

By the way I'm currently pulling the nightly installer, but I just tried switching to the one tagged 2022-12-16 and the overall result appears to be the same.

Though the previous version is different in this case (which seems reasonable/expected): warning: msys2-runtime: ignoring package upgrade (3.3.6-6 => 3.4.3-3)

Digging back through previous CI logs I see the failure was occurring when upgrading to 'msys2-runtime-3.4.2-2' though, and in the run where it started working it looks like no update occurred at all.

Perhaps the issue being resolved was an incorrect conclusion originally because I was testing with the nightly installer, and since no update occurred after the attempted fix, that resolved the issue simply because the offending code was no longer being executed anymore (since there was no update), but now that there's an update it's back again.

lazka commented 1 year ago

I see, yeah these might be two unrelated issues. The ASLR change made our nightly builds fail which in turn meant installing the latest build and updating resulted in a runtime update.

relapids commented 1 year ago

Did some more testing yesterday, and this also reproduces with the latest pacman update for me. I checked the GitHub Actions here (for the Docker builds in msys2-installer) and the latest runs are passing, but they also don't appear to be triggering an update of msys2-runtime or pacman. I wonder if that's the reason it seems to work fine there too (since afaict it always builds the installer first and uploads it, so the Docker builders pull an installer that's up-to-date with the latest runtime/pacman).

For what it's worth this is the test case I'm using (without the workarounds to IgnorePkg) which I ripped from your CI documentation to make sure there was nothing I was introducing in my own Dockerfile to cause the issue:

FROM mcr.microsoft.com/windows/servercore:ltsc2022

RUN powershell -Command \
  $ErrorActionPreference = 'Stop'; \
  $ProgressPreference = 'SilentlyContinue'; \
  (New-Object System.Net.WebClient).DownloadFile('https://github.com/msys2/msys2-installer/releases/download/nightly-x86_64/msys2-base-x86_64-latest.sfx.exe', 'msys2.exe'); \
  .\msys2.exe -y -oC:\; \
  Remove-Item msys2.exe; \
  function msys() { C:\msys64\usr\bin\bash.exe @('-lc') + @Args; } \
  msys ' '; \
  msys 'pacman --noconfirm -Syuu'; \
  msys 'pacman --noconfirm -Syuu'; \
  msys 'pacman --noconfirm -Scc';

If I add the following after the first-run it starts working (and I can install/update other packages).

  msys \"echo >> /etc/pacman.conf\"; \
  msys \"echo '[options]' >> /etc/pacman.conf\"; \
  msys \"echo 'IgnorePkg = msys2-runtime' >> /etc/pacman.conf\"; \
  msys \"echo 'IgnorePkg = pacman' >> /etc/pacman.conf\"; \
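To make it clearer what those four echo commands leave at the end of /etc/pacman.conf, here is a small sketch that builds the same text. The helper name is hypothetical and it writes nothing to disk; it only mirrors the lines appended above (a blank line, an extra [options] header, and one IgnorePkg line per package).

```python
def ignore_pkg_fragment(packages):
    """Build the text the echo commands above append to pacman.conf.

    Hypothetical helper for illustration only; it does not touch any file.
    """
    lines = ["", "[options]"]
    lines += [f"IgnorePkg = {name}" for name in packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ignore_pkg_fragment(["msys2-runtime", "pacman"]), end="")
```

With msys2-runtime and pacman as input, the fragment is a blank line followed by `[options]`, `IgnorePkg = msys2-runtime` and `IgnorePkg = pacman`.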

How I'm building:

docker build -t msys2-installer -f Dockerfile.2022 --no-cache .

Requires either a msys2-runtime or a pacman update to reproduce. Haven't noticed the same problem with other packages. E.g. I can add steps to install GCC or whatever - I tried many different packages - just fine as long as msys2-runtime/pacman are blacklisted, and updates to the pre-installed curl/libcurl also work so it's not an install vs update difference.

Quite confused, but will keep trying to narrow it down.

lazka commented 1 year ago

No update, but I can at least confirm the issue is reproducible in CI (this case installed a version leading to an update): https://github.com/msys2/msys2-installer/actions/runs/3953154465

chadlwilson commented 9 months ago

Did anyone get any further on this one? Spent a few days hitting my head against a stuck Docker build after adding a choco install msys2 and delighted to finally find this ticket so I can regain my sanity :-)

Running without doing the msys2 system update via either the choco package or ridk allows me to get a built container at least.

cinst ruby                        # install ruby
cinst msys2 --params "/NoUpdate"  # install msys2 without system update
Update-SessionEnvironment         # refresh environment vars
ridk install 3

There is still an unexplained 10 minute delay building the layer from docker which is probably an unrelated or semi-related Docker on Windows file-system issue, but at least it doesn't seem indefinitely stuck.

Fri, 01 Dec 2023 07:44:06 GMT Completed provisioning.
Fri, 01 Dec 2023 07:54:07 GMT Removing intermediate container f1ab44003c4e
Fri, 01 Dec 2023 07:54:07 GMT  ---> 626dbeeb7fee
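As a quick check of the delay, the gap between the first two timestamps above can be computed directly. This is only a sketch: it assumes the log timestamps are in the RFC 2822 style shown, and the function name is made up for illustration.

```python
from email.utils import parsedate_to_datetime


def elapsed_seconds(start_text, end_text):
    """Seconds between two RFC 2822 style timestamps like those in the log."""
    start = parsedate_to_datetime(start_text)
    end = parsedate_to_datetime(end_text)
    return (end - start).total_seconds()


if __name__ == "__main__":
    gap = elapsed_seconds(
        "Fri, 01 Dec 2023 07:44:06 GMT",  # "Completed provisioning."
        "Fri, 01 Dec 2023 07:54:07 GMT",  # "Removing intermediate container"
    )
    print(f"{gap:.0f} seconds")  # 601 seconds, i.e. just over 10 minutes
```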

Upgrading msys2-runtime also seems to be the culprit, similarly to the above. Ignoring it allowed it to complete (warning: msys2-runtime: ignoring package upgrade (3.4.9-2 => 3.4.9-3)):

cinst ruby                        # install ruby
cinst msys2 --params "/NoUpdate"  # install msys2 without system update
Update-SessionEnvironment         # refresh environment vars
C:\tools\msys64\usr\bin\bash -c "echo '[options]' >> /etc/pacman.conf"
C:\tools\msys64\usr\bin\bash -c "echo 'IgnorePkg = msys2-runtime' >> /etc/pacman.conf"
C:\tools\msys64\usr\bin\bash -c "echo 'IgnorePkg = pacman' >> /etc/pacman.conf"
ridk install 2 3
> sh -lc true
MSYS2 seems to be properly installed
Check msys2-keyring version:
 -> up-to-date
Remove catgets to avoid conflicts while update  ...
> pacman -Rdd catgets libcatgets --noconfirm
error: target not found: catgets
error: target not found: libcatgets
MSYS2 system update (optional) part 1  ...
> pacman -Syu --needed --noconfirm
:: Synchronizing package databases...
 clangarm64 downloading...
 mingw32 downloading...
 mingw64 downloading...
 ucrt64 downloading...
 clang32 downloading...
 clang64 downloading...
 msys downloading...
:: Starting core system upgrade...
warning: msys2-runtime: ignoring package upgrade (3.4.9-2 => 3.4.9-3)
resolving dependencies...
warning: terminate other MSYS2 programs before proceeding
looking for conflicting packages...

Packages (2) bash-5.2.021-1  mintty-1~3.7.0-1

Total Download Size:    3.19 MiB
Total Installed Size:  14.11 MiB
Net Upgrade Size:       0.06 MiB

:: Proceed with installation? [Y/n] 
:: Retrieving packages...
 bash-5.2.021-1-x86_64 downloading...
 mintty-1~3.7.0-1-x86_64 downloading...
checking keyring...
checking package integrity...
loading package files...
checking for file conflicts...
checking available disk space...
:: Processing package changes...
upgrading bash...
upgrading mintty...
:: To complete this update all MSYS2 processes including this terminal will be closed. Confirm to proceed [Y/n] 
MSYS2 system update (optional) succeeded
Kill all running msys2 binaries to avoid error "size of shared memory region changed"
MSYS2 system update (optional) part 2 ...
> pacman -Syu --needed --noconfirm
:: Synchronizing package databases...
 clangarm64 downloading...
 mingw32 downloading...
 mingw64 downloading...
 ucrt64 downloading...
 clang32 downloading...
 clang64 downloading...
 msys downloading...
:: Starting core system upgrade...
warning: msys2-runtime: ignoring package upgrade (3.4.9-2 => 3.4.9-3)
 there is nothing to do
:: Starting full system upgrade...
resolving dependencies...
looking for conflicting packages...

Packages (18) gawk-5.3.0-1  gettext-0.22.4-1  libasprintf-0.22.4-1  libgcrypt-1.10.3-1  libgettextpo-0.22.4-1  libgnutls-3.8.2-1  libgpgme-1.23.1-1  libintl-0.22.4-1  libksba-1.6.5-1  liblzma-5.4.5-1  libnghttp2-1.58.0-1  libp11-kit-0.25.3-1  libreadline-8.2.007-1  libsqlite-3.44.1-1  libxml2-2.12.1-1  libxslt-1.1.39-1  p11-kit-0.25.3-1  xz-5.4.5-1

Total Download Size:    8.46 MiB
Total Installed Size:  27.71 MiB
Net Upgrade Size:       0.19 MiB

:: Proceed with installation? [Y/n] 
:: Retrieving packages...
 gettext-0.22.4-1-x86_64 downloading...
 libgnutls-3.8.2-1-x86_64 downloading...
 gawk-5.3.0-1-x86_64 downloading...
 libsqlite-3.44.1-1-x86_64 downloading...
 libxml2-2.12.1-1-x86_64 downloading...
 xz-5.4.5-1-x86_64 downloading...
 libgcrypt-1.10.3-1-x86_64 downloading...
 p11-kit-0.25.3-1-x86_64 downloading...
 libgpgme-1.23.1-1-x86_64 downloading...
 libreadline-8.2.007-1-x86_64 downloading...
 libp11-kit-0.25.3-1-x86_64 downloading...
 libxslt-1.1.39-1-x86_64 downloading...
 libgettextpo-0.22.4-1-x86_64 downloading...
 libksba-1.6.5-1-x86_64 downloading...
 liblzma-5.4.5-1-x86_64 downloading...
 libnghttp2-1.58.0-1-x86_64 downloading...
 libintl-0.22.4-1-x86_64 downloading...
 libasprintf-0.22.4-1-x86_64 downloading...
checking keyring...
checking package integrity...
loading package files...
checking for file conflicts...
checking available disk space...
:: Processing package changes...
upgrading libintl...
upgrading libreadline...
upgrading gawk...
upgrading libgettextpo...
upgrading libasprintf...
upgrading gettext...
upgrading libgcrypt...
upgrading libp11-kit...
upgrading libgnutls...
upgrading libksba...
upgrading libsqlite...
upgrading libnghttp2...
upgrading p11-kit...
upgrading liblzma...
upgrading libxml2...
upgrading libxslt...
upgrading libgpgme...
upgrading xz...
:: Running post-transaction hooks...
(1/1) Updating the info directory file...
MSYS2 system update (optional) succeeded

chadlwilson commented 6 months ago

There seems to be some discussion of similar Pacman hangs at https://github.com/git-for-windows/git-for-windows-automation/pull/61 but no idea if it’s related to the issue here, or just sounds similar.

jeremyd2019 commented 3 months ago

I played around with this by copying the docker bits from the CI from this repository, and I saw that the process moved on from the core update, but then experienced a hang after everything was updated. I found that adding rm -r -fo 'C:\$Recycle.Bin\'; to the end of the powershell command in the Dockerfile (after the pacman -Scc) seemed to allow the docker build to progress. (I also added an echo Done; to the end to know that the powershell was completely over and I was probably waiting for Docker)

FROM mcr.microsoft.com/windows/servercore:ltsc2022

COPY ./msys2-x86_64-latest.sfx.exe /msys2.exe

RUN powershell -Command \
  $ErrorActionPreference = 'Stop'; \
  $ProgressPreference = 'SilentlyContinue'; \
  /msys2.exe -y -oC:\; \
  function msys() { C:\msys64\usr\bin\bash.exe @('-lc') + @Args; } \
  msys ' '; \
  msys 'pacman --noconfirm -Syuu'; \
  msys 'pacman --noconfirm -Syuu'; \
  msys 'pacman --noconfirm -Scc'; \
  rm -r -fo 'C:\$Recycle.Bin\'; \
  echo Done;

I'm guessing that docker is choking on the odd unicode characters in the 'binned' files? msys2/MSYS2-packages#4622

Another thing I tried was adding msys 'ps -e' and Get-Process to see if any background processes might be hanging around. There were not.

For reference, I had a find /c/\$Recycle.Bin in there while testing, and it showed the following:

/c/$Recycle.Bin
/c/$Recycle.Bin/S-1-5-93-2-1
/c/$Recycle.Bin/S-1-5-93-2-1/.������������0001000000004ff2dd50a72ab4668b33
/c/$Recycle.Bin/S-1-5-93-2-1/.������������000100000000505cd56ae6179e777be2
/c/$Recycle.Bin/S-1-5-93-2-1/desktop.ini

I would wager those are the old versions of msys-2.0.dll and pacman.exe.
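One way to test the "odd unicode" guess is to look for names that cannot be encoded as strict UTF-8, which is how unpaired surrogates show up in Python. This is a minimal sketch under that assumption; the function names are made up for illustration and nothing here is taken from Docker or MSYS2 code.

```python
import os


def is_valid_unicode(name):
    """True if the name encodes as strict UTF-8 (no unpaired surrogates)."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def suspicious_entries(root):
    """Walk root and collect paths whose final component is not valid Unicode."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_unicode(name):
                bad.append(os.path.join(dirpath, name))
    return bad


if __name__ == "__main__":
    # Path taken from the listing above; adjust for your own container.
    for path in suspicious_entries("C:\\$Recycle.Bin"):
        print(ascii(path))  # ascii() keeps the output printable
```

If the recycled files really do carry unpaired surrogates, they would be listed here while desktop.ini would not.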

chadlwilson commented 3 months ago

Wow, thanks @jeremyd2019 - I'll try your workaround.

Does one perhaps conclude that docker build on windows tries to empty the recycle bin itself (perhaps for each layer) prior to completing and gets stuck?

jeremyd2019 commented 3 months ago

I was guessing that whatever docker uses to save the filesystem was tripping up on what are probably invalid unicode sequences. But I don't know anything about what docker does.

chadlwilson commented 3 months ago

Yeah that makes more sense given the nature of filesystems and what must be needed for layer exports on windows.

jeremyd2019 commented 3 months ago

If you're interested, the construction of the filenames is explained here: https://github.com/msys2/msys2-runtime/blob/abcb3c6c0f330ac7568956b2be6bf3376517bb56/winsup/cygwin/syscalls.cc#L342-L346

chadlwilson commented 3 months ago

Your rm -r -fo 'C:\$Recycle.Bin\' workaround seemed to work just fine to resolve this problem within a docker build, allowing msys2-runtime and pacman to be updated during the container build. Thanks!

https://github.com/gocd-contrib/gocd-oss-cookbooks/compare/v3.16.4...v3.16.5

chadlwilson commented 3 months ago

Ok, managed to find what seems the root problem here, with similar workarounds discovered: https://github.com/microsoft/Windows-Containers/issues/213

Fix appears to require a containerd (and presumably Docker runtime?) built with Go 1.21+ https://github.com/containerd/containerd/pull/8957#issuecomment-1694549913 to resolve https://github.com/golang/go/issues/59971 .

I am currently using Windows Server 2022 images on GHA which at time of writing have Docker 24.0.7 on them. https://github.com/actions/runner-images/blob/releases/win22/20240514/images/windows/Windows2022-Readme.md This has containerd 1.7.6 in it; in any case, both are built with Go 1.20.

I believe the fix is only in containerd 1.7.14, which landed via https://github.com/containerd/containerd/pull/9860 and https://github.com/containerd/containerd/releases/tag/v1.7.14, and/or in Docker 25; I'm not certain which, as the vendoring vs static containerd inclusion confuses me.
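If you want to gate the workaround on the containerd version, a numeric comparison is needed, because comparing "1.7.6" and "1.7.14" as strings gives the wrong order. The sketch below assumes 1.7.14 is the first fixed release, which is only the belief stated in this comment and not a confirmed fact; the names are made up for illustration and only plain dotted versions are handled.

```python
# Assumption from this thread: the fix first shipped in containerd 1.7.14.
FIRST_FIXED_CONTAINERD = (1, 7, 14)


def parse_version(text):
    """Turn a plain dotted version such as '1.7.6' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def containerd_has_fix(version):
    """True if the given containerd version is at or past the assumed fix."""
    return parse_version(version) >= FIRST_FIXED_CONTAINERD


if __name__ == "__main__":
    for version in ("1.7.6", "1.7.14", "1.7.15"):
        print(version, containerd_has_fix(version))
```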

chadlwilson commented 1 month ago

> I am currently using Windows Server 2022 images on GHA which at time of writing have Docker 24.0.7 on them. https://github.com/actions/runner-images/blob/releases/win22/20240514/images/windows/Windows2022-Readme.md This has containerd 1.7.6 in it; but in any case, both Go 1.20 built.

FWIW I re-tested this today without the "recycle bin removal workaround" on newer runner images that contain Docker Engine 26.1.3 (containerd 1.7.15 static binaries) and the issue is fixed. This was with runner image 20240721.1.

@jeremyd2019 IMHO this can probably be closed now, in that the root cause was containerd-on-Windows problems, which have been fixed upstream. If the "junk left in recycle bin" is a problem on its own, I imagine that can/should be addressed separately within msys2 somewhere?

jeremyd2019 commented 1 month ago

That's good. I don't seem to have the ability to close this issue, though.