jline / jline3

JLine is a Java library for handling console input.
Other
1.49k stars 218 forks source link

Investigate why windows tests last 15 minutes #922

Closed gnodet closed 5 months ago

puskarpeter commented 7 months ago

I investigated this a bit and the issue seems to be the shell (gitbash.exe). Since this is an emulated shell, the IO performance is what causes this slowdown. From the profiler, it seems the readBytes calls take an eternity.

The terminal does not seem to play the role since the same behavior was observed when running Gitbash from mintty (default) and also launching gitbash from Powershell and then running the build (that uses Windows Console Host I believe)

There are 3 shells installed in the image: https://github.com/actions/runner-images/blob/main/images/windows/Windows2022-Readme.md#shells

I tried all of them. WSL one wrapper script does not even run. MSys builds fine in the expected time. Gitbash takes 20 minutes.

Inside Powershell and Powershell 7 it also builds fine. Same with IDEA on Windows. Also same inside cmd.exe with the mvnw.bat.

Hence I assume what's going on is that even in the run logs it says that

  ./mvnw --no-transfer-progress verify
  shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
  env:
    JAVA_HOME: C:\hostedtoolcache\windows\Java_Temurin-Hotspot_jdk\[2](https://github.com/jline/jline3/actions/runs/8724414918/job/23934945006#step:4:2)2.0.0-36\x64
    JAVA_HOME_22_X64: C:\hostedtoolcache\windows\Java_Temurin-Hotspot_jdk\22.0.0-[3](https://github.com/jline/jline3/actions/runs/8724414918/job/23934945006#step:4:3)6\x64

in the background, it gets delegated to gitbash.exe since the shebang /bin/sh matches that one.

One more thing I noticed is that the wrapper plugin is updated but the script was not regenerated. I tried wrapper script versions 3.2.0 and the new 3.3.0 and the wrapper invocation fails inside Gitbash with the error that the URL does not match.

gnodet commented 7 months ago

I investigated this a bit and the issue seems to be the shell (gitbash.exe). Since this is an emulated shell, the IO performance is what causes this slowdown. From the profiler, it seems the readBytes calls take an eternity.

The terminal does seem to play the role since the same behavior was observed when running Gitbash from mintty (default) and also launching gitbash from Powershell and then running the build (that uses Windows Console Host I believe)

There are 3 shells installed in the image: https://github.com/actions/runner-images/blob/main/images/windows/Windows2022-Readme.md#shells

I tried all of them. WSL one wrapper script does not even run. MSys builds fine in the expected time. Gitbash takes 20 minutes.

Inside Powershell and Powershell 7 it also builds fine. Same with IDEA on Windows. Also same inside cmd.exe with the mvnw.bat.

Hence I assume what's going on is that even in the run logs it says that

  ./mvnw --no-transfer-progress verify
  shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
  env:
    JAVA_HOME: C:\hostedtoolcache\windows\Java_Temurin-Hotspot_jdk\[2](https://github.com/jline/jline3/actions/runs/8724414918/job/23934945006#step:4:2)2.0.0-36\x64
    JAVA_HOME_22_X64: C:\hostedtoolcache\windows\Java_Temurin-Hotspot_jdk\22.0.0-[3](https://github.com/jline/jline3/actions/runs/8724414918/job/23934945006#step:4:3)6\x64

in the background, it gets delegated to gitbash.exe since the shebang /bin/sh matches that one.

One more thing I noticed is that the wrapper plugin is updated but the script was not regenerated. I tried wrapper script versions 3.2.0 and the new 3.3.0 and the wrapper invocation fails inside Gitbash with the error that the URL does not match.

Awesome, thx for the investigation. The wrapper stuff will be updated with https://github.com/jline/jline3/pull/972

So you're saying the problem is caused by the GH job using the Java_Temurin-Hotspot_jdk JDK basically ? I suppose we could try using a different one, or maybe reporting the issue. The slowness could be caused by a bug in the JDK. For example, jline relies on some input stream calls to return with 0 read chars and not wait for long time until some data comes in, but no idea if it's related or not.

puskarpeter commented 7 months ago

I was using Temurin 22.0.1, I can try some other JDKs too to rule this one out. Will give you an update.

puskarpeter commented 7 months ago

I tried Azul Zulu, Amazon Corretto, Oracle, OpenJDK, they all have the same problem inside Gitbash. In Powershell 7 they all work fine. If it is a bug in JDK, then it is present upstream in all the versions.

puskarpeter commented 7 months ago

It does not matter whether mvnw or mvnw.cmd is used inside Gitbash and Powershell 7. Both work fine inside Powershell 7 and both build slowly inside Gitbash. So it seems that the problem is the underlying shell that invokes the commands on the windows-latest runner.

gnodet commented 7 months ago

@puskarpeter Do you think you could come up with a PR to a shell different than Gitbash ?

puskarpeter commented 7 months ago

Let's see if this is enough: https://github.com/jline/jline3/pull/976

puskarpeter commented 7 months ago

image You were right about those calls, both of those long-running methods are waiting: image image Seems like the native call to readBytes is slow since that is also done by Surefire: image There is waiting, then something happens around 15 second mark then again waiting for 2 minutes: image Not sure if attaching debugger corrupts the streams somehow, but I guess it boils down to waiting for those native readBytes calls to finish. Hopefully this helps somehow.

gnodet commented 5 months ago

Fixed by #976