Closed mraible closed 1 year ago
Hi @mraible, thanks for reaching out. According to https://github.com/actions/runner/issues/1051#issuecomment-1018587013, memory seems more restrictive on public Windows runners for some reason.
I don't think there's much we can do about that as GitHub provides the service. However, we are working on making GraalVM Native Image more memory-efficient.
Anyway, I can think of three options for you:
Please feel free to share your experience and let us know if you found another way to make things work.
@fniephaus The job I linked to is on Linux. It works fine on Mac and the job completes in 28m. Linux gets this error after 1h and 29m. With Windows, I get a different error:
The command line is too long.
4839.7s (94.8% of total time) in 839 GCs | Peak RSS: 6.13GB | CPU load: 1.97
That's a lot of time spent in GC (~81min), which explains why the build ran for so long. Nonetheless, maybe splitting the builds could help to relieve some memory pressure.
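The ~81min figure can be checked from the summary line itself. The following is a minimal sketch, assuming the line keeps the format quoted above; the class and method names are made up for illustration and are not part of GraalVM or its tooling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcSummary {
    // Matches e.g. "4839.7s (94.8% of total time) in 839 GCs".
    // The pattern is an assumption based on the single line quoted in this thread.
    private static final Pattern GC_LINE =
        Pattern.compile("([0-9.]+)s \\(([0-9.]+)% of total time\\) in ([0-9,]+) GCs");

    /** Returns {gcSeconds, percentOfTotalTime, gcCount} parsed from the summary line. */
    static double[] parse(String line) {
        Matcher m = GC_LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no GC summary found in: " + line);
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3).replace(",", ""))
        };
    }

    /** Time spent in GC, in minutes. */
    static double gcMinutes(double[] s) {
        return s[0] / 60.0;
    }

    /** Total time implied by the GC time and its reported share, in minutes. */
    static double totalMinutes(double[] s) {
        return s[0] / (s[1] / 100.0) / 60.0;
    }

    /** Average pause per collection, in seconds. */
    static double secondsPerGc(double[] s) {
        return s[0] / s[2];
    }

    public static void main(String[] args) {
        double[] s = parse(
            "4839.7s (94.8% of total time) in 839 GCs | Peak RSS: 6.13GB | CPU load: 1.97");
        System.out.println("GC minutes:     " + gcMinutes(s));
        System.out.println("Total minutes:  " + totalMinutes(s));
        System.out.println("Seconds per GC: " + secondsPerGc(s));
    }
}
```

For the line above this works out to roughly 80.7 minutes of GC out of about 85 minutes in total, or close to 5.8 seconds per collection on average.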
Apologies for mixing that up. Let me look into this next week.
So, I looked into the two build failures:
Also, GitLab recommends adjusting vm.swappiness (via sudo sysctl vm.swappiness=10) in memory-constrained environments. Maybe that helps as well?
Adjusting vm.swappiness does not seem to help in your case: https://github.com/fniephaus/auth0-full-stack-java-example/runs/5275349582?check_suite_focus=true#step:5:10.
On macOS, peak RSS is around 9GB, which is not too far away from 7GB but apparently problematic on Linux and Windows. We also have to keep in mind that the Native Image builder is invoked from Maven, which also requires some memory.
I'm afraid I'm running out of ideas at the moment but will keep my eyes open.
Regarding the Windows issue: it seems you're using an older version of Spring Boot. Could you try bumping that?
The GitHub action is pointing to the spring-native branch:
https://github.com/oktadev/auth0-full-stack-java-example/blob/main/.github/workflows/publish.yml#L10
That branch is using Spring Boot 2.6.3.
I was able to fix the OOM error on Linux by specifying -J-Xmx7g in the build arguments:
<plugin>
    <groupId>org.graalvm.buildtools</groupId>
    <artifactId>native-maven-plugin</artifactId>
    <version>${native-buildtools.version}</version>
    <extensions>true</extensions>
    <executions>
        <execution>
            <id>test-native</id>
            <phase>test</phase>
            <goals>
                <goal>test</goal>
            </goals>
        </execution>
        <execution>
            <id>build-native</id>
            <phase>package</phase>
            <goals>
                <goal>build</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <imageName>native-executable</imageName>
        <buildArgs>
            <buildArg>--no-fallback --verbose -J-Xmx7g</buildArg>
        </buildArgs>
    </configuration>
</plugin>
I also found that changing it from 7g to 10g drops the build time by about 5 minutes.
I posted my problem with "command is too long" on Windows to Stack Overflow.
Great, thanks for the update! Now that I think about it, it makes sense: the JVM allocates only a percentage of available RAM by default, so it probably never actually used the additional swap space that we set up.
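To see the default described here on a given runner, one can compare the JVM's default maximum heap with the machine's physical memory. This is a minimal sketch for a plain HotSpot JVM started without -Xmx; the class name is made up for illustration, the cast to com.sun.management.OperatingSystemMXBean is HotSpot-specific, and the Native Image builder may apply its own defaults on top of this.

```java
import java.lang.management.ManagementFactory;

public class HeapDefaults {
    /** Maximum heap the current JVM is willing to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Physical RAM reported by the OS, in bytes (swap is not included). */
    @SuppressWarnings("deprecation") // deprecated since JDK 14, but still available
    static long physicalBytes() {
        // Assumption: running on a HotSpot-based JDK that exposes this extended bean.
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        return os.getTotalPhysicalMemorySize();
    }

    public static void main(String[] args) {
        long heap = maxHeapBytes();
        long ram = physicalBytes();
        System.out.println("Max heap (bytes):     " + heap);
        System.out.println("Physical RAM (bytes): " + ram);
        System.out.println("Heap / RAM:           " + (double) heap / ram);
    }
}
```

Running it once without flags and once with an explicit -Xmx value should show the ratio change. Because the default is derived from physical memory, extra swap space alone would not be expected to raise it.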
That's still a lot of time spent in GCs. Maybe some additional tuning will help but glad things work for now.
Could you bump the native build tools to 0.9.10 and try again (see https://github.com/graalvm/native-build-tools/issues/214)?
I tried this here. The Windows build hasn't failed yet, but it has taken over an hour (so far).
Update: it almost worked.
[2/7] Performing analysis... [*********] (2562.6s @ 5.97GB)
Warning: Could not register complete reflection metadata for org.springframework.boot.actuate.health.ReactiveHealthEndpointWebExtension. Reason(s): java.lang.NoClassDefFoundError: reactor/core/publisher/Mono
33,826 (93.95%) of 36,005 classes reachable
56,486 (79.62%) of 70,946 fields reachable
170,797 (65.78%) of 259,658 methods reachable
2,318 classes, 803 fields, and 11,795 methods registered for reflection
82 classes, 78 fields, and 67 methods registered for JNI access
[3/7] Building universe... (299.9s @ 6.08GB)
Error: Image build request failed with exit status 1
Good! I assume this ran with -J-Xmx7g? It's weird that we don't see an error, so maybe this is another OOM crash? Maybe try again with -J-Xmx8g?
I'm currently using -J-Xmx10g.
I was able to fix the Windows build by setting the minimum pagefile size to 10GB! 🎉
You can see the successful run for details.
Time spent building native images:
Interestingly, the successful Windows run used -Xmx6012577376 and not -Xmx10g, which you seem to have dropped from your PR. So now I wonder how stable those builds are going to run.
I'm experimenting with using the SerialGC on GitHub Actions, which seems to work a bit better than the ParallelGC. However, doing that is currently a bit awkward: -J-XX:-UseParallelGC -J-XX:+UseSerialGC (need to disable UseParallelGC first).
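One way to check which collector a given set of flags actually selects is to ask the JVM through the standard management API. This is a minimal sketch with a made-up class name; it reports on the JVM that runs it, not on the Native Image builder, so it is only a quick sanity check of the flag combination.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    /** Names of the garbage collectors active in the current JVM. */
    static List<String> names() {
        List<String> out = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.add(gc.getName());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(names());
    }
}
```

For example, it can be run as java -XX:-UseParallelGC -XX:+UseSerialGC ActiveCollectors.java (the same flags as above, without the -J prefix that native-image uses to forward options to its JVM). On HotSpot, the serial collector typically shows up as Copy and MarkSweepCompact, and the parallel collector as PS Scavenge and PS MarkSweep.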
@fniephaus I accidentally removed the setting. I restored it in https://github.com/oktadev/auth0-full-stack-java-example/pull/5/commits/b779335ed16b868ac7e301d40934730c76b400eb.
The Windows build worked a couple of days ago. Now it's failing with:
[INFO] npm ERR! code 1
[INFO] npm ERR! path D:\a\auth0-full-stack-java-example\auth0-full-stack-java-example\node_modules\puppeteer
[INFO] npm ERR! command failed
[INFO] npm ERR! command C:\Windows\system32\cmd.exe /d /s /c node install.js
[INFO] npm ERR! ERROR: Failed to set up Chromium r869685! Set "PUPPETEER_SKIP_DOWNLOAD" env variable to skip download.
[INFO] npm ERR! [Error: ENOSPC: no space left on device, write] {
[INFO] npm ERR! errno: -4055,
[INFO] npm ERR! code: 'ENOSPC',
[INFO] npm ERR! syscall: 'write'
[INFO] npm ERR! }
This issue seems to indicate it's using more than 14GB of disk space.
I'll try setting PUPPETEER_SKIP_DOWNLOAD, but I'm not sure this will help.
@fniephaus Changing the build to use windows-2019 instead of windows-latest solves the problem.
For anyone reading this, I highly recommend upgrading to GraalVM 22.2+. We have made Native Image significantly more robust in memory-constrained environments, which means you should now be able to build large Java applications with Native Image on GitHub Actions without any problems.
@fniephaus Hi, do I need to open a new issue? I found in https://github.com/oracle/graalvm-reachability-metadata/pull/122#issuecomment-1338515655 that the memory occupied by setup-graalvm caused the GitHub Actions runner to crash.
No need @linghengqian, I've reopened this issue. How do you know that the build in question failed due to insufficient memory?
Because this problem is similar to one I encountered locally before (GraalVM CE 22.3.0, JDK 11 and JDK 17). I compile GraalVM Native Image projects in WSL under Windows, and I give WSL only 8GB of memory by default. When I run several tasks at the same time (such as multiple GUI applications through WSL) and execute the nativeTest task of GraalVM Native Build Tools through Gradle, memory usage exceeds 8GB and the entire WSL instance becomes unresponsive. I then have to run wsl --shutdown in PowerShell to restart WSL before I can use it again.
Even so, I'm not sure how to collect the logs from GitHub Actions.
The build job you mention in https://github.com/oracle/graalvm-reachability-metadata/pull/122#issuecomment-1338515655 does not show any signs of memory issues, so I'm going to close this again. Maybe there's something wrong with the metadata you're contributing.
I ask because I'm getting the following error:
You can see my actions definition here.