graalvm / setup-graalvm

GitHub Action for setting up GraalVM distributions.
https://www.graalvm.org
Universal Permissive License v1.0

Is it possible to increase available memory? #6

Closed mraible closed 1 year ago

mraible commented 2 years ago

I ask because I'm getting the following error:

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

You can see my actions definition here.

fniephaus commented 2 years ago

Hi @mraible, thanks for reaching out. According to https://github.com/actions/runner/issues/1051#issuecomment-1018587013, memory seems more restrictive on public Windows runners for some reason.

I don't think there's much we can do about that as GitHub provides the service. However, we are working on making GraalVM Native Image more memory-efficient.

Anyway, I can think of three options for you:

  1. Try Linux or macOS instead of Windows.
  2. Reduce the amount of work happening in one build. The build you linked ran for 1.5 hours. You could split the build into different stages, e.g., one that builds the jars of your app and one that builds a native executable with GraalVM Native Image.
  3. Set up your own Windows-based GitHub runner on a machine with additional memory resources.

Please feel free to share your experience and let us know if you found another way to make things work.

mraible commented 2 years ago

@fniephaus The job I linked to is on Linux. It works fine on Mac and the job completes in 28m. Linux gets this error after 1h and 29m. With Windows, I get a different error:

The command line is too long.

fniephaus commented 2 years ago

4839.7s (94.8% of total time) in 839 GCs | Peak RSS: 6.13GB | CPU load: 1.97

That's a lot of time spent in GC (~81min) and explains why the build ran for so long. Nonetheless, splitting the build might help relieve some of the memory pressure.
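The arithmetic behind that estimate can be sketched as follows. This is only an illustration: the two input numbers come from the build output quoted above, while the class and method names are made up for this example. It works out to about 81 of roughly 85 minutes spent in GC.

```java
public class GcShare {
    // Total build time implied by the GC time and GC's percentage share of the total.
    static double totalSeconds(double gcSeconds, double gcPercent) {
        return gcSeconds / (gcPercent / 100.0);
    }

    public static void main(String[] args) {
        double gcSeconds = 4839.7; // "4839.7s ... in 839 GCs" from the build output
        double gcPercent = 94.8;   // "94.8% of total time"
        double total = totalSeconds(gcSeconds, gcPercent);
        // Print GC time, implied total time, and the remainder, all in minutes.
        System.out.printf("GC: %.1f min, total: %.1f min, everything else: %.1f min%n",
                gcSeconds / 60, total / 60, (total - gcSeconds) / 60);
    }
}
```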

fniephaus commented 2 years ago

@fniephaus The job I linked to is on Linux. It works fine on Mac and the job completes in 28m. Linux gets this error after 1h and 29m. With Windows, I get a different error:

The command line is too long.

Apologies for mixing that up. Let me look into this next week.

fniephaus commented 2 years ago

So, I looked into the two build failures:

  1. Regarding the Windows issue: it seems you're using an older version of Spring Boot. Could you try bumping that? https://github.com/graalvm/native-build-tools/pull/126 seems to fix the same problem in our native build tools, so if bumping doesn't work, I think it'd make sense to file an issue against Spring Boot.
  2. Regarding the OOM issue: it seems that for Linux and Windows, free GitHub runners only provide 7GB of RAM while macOS runners provide twice as much. That explains why there's a lot more memory pressure in your Linux build. I see you already tried increasing the swap size. Did that help? (Some Linux builds were canceled because of the Windows problem.) How much RAM does the build process consume when you build your project locally? Peak RSS is printed as part of the Native Image build output.

fniephaus commented 2 years ago

Also, GitLab recommends adjusting vm.swappiness (via sudo sysctl vm.swappiness=10) in memory-constrained environments. Maybe that helps as well?

fniephaus commented 2 years ago

Adjusting vm.swappiness does not seem to help in your case: https://github.com/fniephaus/auth0-full-stack-java-example/runs/5275349582?check_suite_focus=true#step:5:10.

On macOS, peak RSS is around 9GB, which is not too far away from 7GB but apparently problematic on Linux and Windows. We also have to keep in mind that the Native Image builder is invoked from Maven, which also requires some memory.

I'm afraid I'm running out of ideas at the moment but will keep my eyes open.

mraible commented 2 years ago

Regarding the Windows issue: it seems you're using an older version of Spring Boot. Could you try bumping that?

The GitHub action is pointing to the spring-native branch:

https://github.com/oktadev/auth0-full-stack-java-example/blob/main/.github/workflows/publish.yml#L10

That branch is using Spring Boot 2.6.3.

mraible commented 2 years ago

I was able to fix the OOM error on Linux by specifying -J-Xmx7g in the build arguments:

<plugin>
    <groupId>org.graalvm.buildtools</groupId>
    <artifactId>native-maven-plugin</artifactId>
    <version>${native-buildtools.version}</version>
    <extensions>true</extensions>
    <executions>
        <execution>
            <id>test-native</id>
            <phase>test</phase>
            <goals>
                <goal>test</goal>
            </goals>
        </execution>
        <execution>
            <id>build-native</id>
            <phase>package</phase>
            <goals>
                <goal>build</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <imageName>native-executable</imageName>
        <buildArgs>
            <buildArg>--no-fallback --verbose -J-Xmx7g</buildArg>
        </buildArgs>
    </configuration>
</plugin>

I also found that changing it from 7g to 10g drops the build time by about 5 minutes.

I posted my problem with "command is too long" on Windows to Stack Overflow.

fniephaus commented 2 years ago

I was able to fix the OOM error on Linux by specifying -J-Xmx7g in the build arguments

Great, thanks for the update! Now that I think about it, it makes sense: the JVM allocates only a percentage of available RAM by default, so it probably never actually used the additional swap space that we set up.
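A quick way to see the effective limit is to ask a JVM for its maximum heap, once with default settings and once with an explicit -Xmx. The sketch below is an assumption-laden illustration, not part of the build: the class name is made up, and the default fraction depends on the JVM and on how it is launched (the Native Image builder may pick its own value).

```java
public class HeapLimit {
    // Convert a byte count to whole mebibytes for readable output.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap this JVM will attempt to use.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxHeap) + " MiB");
    }
}
```

Running it as `java HeapLimit` and then as `java -Xmx7g HeapLimit` on the same machine shows how much the explicit flag raises the limit. Swap space does not change the default, which would fit the observation that the extra swap alone made no difference.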

902.9s (41.4% of total time) in 150 GCs

That's still a lot of time spent in GCs. Maybe some additional tuning will help but glad things work for now.

I posted my problem with "command is too long" on Windows to Stack Overflow.

Could you bump the native build tools to 0.9.10 and try again (see https://github.com/graalvm/native-build-tools/issues/214)?

mraible commented 2 years ago

Could you bump the native build tools to 0.9.10 and try again (see https://github.com/graalvm/native-build-tools/issues/214)?

I tried this here. The Windows build hasn't failed yet, but it has taken over an hour (so far).

Update: it almost worked.

[2/7] Performing analysis...  [*********]                                                             (2562.6s @ 5.97GB)
Warning: Could not register complete reflection metadata for org.springframework.boot.actuate.health.ReactiveHealthEndpointWebExtension. Reason(s): java.lang.NoClassDefFoundError: reactor/core/publisher/Mono
  33,826 (93.95%) of 36,005 classes reachable
  56,486 (79.62%) of 70,946 fields reachable
 170,797 (65.78%) of 259,658 methods reachable
   2,318 classes,   803 fields, and 11,795 methods registered for reflection
      82 classes,    78 fields, and    67 methods registered for JNI access
[3/7] Building universe...                                                                             (299.9s @ 6.08GB)
Error: Image build request failed with exit status 1

fniephaus commented 2 years ago

Update: it almost worked.

Good! I assume this ran with -J-Xmx7g? It's weird that we don't see an error, so maybe this is another OOM crash? Maybe try again with -J-Xmx8g?

mraible commented 2 years ago

I assume this ran with -J-Xmx7g

I'm currently using -J-Xmx10g.

I was able to fix the Windows build by setting the minimum pagefile size to 10GB! 🎉

You can see the successful run for details.

Time spent building native images:

fniephaus commented 2 years ago

Interestingly, the successful Windows run used -Xmx6012577376 and not -Xmx10g, which you seem to have dropped from your PR. So now I wonder how stable those builds are going to run.
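For comparison, -Xmx without a unit suffix is a plain byte count, so the value from the log can be converted to see how it relates to 7g and 10g. A minimal sketch (class and method names are made up for illustration):

```java
public class XmxBytes {
    static final long GIB = 1024L * 1024L * 1024L;

    // Convert a byte count, as passed to -Xmx without a suffix, to GiB.
    static double bytesToGiB(long bytes) {
        return (double) bytes / GIB;
    }

    // Convert a "<n>g" style value to bytes.
    static long gibToBytes(long gib) {
        return gib * GIB;
    }

    public static void main(String[] args) {
        long fromLog = 6012577376L; // value from the successful Windows run
        System.out.printf("-Xmx%d is about %.2f GiB%n", fromLog, bytesToGiB(fromLog));
        System.out.println("-Xmx10g is " + gibToBytes(10) + " bytes");
    }
}
```

The logged value comes out at roughly 5.6 GiB, so that run had a noticeably smaller heap limit than either 7g or 10g.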

I'm experimenting with using the SerialGC on GitHub Actions, which seems to work a bit better than the ParallelGC. However, doing that is currently a bit awkward because UseParallelGC has to be disabled first: -J-XX:-UseParallelGC -J-XX:+UseSerialGC.

mraible commented 2 years ago

@fniephaus I accidentally removed the setting. I restored it in https://github.com/oktadev/auth0-full-stack-java-example/pull/5/commits/b779335ed16b868ac7e301d40934730c76b400eb.

The Windows build worked a couple of days ago. Now it's failing with:

[INFO] npm ERR! code 1
[INFO] npm ERR! path D:\a\auth0-full-stack-java-example\auth0-full-stack-java-example\node_modules\puppeteer
[INFO] npm ERR! command failed
[INFO] npm ERR! command C:\Windows\system32\cmd.exe /d /s /c node install.js
[INFO] npm ERR! ERROR: Failed to set up Chromium r869685! Set "PUPPETEER_SKIP_DOWNLOAD" env variable to skip download.
[INFO] npm ERR! [Error: ENOSPC: no space left on device, write] {
[INFO] npm ERR!   errno: -4055,
[INFO] npm ERR!   code: 'ENOSPC',
[INFO] npm ERR!   syscall: 'write'
[INFO] npm ERR! }

This issue seems to indicate it's using more than 14GB of disk space.

I'll try setting PUPPETEER_SKIP_DOWNLOAD, but I'm not sure this will help.

mraible commented 2 years ago

@fniephaus Changing the build to use windows-2019 instead of windows-latest solves the problem.

fniephaus commented 2 years ago

For anyone reading this, I highly recommend upgrading to GraalVM 22.2+. We have made Native Image significantly more robust in memory-constrained environments, which means you should now be able to build large Java applications with Native Image on GitHub Actions without any problems.

linghengqian commented 1 year ago

@fniephaus Hi, do I need to open a new issue? In https://github.com/oracle/graalvm-reachability-metadata/pull/122#issuecomment-1338515655 I found that the memory used by setup-graalvm made the GitHub Actions runner crash.

fniephaus commented 1 year ago

No need @linghengqian, I've reopened this issue. How do you know that the build in question failed due to insufficient memory?

linghengqian commented 1 year ago

No need @linghengqian, I've reopened this issue. How do you know that the build in question failed due to insufficient memory?

fniephaus commented 1 year ago

The build job you mention in https://github.com/oracle/graalvm-reachability-metadata/pull/122#issuecomment-1338515655 does not show any signs of memory issues, so I'm going to close this again. Maybe there's something wrong with the metadata you're contributing.

fniephaus commented 8 months ago

GitHub-hosted runners: Double the power for open source 🎉