quarkus-qe / quarkus-test-suite

16 stars 35 forks source link

HttpAdvancedIT fails with Keyclaok container insufficient memory error in Podman Jenkins jobs #2106

Open jcarranzan opened 2 days ago

jcarranzan commented 2 days ago

It's just failing in Podman Jenkins jobs and HttpAdvancedIT, but we use the same Keycloak service container for other tests in http-module.

Some points about my previous investigation:

michalvavrik commented 2 days ago
jcarranzan commented 1 day ago

Summary of Investigations:

  1. Attempt with 1 MiB Memory Limit: @KeycloakContainer( command = { "start-dev", "--import-realm", "--hostname-strict=false" }, memoryLimitMiB = 1 ) Outcome: The Keycloak container failed to start because the minimum required memory is 6 MiB. I then increased the memory limit to 10 MiB: @KeycloakContainer( command = { "start-dev", "--import-realm", "--hostname-strict=false" }, memoryLimitMiB = 10 ) but the container crashed with the following error:
    
    2024-10-21 12:54:04,523 ERROR [tc.qua.io/.0] (main) Could not start container: java.lang.IllegalStateException: Wait strategy failed. Container crashed with out-of-memory (OOMKilled)
    2024-10-21 12:54:04,527 ERROR [tc.qua.io/.0] (main) Log output from the failed container:
    /opt/keycloak/bin/kc.sh: line 169:    84 Killed                  'java' -Dkc.config.build-and-exit=true -Djava.util.concurrent.ForkJoinPool.common.threadFactory=io.quarkus.bootstrap.forkjoin.QuarkusForkJoinWorkerThreadFactory '-XX:MetaspaceSize=96M' '-XX:MaxMetaspaceSize=256m' '-Dfile.encoding=UTF-8' '-Dsun.stdout.encoding=UTF-8' '-Dsun.err.encoding=UTF-8' '-Dstdout.encoding=UTF-8' '-Dstderr.encoding=UTF-8' '-XX:+ExitOnOutOfMemoryError' '-Djava.security.egd=file:/dev/urandom' '-XX:+UseG1GC' '-XX:FlightRecorderOptions=stackdepth=512' '-XX:MaxRAMPercentage=70' '-XX:MinRAMPercentage=70' '-XX:InitialRAMPercentage=50' '--add-opens=java.base/java.util=ALL-UNNAMED' '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '--add-opens=java.base/java.security=ALL-UNNAMED' '-Duser.language=en' '-Duser.country=US' -Dkc.home.dir='/opt/keycloak/bin/..' -Djboss.server.config.dir='/opt/keycloak/bin/../conf' -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dpicocli.disable.closures=true -Dquarkus-log-max-startup-records=10000 -cp '/opt/keycloak/bin/../lib/quarkus-run.jar' io.quarkus.bootstrap.runner.QuarkusEntryPoint --profile=dev start-dev '--import-realm' '--hostname-strict=false'
2. Attempt with Increased Memory Limit (2000 MiB):
And I also modified the test to print the memory logs before container is starting

@KeycloakContainer( command = { "start-dev", "--import-realm", "--hostname-strict=false" }, memoryLimitMiB = 2000 )

Executed in our Podman-based Jenkins job.
Log Jenkins job output: 

13:16:02 [INFO] Running io.quarkus.ts.http.advanced.HttpAdvancedIT 13:16:02 === JVM Memory Info === 13:16:02 Free Memory: 453 MB 13:16:02 Total Memory: 1788 MB 13:16:02 Max Memory: 1924 MB 13:16:02 === System Memory Info === 13:16:02 Total Physical Memory: 7695 MB 13:16:02 Free Physical Memory: 1263 MB 13:16:02 === Disk Space Info === 13:16:02 Total Disk Space: 40847 MB 13:16:02 Free Disk Space: 32480 MB 13:16:12 11:16:12,469 Container quay.io/keycloak/keycloak:25.0 is starting: 410b311947561a85de756b388a52cf266b40f549eb1976b4155f44f034d4a015 13:21:12 11:21:12,785 Could not start container: java.lang.IllegalStateException: Wait strategy failed. Container exited with code 1 13:21:12 at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:533) 13:21:12 at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:354) 3:21:12 11:21:12,813 Log output from the failed container: 13:21:12 OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006af400000, 4034920448, 0) failed; error='Not enough space' (errno=12) 13:21:12 # 13:21:12 # There is insufficient memory for the Java Runtime Environment to continue. 13:21:12 # Native memory allocation (mmap) failed to map 4034920448 bytes. Error detail: committing reserved memory. 13:21:12 # An error report file with more information is saved as: 13:21:12 # /tmp/hs_err_pid84.log

rsvoboda commented 1 day ago

What about " memoryLimitMiB = 768?"

What === JVM Memory Info === is printed out when .withProperty("JAVA_OPTS", "-Xms512m -Xmx1g") approach is used ?

jcarranzan commented 1 day ago

Output in Jenkins:

09:36:46 [INFO] Running io.quarkus.ts.http.advanced.HttpAdvancedIT
09:36:47 === JVM Memory Info ===
**09:36:47 Free Memory: 433 MB**
09:36:47 Total Memory: 1537 MB
09:36:47 Max Memory: 1924 MB
09:36:47 === System Memory Info ===
09:36:47 Total Physical Memory: 7695 MB
09:36:47 Free Physical Memory: 1254 MB
09:36:47 === Disk Space Info ===
09:36:47 Total Disk Space: 40847 MB
09:36:47 Free Disk Space: 32480 MB

Output in Jenkins:

0:01:47 === JVM Memory Info ===
**10:01:47 Free Memory: 553 MB**
10:01:47 Total Memory: 1664 MB
10:01:47 Max Memory: 1924 MB
10:01:47 === System Memory Info ===
10:01:47 Total Physical Memory: 7695 MB
10:01:47 Free Physical Memory: 1402 MB
10:01:47 === Disk Space Info ===
10:01:47 Total Disk Space: 40847 MB
10:01:47 Free Disk Space: 32480 MB

and test passed OK

rsvoboda commented 17 hours ago

and test passed OK

only the last one or both cases?

jcarranzan commented 4 hours ago

and test passed OK

only the last one or both cases?

Only the last one, with withProperty("JAVA_OPTS", "-Xms512m -Xmx1g")

rsvoboda commented 1 hour ago

Both are using similar memory, so what's the difference?

rsvoboda commented 1 hour ago

@mabartos do you have any idea why the withProperty("JAVA_OPTS", "-Xms512m -Xmx1g") approach would work better than your https://github.com/quarkus-qe/quarkus-test-framework/pull/1351 change?

michalvavrik commented 1 hour ago

That's all interesting, but as far as actual failure goes, this sentence from the issue description is what matters most:

The issue occurs before tests start. The failure is nor related to the HttpAdvancedIT tests. When Keycloak container tries to allocate memory ,ther's already insufficient memory due to a resource leak.

Here is what you need to do: implement org.junit.platform.launcher.TestExecutionListener (here is example how I did it in TFW ), and print out memory information in executionFinished, executionStarted, testPlanExecutionFinished, testPlanExecutionStarted. Ideally save it so that you can scroll through results and identify exact place where memory jumped or wasn't cleaned.