kroxylicious / kroxylicious-junit5-extension

JUnit5 extension and helpers for writing tests parameterised over Kafka clusters
Apache License 2.0
9 stars 10 forks source link

JVM intermittently crashes partway through tests #162

Closed gracegrimwood closed 1 year ago

gracegrimwood commented 1 year ago

Describe the bug When running the kroxylicious-junit-extension tests, or when using the extension in other tests (specifically the Kroxylicious integration tests, see https://github.com/kroxylicious/kroxylicious/issues/508) the JVM intermittently crashes with something resembling the following error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.1.2:test (default-test) on project testing-impl: 
[ERROR] 
[ERROR] Please refer to /Users/ggrimwoo/workspace/kroxylicious-junit5-extension/impl/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd '/Users/ggrimwoo/workspace/kroxylicious-junit5-extension/impl' && '/Users/ggrimwoo/.sdkman/candidates/java/20-tem/bin/java' '-jar' '/Users/ggrimwoo/workspace/kroxylicious-junit5-extension/impl/target/surefire/surefirebooter-20230803131446310_3.jar' '/Users/ggrimwoo/workspace/kroxylicious-junit5-extension/impl/target/surefire' '2023-08-03T13-14-43_682-jvmRun1' 'surefire-20230803131446310_1tmp' 'surefire_0-20230803131446310_2tmp'
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] Crashed tests:
[ERROR] io.kroxylicious.testing.kafka.KafkaClusterTest
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd '/Users/ggrimwoo/workspace/kroxylicious-junit5-extension/impl' && '/Users/ggrimwoo/.sdkman/candidates/java/20-tem/bin/java' '-jar' '/Users/ggrimwoo/workspace/kroxylicious-junit5-extension/impl/target/surefire/surefirebooter-20230803131446310_3.jar' '/Users/ggrimwoo/workspace/kroxylicious-junit5-extension/impl/target/surefire' '2023-08-03T13-14-43_682-jvmRun1' 'surefire-20230803131446310_1tmp' 'surefire_0-20230803131446310_2tmp'
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] Crashed tests:
[ERROR] io.kroxylicious.testing.kafka.KafkaClusterTest
[ERROR]     at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:643)
[ERROR]     at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:285)
[ERROR]     at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:250)
[ERROR]     at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1203)
[ERROR]     at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1055)
[ERROR]     at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:871)
[ERROR]     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:126)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:342)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:330)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:175)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.access$000(MojoExecutor.java:76)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor$1.run(MojoExecutor.java:163)
[ERROR]     at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute(DefaultMojosExecutionStrategy.java:39)
[ERROR]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:160)
[ERROR]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:105)
[ERROR]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:73)
[ERROR]     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:53)
[ERROR]     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:118)
[ERROR]     at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:261)
[ERROR]     at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:173)
[ERROR]     at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:101)
[ERROR]     at org.apache.maven.cli.MavenCli.execute(MavenCli.java:910)
[ERROR]     at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:283)
[ERROR]     at org.apache.maven.cli.MavenCli.main(MavenCli.java:206)
[ERROR]     at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
[ERROR]     at java.base/java.lang.reflect.Method.invoke(Method.java:578)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:283)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:226)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:407)
[ERROR]     at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:348)

To Reproduce Steps to reproduce the behavior:

  1. From either within the kroxylicious-junit-extension project root or from within a project that uses kroxylicious-junit-extension for testing, run mvn clean install (or whichever maven command triggers tests for the project)
  2. Continue to run that same command over and over, and eventually you will see the JVM crash mid-test with an error like the one above.

Expected behavior The JVM shouldn't crash.

Logs

Logs from running the kroxylicious-junit-extension tests are here, and I'm adding more examples to this gist as I encounter them. More logs produced from earlier testing against the Kroxylicious integration tests can be found linked from the original Kroxylicious issue.

Additional context I first spotted this issue while refactoring the Kroxylicious integration tests, where we subsequently found that intercepting calls to System.exit() (with a custom Security Manager) did not prevent or otherwise change the behaviour of the crashes, and that it was occurring during either test setup or teardown (not during an actual test method). The latter caused us to investigate this extension, where I was immediately able to reproduce the issue by running mvn clean install in the kroxylicious-junit-extension project root.

Notably, I can never find any *.dump or *.dumpstream files following these crashes (here or in Kroxylicious), regardless of what the Maven output says about them. This could possibly indicate that this isn't a "true" JVM crash and is instead a call being made somewhere to System.exit() along with Surefire's documented intolerance of such calls, but as mentioned above our attempts to intercept these calls when running the Kroxylicious integration tests did not produce any results.

The internet is very sparse on information around why Maven and/or Surefire might produce this "JVM crash" error but not produce any dump files, but I think we can rule out OOMKiller (I've got 32GB RAM on the machine I'm testing on, it really shouldn't be using that much). That leaves us with either something making a call to System.exit() (at this point we've tried various methods to intercept or otherwise highlight this in the logs, to no success), or an actual crash occurring in a place that Surefire can't/won't produce dumps from (this StackOverflow answer implies that in JUnit5 and newer Surefire versions, fatal errors which occur in a @BeforeAll method will not crash and therefore may not produce dumps), or potentially that something is cleaning up those dumps when the build ends. Or there's some other mysterious way in which Surefire can complain about a JVM crash and yet not actually produce any useful dumps about it.

Additionally, it seems Maven is cleaning up the files it mentions in lines like this before I have a chance to examine them:

[ERROR] Command was /bin/sh -c cd '/Users/ggrimwoo/workspace/kroxylicious-junit5-extension/impl' && '/Users/ggrimwoo/.sdkman/candidates/java/20-tem/bin/java' '-jar' '/Users/ggrimwoo/workspace/kroxylicious-junit5-extension/impl/target/surefire/surefirebooter-20230803131446310_3.jar' '/Users/ggrimwoo/workspace/kroxylicious-junit5-extension/impl/target/surefire' '2023-08-03T13-14-43_682-jvmRun1' 'surefire-20230803131446310_1tmp' 'surefire_0-20230803131446310_2tmp'

I'm assuming this is probably done as part of Maven's post-build cleanup behaviour. Many of these are temporary build files (i.e. surefire-20230803131446310_1tmp) so I imagine Maven will be deleting these by default when the build ends (whether successful or not). This is also why I am considering the possibility that there actually are dumps being produced, and they're just getting removed ("cleaned up") as soon as they're written.

SamBarker commented 1 year ago

Fixed by #161