melix / jmh-gradle-plugin

Integrates the JMH benchmarking framework with Gradle
Apache License 2.0
666 stars 88 forks source link

fix: run GC after Runner finished #145

Closed grv87 closed 5 years ago

grv87 commented 5 years ago

This patch, used together with forceGC=true, fixes #134.

melix commented 5 years ago

Thanks, could you verify that it actually solves #134 ?

jnellis commented 5 years ago

jmh.jar is still locking on windows. This seems like this issue https://github.com/gradle/gradle/issues/937 which points to this issue https://github.com/gradle/gradle/issues/5998 which recommends this tool https://file-leak-detector.kohsuke.org/ to figure out the root cause.

grv87 commented 5 years ago

I tried File Leak Detector for both Gradle and JMH, but it didn't give me any results except non-closed socket in JMH. Which I ascribed to ShutdownTimeoutThread. Maybe this tool itself has side effect, and files become closed correctly.

@jnellis, waiting for 30 seconds, as I noted here, doesn't help too? If not, could you share your build script? All Gradle issues that I've seen, including two mentioned by you, point out that the leak is in the one of plugins. Maybe not even in JMH plugin. It makes sense to analyze the whole build.

jnellis commented 5 years ago

Thanks for running that leak detector, the jar file was giving me manifest error. Trying to change jmh.shutdownTimeout doesn't seem to have an affect. Running jmh from a stopped gradle daemon it does of course work. When the task runs, the gradle daemon starts five more java processes one of which is another gradle daemon and the rest are 'worker daemon's; upon completion of the task, two of these worker daemon processes are remaining. gradle --status shows one IDLE process whose PID matches one of these java processes and then it will list 1-4 more STOPPED processes.

This is the command line for the gradle daemon idle process:

C:\libs\javapath\bin\java.exe -XX:+HeapDumpOnOutOfMemoryError -Xmx1024m -Dfile.encoding=windows-1252 -Duser.country=US -Duser.language=en -Duser.variant -cp C:\Users\jnellis\.gradle\wrapper\dists\gradle-4.9-bin\e9cinqnqvph59rr7g70qubb4t\gradle-4.9\lib\gradle-launcher-4.9.jar org.gradle.launcher.daemon.bootstrap.GradleDaemon 4.9

These are the command lines for the two worker daemon processes that remain:

C:\libs\javapath\bin\java.exe -Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager -Dfile.encoding=windows-1252 -Duser.country=US -Duser.language=en -Duser.variant -cp C:\Users\jnellis\.gradle\caches\4.9\workerMain\gradle-worker.jar worker.org.gradle.process.internal.worker.GradleWorkerMain "'Gradle Worker Daemon 2'"

C:\libs\javapath\bin\java.exe -Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager -Dfile.encoding=windows-1252 -Djava.io.tmpdir=C:\Users\jnellis\dev\github\binpack\build\tmp\jmh -Duser.country=US -Duser.language=en -Duser.variant -cp C:\Users\jnellis\.gradle\caches\4.9\workerMain\gradle-worker.jar worker.org.gradle.process.internal.worker.GradleWorkerMain "'Gradle Worker Daemon 3'"

You can run the jmh task again. It's only when you try to cleanthat you'll get 'can not delete jmh.jar' message because its trying to delete that directory. So if you don't have a habit of not cleaning before tasks then you won't see this error. Up until this point, I've just killed the gradle daemon entirely to get moving again.

Now, what if I kill these two worker processes but not the idle gradle daemon process and try to run jmh again (but don't clean first).
C:\Users\jnellis\dev\github\binpack>gradlew --stacktrace jmh > stdout.txt 2> stderr.txt stdout.txt stderr.txt

At this point I decided to do a clean since I had killed those two processes and knew it would be able to delete that jmh.jar. NOTE: the original gradle daemon process is still sitting idle, I have not done gradlew --stop. Then I tried to run jmh again and it fails but in a different way.

C:\Users\jnellis\dev\github\binpack>gradlew clean

BUILD SUCCESSFUL in 1s
1 actionable task: 1 executed
C:\Users\jnellis\dev\github\binpack>gradlew --stacktrace jmh > stdout2.txt 2> stderr2.txt

C:\Users\jnellis\dev\github\binpack>

stdout2.txt stderr2.txt

From this you can conclude that these two Gradle Worker Daemon processes (2 & 3) respectively map to the plugin tasks jmhRunByteCodeGenerator (me.champeau.gradle.JmhBytecodeGeneratorRunnable) and jmh (me.champeau.gradle.IsolatedRunner).

So looking for commonalities of these two files I notice that both use javax.inject.@Inject along with JHMTask.java. Then I notice in JmhBytecodeGeneratorTask.java it uses import org.gradle.api.tasks.@CacheableTask. I don't know anything really about gradle plugin lifecycles and so haven't cloned the plugin to attempt building it but if you look at the javax.inject.@Inject javadocs it warns at the bottom about circular dependencies between fields. This seems a reasonable behavior between two related processes that can't shut down.

Let's revert this perhaps https://github.com/melix/jmh-gradle-plugin/commit/bb22f5e9c9ca3dd8a204c6cef2141b5238434e37#diff-0d315c75a161a3965b2c133585e20926

and maybe revert https://github.com/melix/jmh-gradle-plugin/commit/b9008835e290a53a5f429f48b176ee397972d8e0#diff-71f3f211dfb4fe11a09433c467596c97 as it seems to make benchmarks with many params run slower.

As for my build file, it's basically straight out of the plugin example.

grv87 commented 5 years ago

@jnellis, thanks for inspecting processes. I confirm that I can still reproduce the issue in https://github.com/melix/jmh-gradle-example with Gradle 4.8 and 5.1.1. The worker of JMHTask locks the file.

So, the problem is that this task is using Worker API, and Worker API does not support volatile classpath. @melix, I see several options:

  1. Open an issue in Gradle, and see what they'll answer. Personally, I see that current Gradle implementation (keeping worker daemon warm between runs) is perfect for workers with fixed classpath. And so I'm not sure they would consider this issue as a correct use case for Worker API.
  2. Use dynamic name / temporary directory for JMH jar.
  3. Remove Worker API. I'm not sure that this is a good solution. Even if we don't use its other features, Worker API still allows running different tasks in parallel. On the other hand, it could be bad idea to run anything in parallel with benchmarking.

@jnellis, I didn't get your reasoning that the problem is in annotations. @Inject is required to use Worker API. There are no circular dependencies, otherwise we would get runtime exception that object can't be created. And removing @CacheableTask doesn't help.

jnellis commented 5 years ago

@grv87 Looks like that's it. It says Worker API was introduced in 4.1, that's where the locking behavior starts. For Windows you have to use gradle 4.0.2 and jmh plugin 0.4.3 right now for no issues. The WSL seems to work just fine though, you can see those same two Worker processes idle still, it just doesn't lock the jar file when trying to delete it during clean. Would these worker processes normally be exited if there wasn't this classpath issue?

I suppose a workaround then for windows is that you can use a higher gradle version (with appropriate jmh plugin version) and then only call the jmh task from WSL. So far I haven't any conflicts going between the two except heavy memory use. Edit: If you close the WSL session without doing a gradlew stop the daemon and the worker process will hang around.

melix commented 5 years ago

There's no doubt to me that @CachableTask has nothing to do with locking. The fact we use the worker API now is maybe a problem, but the issue is most likely in JMH itself. Having to wait 30s is a strong indication of it.

jnellis commented 5 years ago

There's no amount of time that passes that it unlocks the jar on windows. It stays locked as long as those two worker daemon processes exist. Same on WSL(windows subsystem for linux; Ubuntu 16), except there is no lock to break(Linux don't care I guess), those worker process still sit idle waiting for IsolatedRunner and JmhBytecodeGeneratorRunnable to run again. There's gradle --statusto list the actual daemon PID's but there's no command for listing Worker processes that are sitting idle. The docs only say that they will be killed when 'resources are low' or the gradle daemon itself stops (default 3hours I think I read somewhere.)

grv87 commented 5 years ago

@melix, I confirm that my previous statement on 30 seconds timeout was wrong. It was false positive, either because my system become limited on resources, and some idle processes decided to stop themselves, or this was a side effect from File Leak Detector. Jar file is locked by Gradle worker daemon.

jnellis commented 5 years ago

@melix What neccessitates using IsolationMode.PROCESS in these two tasks? Since killing these two processes seems to release the file lock in windows maybe not having them run in their own process in the first place is a solution.

melix commented 5 years ago

Using this isolation mode makes sure that the task workers work in isolation from Gradle itself (that is to say in a different process). So using a different isolation mode would only make things worse.

To me the issue is JMH not closing a file. I didn't have time to investigate how to mitigate this, I will as soon as I can.

veita commented 5 years ago

@melix You should probably call close() on the amendedCL in JmhBytecodeGeneratorRunnable in order to close streams opened from file URLs.

Although there's no guaranty that this will fix the issue, it shouldn't make things worse. ;-)

jnellis commented 5 years ago

@veita This doesn't do anything from what I can tell. Locking still happens.

I fork the plugin and publish the latest version to mavenLocal and then link to that in a project to test and this is what I've tried.

@melix

To me the issue is JMH not closing a file.

No. You can short circuit the IsolationRunner class to do nothing and the benchmark.jar file as well as all associated dependencies in the user home ./gradle/cache are locked. It does seem to be directly related to gradle worker api and how its used in this scenario; creating an intermediate dependency then using it. Which seems like it would be a common idiom.

Furthermore, at least in WSL(ubuntu/oracle jdk8), zombie processes seem to be leftover. When a change in the source code or a project dependency invokes a new benchmark.jar being created; running that abandons the old daemon worker process that was still around and it spawns a new daemon worker process. So while its not blocking on WSL ubuntu it's also not allowing stale daemon processes to die off. The gradle docs say the default time deamon worker processes hang around until they are killed is three hours but that's not my experience; they will die only when gradle --stop or the parent process is killed. Someone on a linux machine should investigate for zombie processes after a benchmark.jar rebuild and run.

@grv87 said something smart in the #937 issue. Why keep a hot path just to run benchmarks when the benchmark is required to be regenerated anytime the benched source code is changed. So while maybe this isn't a bug with the gradle worker api, it's probably the wrong use case to use considering how jmh has to rebuild the benchmark each time.

To anyone that is having the same locking issues on windows and just has a simple project they would like to run jmh benchmarks until this is resolved I have made a crude drop in workaround