Support parallel runs in Gradle multi-module projects

loetifuss commented 1 year ago

In a multi-module build I have applied the plugin to all subprojects. When I try to run the "rewriteRun" task on all subprojects, I'm getting the class loading errors below for each subproject. When I try to run the build without parallel processing ("--no-parallel") the tasks will run fine. However since there are hundreds of subprojects this is not a good solution.

Is the plugin intended to be used on multiple subprojects? I tried to apply the plugin on the root project but that will produce other issues (e.g. StackOverflowErrors).

5: Task failed with an exception.
-----------
* What went wrong:
Execution failed for task ':subprojectA:rewriteRun'.
> loader org.openrewrite.gradle.RewriteClassLoader @37fc83c0 attempted duplicate class definition for com.fasterxml.jackson.module.kotlin.KotlinModule$Builder. (com.fasterxml.jackson.module.kotlin.KotlinModule$Builder is in unnamed module of loader org.openrewrite.gradle.RewriteClassLoader @37fc83c0, parent loader org.gradle.internal.classloader.VisitableURLClassLoader @4894f78d)

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.
==============================================================================

timtebeek commented 1 year ago

hi @loetifuss ; Interesting use of rewrite you've got there! I don't know the specifics around our Gradle plugin too well (yet), but can imagine that it was not foreseen to be used in parallel on multiple subprojects, but not present on the root project.

It sounds like you've found a workaround for now by not running in parallel; does that make this issue feature request to support running in parallel?

Typically we do suggest to add the plugin to the root module, and then optionally increase the memory with -Dorg.gradle.jvmargs=-Xmx2G or more as needed. Could that be an option there too?

I don't suppose your project is open source is it? Otherwise we'd have another option through public.moderne.io.

loetifuss commented 1 year ago

Thanks for the update. It would indeed be great if the plugin could work on individual subprojects since this would leverage Gradle's built-in parallel processing of subprojects. My current goal is to automatically update gradle build dependencies in a large multimodule build using the "ChangeDependencyArtifactId" recipe. As far as I understand, when running the plugin on the root project, it tries to parse alle source files from all subprojects first. With a build containing millions of lines of code that's a very heavy task to perform. Or is there some way of filtering the input (currently I'm only interested in the Gradle build files)?

timtebeek commented 1 year ago

I've not used these options on Gradle projects myself yet, and from the documentation it's questionable if it will work to exclude Java sources, but we do have both exclusions and plainTextMasks options that you could try to run from the root whilst not parsing any sources as Java. 🤔

Just creatively looking for a quick workaround for you on your limited use-case; ideally we'd support more of this officially of course, but I imagine you mostly want something that works today, with an option on something better in the future.

sambsnyd commented 1 year ago

For such a large project it'll generally work better to separate parsing and recipe execution into separate steps. That is how our commercial saas product works - build plugin produces LST jars, saas runs the recipes on beefy hardware and issues pull requests. Perhaps that would be of interest to you @loetifuss ?

wakingrufus commented 1 year ago

typically, gradle plugins should be designed to be applied to the project within which they are operating (ie subproject) for better project encapsulation, which helps with parallel builds among other things. The Gradle Worker API is also ideal to use, as it handles parallelization of multiple units of work within a single subproejct for you. I have experience working with these things in my day job as well as in my work on https://github.com/JLLeitschuh/ktlint-gradle So I would be willing to help out the effort to bring this functionality to the Open Rewrite plugin

wakingrufus commented 1 year ago

I took a brief look at the code, and my first impression is: the plugin uses a custom classloader (RewriteClassLoader) which delegates to the parent classloader in some cases. the parent classloader is the gradle build's classloader which is shared across all projects, so, when running parallel builds, it causes race conditions that results in duplicate class definitions. Instead, the plugins should use the Gradle Worker API in classLoaderIsolation() mode i think. while this will hurt overall performance, the ability to run parallel builds will gain that back and more for many users.

shanman190 commented 1 year ago

So currently the rewrite plugin was only ever really designed to be applied to the root project and operate over all of the source code at once. There are aspects here that relate to being able to properly type attribute files between dependent projects. It may be possible to allow Gradle to perform more of the orchestration a little bit, but the rewrite plugin would have to make sure that all source sets have completed for a project before then performing the rewrite-specific tasks.

Now the other part that is going to be a larger kicker, is that event IsolationMode.NONE isn't actually sufficiently isolated enough for the purposes of rewrite. As an specific example, the worker's Groovy and Kotlin versions leak into the isolated boundary and for rewrite, we need to make sure that the project's Groovy and Kotlin versions are used instead for parsing. So the custom classloader isolation would have to remain if only for that aspect in particular.

The last part that will make this difficult, is that the rewrite plugin presently targets Gradle 4.0 and newer. The worker api first appears on the scene in Gradle 4.1 with the method that you call out not appearing until Gradle 5.6.

Personally, I'd enjoy seeing the ability to use several of the features available in newer Gradle versions, but just mentioning a few things about the current state presently.

wakingrufus commented 1 year ago

Can you explain more what "attribute files" means? If we just need the classes from other projects loaded, we can piggyback off the sub project's compile classpath to get that.

As for gradle version, even 5.x is EOL at this point, so maybe its time to make the jump. But i understand Moderne may have clients still on older versions, so that might be a tough call.

On another note, doing this refactor would probably allow the plugin to make use of task caching as well, which i notice it currently doesn't.

shanman190 commented 1 year ago

"attribute files" in terms of type attribution from a compiler standpoint. So if we have project A, B, C, and D with relationships as so:

A -> B A -> C B -> D C -> D

Then D needs to fully compile all source sets (main, test, etc), then it can do the rewrite parsing steps, then B and C fully compile all sources sets and do their rewrite steps, then finally A does it's work. Just the normal task graph stuff. The rewrite LSTs are type aware in so far as a given LST element points to it's immediate Java type reference which then in turn points to it's transitive type references. In order for the transitive nature to occur, previous elements must have had their types attributed and cached for the later stages to perform their work.

wakingrufus commented 1 year ago

"attribute files" in terms of type attribution from a compiler standpoint. So if we have project A, B, C, and D with relationships as so:

A -> B A -> C B -> D C -> D

Then D needs to fully compile all source sets (main, test, etc), then it can do the rewrite parsing steps, then B and C fully compile all sources sets and do their rewrite steps, then finally A does it's work. Just the normal task graph stuff. The rewrite LSTs are type aware in so far as a given LST element points to it's immediate Java type reference which then in turn points to it's transitive type references. In order for the transitive nature to occur, previous elements must have had their types attributed and cached for the later stages to perform their work.

Thank you for the explanation. And to do this, rewrite needs the AST/LST directly from sources, not the class files?

shanman190 commented 1 year ago

It's a bit of both. For files in the project the actual source code is used and those LSTs are combined with classes from the classpath and JDK itself to create the entire tree structure.

You can see a more detailed explanation of this here: https://docs.openrewrite.org/concepts-explanations/lossless-semantic-trees

sergeykad commented 10 months ago

Parallel execution supported since Gradle 1.2 (2012), so even if you want to support very old Gradle versions it should not be a problem.

openrewrite / rewrite-gradle-plugin

Support parallel runs in Gradle multi-module projects #183