openrewrite / rewrite

Automated mass refactoring of source code.
https://docs.openrewrite.org
Apache License 2.0
2.02k stars 299 forks

Enable Maven parallelism via `.mvn/maven.config` file #4177

Open DPUkyle opened 2 months ago

DPUkyle commented 2 months ago

What problem are you trying to solve?

The vast majority of Maven builds I've encountered are single-threaded, which is quite a waste of processing power in today's multi-core world. While not every multi-module Maven build will be compatible with parallelism, those that are typically pass CLI arguments such as -T 1C or --threads 0.5C on every invocation.

There is no way to provide a parallelism setting in a pom.xml file.

I recently discovered that Maven 3.3.1+ has a facility to persist default CLI arguments in a file called .mvn/maven.config (docs). (thanks @bdemers)

What precondition(s) should be checked before applying this recipe?

  1. Maven version is 3.3.1+
  2. For MVP, does .mvn/maven.config file already exist?
  3. For future releases, parameterize the parallelism factor in the recipe. Defaulting to 1C (one thread per CPU core) is a reasonable default.
  4. For future releases, parse .mvn/maven.config if exists already and add/edit -T or --threads argument.
  5. Expanding beyond that, maybe provide a general-purpose Maven CLI argument persistence recipe.
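Items 1, 3 and 4 above can be sketched in plain Java. This is only an illustration under stated assumptions, not the recipe itself: the class and method names (`MavenConfigThreads`, `supportsMavenConfig`, `upsertThreads`) are hypothetical, the file is treated as whitespace-separated tokens (so quoted values containing spaces are not handled), and any token starting with `-T` is assumed to be the threads option. The result is emitted with one token per line, which a whitespace-splitting reader also accepts; whether a given Maven version requires that layout is something to verify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of OpenRewrite or Maven. */
class MavenConfigThreads {

    /** True if a numeric version such as "3.9.6" is at least 3.3.1. */
    static boolean supportsMavenConfig(String version) {
        int[] min = {3, 3, 1};
        String[] parts = version.split("[.-]");
        for (int i = 0; i < min.length; i++) {
            // Missing components count as 0, so "3.3" is treated as 3.3.0.
            int part = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (part != min[i]) {
                return part > min[i];
            }
        }
        return true;
    }

    /** Adds or replaces the threads argument in the given maven.config contents. */
    static String upsertThreads(String existing, String threads) {
        String trimmed = existing.trim();
        String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        List<String> out = new ArrayList<>();
        boolean replaced = false;
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            boolean separate = t.equals("-T") || t.equals("--threads");
            boolean attached = t.startsWith("--threads=")
                    || (t.startsWith("-T") && t.length() > 2);
            if (!separate && !attached) {
                out.add(t); // unrelated argument, keep as-is
                continue;
            }
            if (separate) {
                i++; // the old value is the next token; drop it too
            }
            if (!replaced) {
                out.add("-T");
                out.add(threads);
                replaced = true;
            }
        }
        if (!replaced) {
            out.add("-T");
            out.add(threads);
        }
        return String.join("\n", out) + "\n";
    }

    public static void main(String[] args) {
        System.out.println(supportsMavenConfig("3.2.5")); // false
        System.out.print(upsertThreads("-U --threads 0.5C -DskipTests", "1C"));
    }
}
```

Keeping the parallelism factor as a parameter leaves room for the 1C default suggested in item 3 while still letting a user pick something more conservative such as 0.5C.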

Describe the situation before applying the recipe

.mvn/maven.config does not exist

Describe the situation after applying the recipe

.mvn/maven.config is created with contents:

-T 1C
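The before/after states for the MVP can be sketched with plain `java.nio`. This is a minimal illustration that assumes direct filesystem access. An actual OpenRewrite recipe would generate the file as a source file instead of writing to disk. The class and method names (`CreateMavenConfig`, `createIfAbsent`) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not the OpenRewrite API. */
class CreateMavenConfig {

    /**
     * Creates .mvn/maven.config with "-T <threads>" if the file does not
     * exist yet. Returns true if the file was created, false if an
     * existing file was left untouched.
     */
    static boolean createIfAbsent(Path projectDir, String threads) {
        Path config = projectDir.resolve(".mvn").resolve("maven.config");
        if (Files.exists(config)) {
            return false; // MVP: never modify an existing file
        }
        try {
            Files.createDirectories(config.getParent());
            Files.write(config, ("-T " + threads + "\n").getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `CreateMavenConfig.createIfAbsent(projectDir, "1C")` on a project without the file produces the contents shown above; a second call is a no-op.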

Have you considered any alternatives or workarounds?

Not really. Other than a bespoke shell script to invoke Maven, .mvn/maven.config is the only method I know of to "natively" provide CLI argument defaults.

Any additional context

If I were a consumer of this recipe, I'd want to ensure that enabling parallelism doesn't break my build and that the results before/after are the same. I'm not sure how this could be verified.

Are you interested in contributing this recipe to OpenRewrite?

Absolutely.

timtebeek commented 2 months ago

Hi @DPUkyle, thanks for the suggestion & detailed outline! This indeed looks like a valuable addition to help speed up builds. I think we already have enough information to write a first version, and we can expand on that once it is available.

In terms of implementation I think a ScanningRecipe would be best: that gives you an API to evaluate all files before making any changes (for instance, to spot disqualifiers such as plugins known not to be parallel-safe), a method to generate new files, and a visitor for new and existing files to add options as needed/desired. You can see an example scanning recipe implemented in our rewrite-recipe-starter; the new recipe described here would be a fine contribution to openrewrite/rewrite.

Great to see you're interested in contributing this recipe! Feel free to ask any questions here, or reach out through our OSS Slack, or the shared Slack channel we have with Gradle.

shanman190 commented 2 months ago

Just to tack on another option for this as well, the Maven daemon uses -T 1C by default in its implementation. So another way to achieve the parallelism would be to swap from the standard Maven distribution or wrapper to the Maven daemon.

DPUkyle commented 2 months ago

@shanman190 I like the mvnd option also (docs). I didn't realize that it's a native executable, which is great.

That leads me to another idea for a recipe: adding Java toolchain information to a Maven project. I very strongly feel this is a best practice for any project and should be a prerequisite to mvnd use.

shanman190 commented 2 months ago

Let's spin off Java toolchains for Maven to a separate issue. If you're up for providing the features for one or both of them, we can always help on a PR or in the OpenRewrite Slack workspace.

DPUkyle commented 2 months ago

> Let's spin off Java toolchains for Maven to a separate issue. If you're up for providing the features for one or both of them, we can always help on a PR or in the OpenRewrite Slack workspace.

I created #4185

Bananeweizen commented 2 months ago

Please be aware that enabling parallelism in Maven builds will almost always break the build. That's because most developers are lazy as hell, and just build interdependent modules in the order the aggregator lists its child modules, which is fine for a sequential build. E.g. you build modules A, B, C, where C depends on B, and B depends on A, without ever specifying those dependencies explicitly. In a non-parallel build that will always work; in a parallel build it will often break. I've switched a lot of commercial and open source projects from non-parallel to parallel, and it basically always required additional changes to create a "partial graph" of build nodes with explicit dependencies. Therefore I'm not sure whether creating a recipe for enabling parallelism is actually worth it, as I expect most users to see their build fail afterwards and to complain about the recipe.
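The A, B, C scenario can be illustrated with a simplified model. The sketch below groups modules into "waves" that could be built concurrently based only on their declared dependencies. It is not Maven's actual scheduler, and the class and method names (`ParallelBuildWaves`, `waves`) are hypothetical. It only shows why undeclared dependencies stop mattering to a sequential build but not to a parallel one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified model for illustration; not how Maven schedules modules internally. */
class ParallelBuildWaves {

    /**
     * Groups modules into waves. A module joins a wave once all of its
     * declared dependencies were built in earlier waves, so every module
     * within one wave may run at the same time.
     */
    static List<List<String>> waves(Map<String, Set<String>> declaredDeps) {
        List<List<String>> result = new ArrayList<>();
        Set<String> built = new HashSet<>();
        Set<String> remaining = new TreeSet<>(declaredDeps.keySet());
        while (!remaining.isEmpty()) {
            List<String> wave = new ArrayList<>();
            for (String module : remaining) {
                if (built.containsAll(declaredDeps.get(module))) {
                    wave.add(module);
                }
            }
            if (wave.isEmpty()) {
                throw new IllegalStateException(
                        "Cycle or unknown dependency among " + remaining);
            }
            built.addAll(wave);
            remaining.removeAll(wave);
            result.add(wave);
        }
        return result;
    }

    public static void main(String[] args) {
        // Dependencies exist in reality but are not declared anywhere.
        Map<String, Set<String>> implicit = new LinkedHashMap<>();
        implicit.put("A", Set.of());
        implicit.put("B", Set.of());
        implicit.put("C", Set.of());
        System.out.println(waves(implicit)); // [[A, B, C]]

        // Same modules with the dependencies declared explicitly.
        Map<String, Set<String>> explicit = new LinkedHashMap<>();
        explicit.put("A", Set.of());
        explicit.put("B", Set.of("A"));
        explicit.put("C", Set.of("B"));
        System.out.println(waves(explicit)); // [[A], [B], [C]]
    }
}
```

With nothing declared, all three modules land in the same wave and may start together, so B can run before A has produced its output. Once the dependencies are explicit, the ordering is enforced regardless of the thread count.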

That being said, if someone is going to implement this, then the recipe should also switch the Maven builder from the built-in builder to the Takari smart builder, which parallelizes more aggressively than the standard Maven builder and therefore leads to shorter build times.