openrewrite / rewrite

Automated mass refactoring of source code.
https://docs.openrewrite.org
Apache License 2.0

On-demand source parsing #2668

Open knutwannheden opened 1 year ago

knutwannheden commented 1 year ago

Currently the Maven plugin (and, I assume, the Gradle plugin and possibly the SaaS) parses / loads all source files up front, before it starts processing any of the recipes. Quite often this is probably unnecessary, because the active recipes only apply to some of the sources. For example, when a Maven recipe is tested in the SaaS with the dry-run option, only the project's pom.xml files are of interest. So I think it would be worthwhile if sources could be loaded on demand, to reduce both CPU and memory load.
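As a rough illustration of the idea, the sketch below decides which files are worth parsing by looking only at their paths, before any file is read. It is plain Java and not OpenRewrite API: `PathFilterDemo`, `IS_POM` and `select` are hypothetical names, and the assumption is that the active recipes could somehow declare which paths they care about (here, just `pom.xml` files, as in the dry-run example).

```java
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: pre-select files by path so that everything else is
// never read or parsed. None of these names are OpenRewrite API.
class PathFilterDemo {

    // Stand-in for "the files the active recipes care about": Maven POMs only.
    static final Predicate<Path> IS_POM =
            p -> p.getFileName() != null && p.getFileName().toString().equals("pom.xml");

    // Returns only the paths accepted by the predicate, preserving order.
    static List<Path> select(List<Path> allFiles, Predicate<Path> interesting) {
        return allFiles.stream().filter(interesting).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Path> files = List.of(
                Path.of("pom.xml"),
                Path.of("core/pom.xml"),
                Path.of("core/src/main/java/com/example/App.java"),
                Path.of("core/src/main/resources/application.properties"));

        // Only the two POMs would be handed to a parser.
        List<Path> toParse = select(files, IS_POM);
        toParse.forEach(p -> System.out.println("parse: " + p));
    }
}
```

A real implementation would need each recipe to expose such a filter, which is exactly the kind of information OpenRewrite recipes may not currently provide.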

If the SourceFile object could be proxied while still carrying the provenance markers and the source file's path, I think many applicability tests could already be evaluated without ever parsing the source.
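A minimal sketch of such a proxy, assuming the path and markers are known before parsing and the parsed tree is only needed by visitors that actually run. `LazySourceFile` and its methods are hypothetical and unrelated to OpenRewrite's real `SourceFile` interface; markers are simplified to strings, and the "parser" is any `Supplier` that produces the tree.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a "proxied" source file: path and markers are
// available eagerly, the parsed tree is produced on first access only.
// Not OpenRewrite API; markers are simplified to plain strings.
class LazySourceFile<T> {
    private final Path sourcePath;
    private final List<String> markers; // e.g. provenance info such as module or build tool
    private Supplier<T> parser;         // set to null once parsing has happened
    private T tree;
    private int parseCount;

    LazySourceFile(Path sourcePath, List<String> markers, Supplier<T> parser) {
        this.sourcePath = sourcePath;
        this.markers = List.copyOf(markers);
        this.parser = parser;
    }

    Path getSourcePath() { return sourcePath; }   // never triggers parsing
    List<String> getMarkers() { return markers; } // never triggers parsing

    // Parses on the first call and caches the result (not thread-safe).
    T getTree() {
        if (parser != null) {
            tree = parser.get();
            parser = null;
            parseCount++;
        }
        return tree;
    }

    boolean isParsed() { return parser == null; }
    int parseCount() { return parseCount; }
}

class LazySourceFileDemo {
    public static void main(String[] args) {
        LazySourceFile<String> file = new LazySourceFile<>(
                Path.of("core/src/main/resources/application.properties"),
                List.of("module:core"),
                () -> "parsed tree"); // stand-in for an expensive parser

        // A path-based applicability check does not trigger parsing.
        boolean applicable = file.getSourcePath().toString().endsWith(".properties");
        System.out.println(applicable + " parsed=" + file.isParsed()); // prints "true parsed=false"

        file.getTree(); // the parser runs here
        file.getTree(); // cached; the parser is not invoked again
        System.out.println("parsed=" + file.isParsed() + " count=" + file.parseCount()); // prints "parsed=true count=1"
    }
}
```

The caveat is that anything which calls `getTree()` early, for example a generic visitor that walks every file, would parse everything anyway and erase the savings.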

Background

We run OpenRewrite as part of our Renovate jobs on GitLab. Whenever a downstream project upgrades from one version of the framework we develop to another, we collect the recipes applicable to that version upgrade and apply them in order. For many of the minor versions, the recipes only perform very specific refactorings to adhere to some new best practice (e.g. to stop using a deprecated method). These recipes often also have a single-source applicability test that checks for the presence of some type on the classpath, because the framework is split into many submodules and a recipe typically only affects code that uses a specific module. Sometimes the recipes just refactor properties files.
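The classpath check described above could, in principle, be answered from module metadata alone. The sketch assumes a proxied source exposes its path and the set of fully qualified type names on its module's classpath; `ClasspathApplicability`, `SourceMetadata`, `usesTypeOnClasspath` and the `com.example.framework.web.Endpoint` type are all hypothetical and do not correspond to OpenRewrite's actual precondition API.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: a single-source applicability test evaluated only on
// metadata known before parsing. Not OpenRewrite API.
class ClasspathApplicability {

    // What a proxied source file could expose without being parsed.
    static final class SourceMetadata {
        final Path path;
        final Set<String> classpathTypes; // fully qualified names on the module's classpath

        SourceMetadata(Path path, Set<String> classpathTypes) {
            this.path = path;
            this.classpathTypes = Set.copyOf(classpathTypes);
        }
    }

    // Applicable if the given type is on the source's classpath.
    static Predicate<SourceMetadata> usesTypeOnClasspath(String fqn) {
        return meta -> meta.classpathTypes.contains(fqn);
    }

    // Only sources passing the test would need to be parsed and visited.
    static List<SourceMetadata> applicable(List<SourceMetadata> sources, Predicate<SourceMetadata> test) {
        return sources.stream().filter(test).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SourceMetadata> sources = List.of(
                new SourceMetadata(Path.of("web/src/main/java/A.java"),
                        Set.of("com.example.framework.web.Endpoint", "java.lang.String")),
                new SourceMetadata(Path.of("batch/src/main/java/B.java"),
                        Set.of("java.lang.String")));

        applicable(sources, usesTypeOnClasspath("com.example.framework.web.Endpoint"))
                .forEach(meta -> System.out.println("parse and visit: " + meta.path));
    }
}
```

Under these assumptions only `A.java` would ever be parsed; the module without the framework dependency is skipped entirely.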

The Maven goal often takes a long time to execute, even when the recipes only change e.g. a properties file. Because we do this for many projects, the build jobs together put considerable load on the infrastructure and can exhaust the pool of build runners for some time.

sambsnyd commented 1 year ago

There are merits to this proposal. The way we adapt visitors might limit its effectiveness, potentially causing lazily loaded recipes to be reified. I don't know that we will invest in this in the immediate future, but there is potential to reduce resource utilization.