openrewrite / rewrite

Automated mass refactoring of source code.
https://docs.openrewrite.org
Apache License 2.0

Performance issue with classpath scanning #4051

Open Bananeweizen opened 7 months ago

Bananeweizen commented 7 months ago

What problem are you trying to solve?

When running OpenRewrite on larger code bases with many recipes enabled, I would expect it to use all available cores. It doesn't. Instead, on my multi-core Windows machine it produces a pattern of short spikes (of almost full CPU usage) followed by longer stretches of ~30% CPU usage.

Attaching a profiler makes me think this is entirely related to classpath scanning. What I can see:

I therefore believe that at least on Windows the runtime is dominated by ongoing repetitions of

Describe the solution you'd like

This could potentially be improved by splitting the scanning and the analysis into separate runnables executed by two different thread pools, where a finished scanning task would start a recipe task. In theory, this would keep classpath scanning running almost all the time (limited only by file system performance and antivirus scanning), with refactorings triggered as often as possible in between.

This would probably still not lead to full CPU usage, but it should at least raise it.
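A minimal sketch of the proposed two-pool split, using only JDK `CompletableFuture` and `ExecutorService`. The `scan` and `recipe` functions are hypothetical stand-ins for the classpath scanning and recipe work; nothing here is an OpenRewrite API, and the pool sizes are assumptions to tune.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: "scan" and "recipe" are placeholders, not OpenRewrite APIs.
final class TwoPoolPipeline {
    // Pool for I/O-bound classpath scanning (file system, antivirus latency).
    private final ExecutorService scanPool;
    // Pool for CPU-bound recipe execution.
    private final ExecutorService recipePool;

    TwoPoolPipeline(int scanThreads, int recipeThreads) {
        this.scanPool = Executors.newFixedThreadPool(scanThreads);
        this.recipePool = Executors.newFixedThreadPool(recipeThreads);
    }

    // Each input is scanned on scanPool; as soon as a scan finishes, its result
    // is handed to recipePool, freeing the scan thread for the next input.
    <S, R> List<R> run(List<String> inputs, Function<String, S> scan, Function<S, R> recipe) {
        List<CompletableFuture<R>> pending = new ArrayList<>();
        for (String input : inputs) {
            pending.add(CompletableFuture
                    .supplyAsync(() -> scan.apply(input), scanPool)
                    .thenApplyAsync(recipe, recipePool));
        }
        // Collect results in input order; join() rethrows failures as unchecked exceptions.
        List<R> results = new ArrayList<>();
        for (CompletableFuture<R> f : pending) {
            results.add(f.join());
        }
        return results;
    }

    void shutdown() {
        scanPool.shutdown();
        recipePool.shutdown();
    }
}
```

Because `thenApplyAsync` submits the recipe step to the second pool, scanning and recipe execution overlap instead of alternating, which is the effect the proposal is after.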

Have you considered any alternatives or workarounds?

I have not yet figured out whether the amount of classpath scanning itself could be reduced. That would of course be an even better solution.
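One hypothetical way to reduce repeated scanning is to memoize scan results per distinct classpath. The sketch assumes the result depends only on the ordered list of classpath entries and that their contents do not change while the process runs; `ScanCache` and the `scanner` function are illustrative names, not part of OpenRewrite or ClassGraph.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: run an expensive scan at most once per distinct classpath.
final class ScanCache<R> {
    private final ConcurrentHashMap<List<Path>, R> cache = new ConcurrentHashMap<>();
    private final Function<List<Path>, R> scanner;
    private final AtomicInteger scans = new AtomicInteger();

    ScanCache(Function<List<Path>, R> scanner) {
        this.scanner = scanner;
    }

    R get(List<Path> classpath) {
        // Immutable copy as the key, so later mutation by the caller cannot corrupt the map.
        List<Path> key = List.copyOf(classpath);
        // computeIfAbsent on ConcurrentHashMap invokes the scanner at most once per key.
        return cache.computeIfAbsent(key, cp -> {
            scans.incrementAndGet();
            return scanner.apply(cp);
        });
    }

    // Number of real scans performed (cache misses).
    int scanCount() {
        return scans.get();
    }
}
```

Keying only on paths is the main trade-off: if a JAR on the classpath can be rebuilt during a run, the key would also need something like a modification timestamp.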

Are you interested in contributing this feature to OpenRewrite?

Maybe. I'm not sure I can actually rework the code in this manner. For now, I see this issue more as a discussion about potential improvements.

timtebeek commented 7 months ago

Thanks for looking into this and proposing an area for improvement. I'm going to tag @knutwannheden on this one given his earlier work on performance improvements. Definitely interesting, especially with an eye towards the additional integrations we're seeing and building.

knutwannheden commented 7 months ago

IIRC, we use classpath scanning in the following places: when loading recipes, when setting up the JavaSourceSet marker (during the parsing phase), and when executing JavaTemplates, which load JARs from classpath resources. I may have missed some other uses.

Since ClassGraph uses its own threads, it is hard to tell from the profiler results where the calls are coming from. It would, however, be good to know which of these uses causes the most problems for you. We will likely need to address the various uses in different ways.
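To find out which of these uses causes the most scanning, one option is to attribute each scan to its call site on the requesting thread, before the work moves to ClassGraph's worker threads. This is a hypothetical sketch: `ScanAttribution` and `track` are illustrative names, and it assumes the wrapped call blocks until the scan completes. Each candidate call site (recipe loading, JavaSourceSet setup, JavaTemplate classpath loading) would be wrapped in `track(...)`.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical sketch: count and time blocking scans per call site.
final class ScanAttribution {
    private static final Map<String, LongAdder> COUNTS = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();

    // The caller is resolved on the requesting thread, so the attribution survives
    // even if the scanner does its actual work on a separate thread pool.
    static <T> T track(Supplier<T> scan) {
        String caller = StackWalker.getInstance().walk(frames -> frames
                .skip(1) // drop track() itself; the next frame is whoever requested the scan
                .findFirst()
                .map(f -> f.getClassName() + "." + f.getMethodName())
                .orElse("<unknown>"));
        long start = System.nanoTime();
        try {
            return scan.get();
        } finally {
            COUNTS.computeIfAbsent(caller, k -> new LongAdder()).increment();
            NANOS.computeIfAbsent(caller, k -> new LongAdder()).add(System.nanoTime() - start);
        }
    }

    static long totalScans() {
        return COUNTS.values().stream().mapToLong(LongAdder::sum).sum();
    }

    // Snapshot of scan counts per call site, sorted by name.
    static Map<String, Long> counts() {
        Map<String, Long> out = new TreeMap<>();
        COUNTS.forEach((site, n) -> out.put(site, n.sum()));
        return out;
    }

    static void report() {
        COUNTS.forEach((site, n) -> System.out.printf("%s: %d scans, %.1f ms%n",
                site, n.sum(), NANOS.getOrDefault(site, new LongAdder()).sum() / 1e6));
    }
}
```

Calling `report()` at the end of a run would print per-call-site scan counts and cumulative wall time, which is hard to read off a profiler view where all the work shows up on pooled threads.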