Open vlsi opened 5 months ago
Thanks for integrating OpenRewrite at apache/jmeter as well; neat to see how you're applying the tool. I've gone ahead and enabled ingesting apache/jmeter to the Moderne Platform as well, such that you can also use that to run recipes. It might take a bit of time to show up there, as I suspect it suffers from the same OOM issue, but when it does you'll find it in the Apache organization in https://app.moderne.io
I also like your suggestion to swap out ClassGraph, as indeed those OOM have come up before. Colleagues might be the better judge of that than I am at this point on the relative merits; perhaps @knutwannheden as a first judge on the viability?
@vlsi I have been thinking the same thing for a while now. Here some comments:
Class#forName()
, but note that we also have use cases (like the SaaS or CLI) where we indeed want to find all recipes.While I have used Jandex extensively while working on Quarkus projects, I am at this point unsure how it actually operates and is different from ClassGraph. But I don't think it is out of the question to replace ClassGraph with Jandex, if that solves our issues.
In the long term I would however also favor a ServiceLoader
like approach. I was thinking that we would have to extend our build plugins with a task / goal which generate some resource file (with a known name). This file would then contain all details about recipes (declarative or not) present in the jar. In the context of this task / goal we could of course continue to use ClassGraph, as this would be a development time operation.
I am at this point unsure how it actually operates and is different from ClassGraph
The key difference is that one can build "annotation index" at the build time, and then it is much faster to load and process than scanning the files repeatedly.
In other words, as you build openrewrite-testing-frameworks.jar
you could integrate jandex index
there. Then it would be faster to query than parsing the classes.
However, ServiceLoader
seems quite reasonable for the recipe discovery.
One more point for using ServiceLoader
is that is is more-or-less supported by "dependency shading / uberjar" resource transformers. Everybody knows how to merge META-INF/services
files. If you use jandex
, the ones who shade openrewrite would have to suffer and invent their own "jandex merge tool".
auto-service might help creating META-INF/services
files at the cost of annotation processing at the build time.
To load an individual recipe we could indeed do something like Class#forName(), but note that we also have use cases (like the SaaS or CLI) where we indeed want to find all recipes
ServiceLoader
would support that as well.
The key idea behind ServiceLoader
being fast is it uses a well-known filename like META-INF/services/org.openrewrite.Recipe
. ClassLoader.getResources("META-INF/services/org.openrewrite.Recipe")
would be much faster than analyzing every zip entry, parsing the classfile and verifying if the class extends Recipe
or not.
The same should work for SaaS and CLI.
Of course, it would require adding META-INF/services
to the jars with recipes. You might implement a fallback by adding a jar manifest entry that marks "this jar includes only recipes that have been migrated to services".
I've made the similar change in JMeter a year ago: https://github.com/apache/jmeter/pull/5885
I think I would like the META-INF/services files to be transparent to the recipe author by having the files generated at build-time by our build plugin. What do you think about that?
generated at build-time by our build plugin
That is one of the options, however, how would you implement that? Imagine someone creating a recipe, pardon me, in Kotlin ;)
How would you add META-INF/services
transparently? Would you use kapt or https://github.com/ZacSweers/auto-service-ksp ?
I think it should be good enough to divert users to autoservice and/or manual META-INF/services creation samples.
generated at build-time by our build plugin
In my experience, it is much more useful to have a validator, however, it is a different topic. For instance, if the user produces a jar with openrewrite recipes and styles, the validator could parse the yml and tell the user at the build time that the file is invalid, or it refers an non-existing style and so on.
The ultimate solution would be an openrewrite recipe that adds @AutoService(Recipe.class)
for the recipes :)
I was thinking that the build plugin would use ClassGraph (or similar) the way it currently does to scan the Java class files and YAML files in the build output folder and generate some kind of manifest into the META-INF folder which then gets added to the jar. So it would not necessarily have to be a Java service.
It almost seems like producing and adding a Jandex index to the recipe jar would be the simplest solution as a first step. That would slightly increase the build time and recipe jar size, but would allow getting rid of the expensive scanning at runtime.
So it would not necessarily have to be a Java service.
Java service is something that has zero-arg constructor which you require for the recipe anyway.
Everything else (jandex indices or whatever) would require custom processing which might make it harder for the users, especially when it comes to shaded jars.
Adding a small note here that some of the wider community of OpenRewrite recipe producers don't use the same Gradle build plugins, or even use Maven. So might be good to keep those in mind as well with any changes we produce here, that we ideally don't enforce any changes there.
I have not double-checked that, however, I think the root cause of ClassGraph
-based scanning slowness is caused by the fact that ClassGraph
has to process many more classes than needed: it scans all the classes in the classpath, including Gradle and its dependencies.
In other words, a low-hanging fruit here may be to list the jars that OpenRewrite needs to scan.
For instance, something like .overrideClasspath
might help.
https://github.com/classgraph/classgraph/wiki/ClassGraph-Constructor-API#configuring-the-classgraph-instance
.overrideClasspath(String|URL... classpath)
allows you to specify which paths to scan, overriding the classpath and the module path.
For instance, scanYamlResources
takes ~1-1.2 sec for me (remember that ClassGraph
is multithreaded), while a manual implementation that iterates over rewrite jars and looks for yml and yaml files takes ~140ms to load the yaml resources (~40ms to locate + 100ms to load).
Note: there is already .ignoreParentClassloader()
in some of ClasspathScanningLoader
methods, so the scan durations might be better for the current rewrite-gradle-plugin
:
What problem are you trying to solve?
org.openrewrite.config.Environment.Builder#scanClassLoader
causesOutOfMemoryError
It is hard to display in heap dump analysis, however, scanning Apache JMeter classpath triggers OOM with
-Xmx2500m
heap.Describe the solution you'd like
OpenRewrite
should either load recipes on demand or it should useServiceLoader
to locate services. It would avoid spending time on classpath scan.Have you considered any alternatives or workarounds?
Can OpenRewrite search the recipe when user requests the exact recipe name?
For instance:
Why throwing
Recipes not found
exception? Can it just requestClass.forName("org.openrewrite.java.testing.junit5.JUnit5BestPractices")
and orclassLoader.getResources("META-INF/openrewrite/recipes.yml")
? Then it won't need to scan all the classpath jars for every execution.An alternative option could be adding https://github.com/smallrye/jandex annotation index, however, it would be an overkill.
Additional context
I am integrating OpenRewrite to Apache JMeter, and having OR trigger OOM is painful: https://github.com/apache/jmeter/pull/6217