openrewrite / rewrite

Automated mass refactoring of source code.
https://docs.openrewrite.org
Apache License 2.0
2.27k stars 340 forks source link

Gradle parser fails with extra parens around an expression #4615

Open mccartney opened 1 month ago

mccartney commented 1 month ago

What is the smallest, simplest way to reproduce the problem?

    @Test
    void parensAroundAnExpression() {
        rewriteRun(
          buildGradle(
            """
              plugins {
                  id 'java-library'
              }
              def version = (rootProject.jobName.startsWith('a')) ? "latest.release" : "3.0"
              """
          )
        );
    }

What is the full stack trace of any errors you encountered?

java.lang.AssertionError: Source file was parsed into an LST that contains non-whitespace characters in its whitespace. This is indicative of a bug in the parser. 
plugins {
    id 'java-library'
}
def version = ~~(non-whitespace)~~>(<~~rootProject.jobName.startsWith('a')~~(non-whitespace)~~>) <~~? "latest.release" : "3.0"
    at org.openrewrite.test.RewriteTest.rewriteRun(RewriteTest.java:323)
    at org.openrewrite.test.RewriteTest.rewriteRun(RewriteTest.java:132)
    at org.openrewrite.test.RewriteTest.rewriteRun(RewriteTest.java:127)
    at org.openrewrite.gradle.GradleParserTest.parensAroundAnExpression(GradleParserTest.java:199)

Context

It works fine when removing the admittedly excessive parentheses.

shanman190 commented 1 month ago

@mccartney, what version of OpenRewrite are you using? I could have sworn that this was fixed a while ago in the Groovy parser.

mccartney commented 1 month ago

@shanman190 I have just reproduced the test failure with current main, i.e. 36efd20f640902d87926945e5748bb1f4039b7a6

Would it help if I submitted a PR with this test and @ExpectedToFail?

knutwannheden commented 3 weeks ago

Looks like the Groovy parser version we are using is somehow not adding that _INSIDE_PARENTHESES_LEVEL metadata around the condition. This is what we rely on to do the parentheses parsing correctly.

@shanman190 AFAIK Gradle uses Groovy 3 in build scripts and I assume it will stay that way. I think I remember that Groovy 4 should be mostly backwards compatible with Groovy 3 (some deprecated API was removed). Would we be able to use the Groovy 4 parser?

shanman190 commented 3 weeks ago

@knutwannheden, so the way that rewrite-gradle is organized is very similar to all of the other language bindings in that the user provides the compiler version. For rewrite-gradle specifically, this is determined by the Gradle distribution version. In practice, we're pulling the buildscript and settings classpaths and providing that to the GradleParser which is then transitively applying it to the GroovyParser.

Gradle <7.0 are bringing Groovy 2.x while Gradle 7.0+ are bringing Groovy 3.x. Furthermore, Gradle has no intent to upgrade to Groovy 4.x given the effort they've put on the Kotlin DSL (and furthermore Declarative DSL). For more specific information, you can take a quick peek through https://github.com/openrewrite/rewrite/issues/3406.

mccartney commented 3 weeks ago

I think Knut's idea was to use a Groovy 4.x parser against Gradle scripts, even knowing that Gradle uses Groovy 3.x.

shanman190 commented 3 weeks ago

@mccartney, yep, I know. So there's two parts to this which Knut would gather from my response, but I'll expand it out a bit more.

The Groovy ASTs are allowed to contain breaking changes even in patch versions. Groovy doesn't expect for users to be working with things at that level and the integration expectations are that the AST and runtime are paired directly together during the compilation phase of the project as it's compiling locally. Now in practice, there's very rarely breaking changes, but they are still there.

When you take this a step further, Gradle is providing a Groovy distribution natively and then doing classloader isolation to further insulate its own Groovy version from the version that the user is using for their project should they be choosing to use Groovy source in the first place.

In each of these places, OpenRewrite must adapt to the consumer's environment to make sure that we are using the versions that they are. As an example of this in practice with the Java language binding via the use of rewrite-java (base module), then rewrite-java-8, rewrite-java-11, etc. These version specific modules have breaking changes between them from a compiler AST standpoint as well, so what must happen as OpenRewrite is compiling the source is reflectively loading the correct Java specific version to then use for parsing that source set. It's been something that's been considered for rewrite-groovy as well, but hasn't been taken on yet as the 3.x AST code has been compatible enough to parse both Groovy 2.x and 3.x as needed. Joan (in the linked issue) tried to adapt rewrite-groovy to Groovy 4.x's AST, but there are various parts that conflict and would need to be resolved, assuming that's even possible to do without introducing the layer of abstraction similar to the Java language binding. Then you combine this with the fact that OpenRewrite chooses to not bring the runtime version, but instead allow the user to bring the appropriate version we would just come full circle back to where we are presently in that the current code path for parentheses is using a newer mechanism/feature that isn't available in the older Groovy versions.

EDIT: Looking down the hypothetical path for a moment, if we did force Groovy 4 onto the parser's classpath, we would absolutely need to make sure that the Gradle Groovy version cannot leak into the classloader. This is further complicated by the initial attempts at making the buildscript and settings classpaths available to the parser for type attribution since Groovy would exist in those classpaths to support Gradle itself. Even if we managed to extricate Groovy out of the classpath, it's possible that various plugins that exist on the classpath as well are also dependent upon the Groovy version. So in short, this path is... complicated.

knutwannheden commented 3 weeks ago

Thanks for the clarifications @shanman190 . For the Python parser I had a very similar issue, as the parentheses are fully abstracted away in the AST. So instead the parser currently manually serially advances the cursor through the source file, without any position information from the AST. Maybe that is what we have to do for Groovy too, given that the AST is so unreliable.

shanman190 commented 3 weeks ago

Yeah, most of the GroovyParser is manual advancement already, so that's probably not the worst. I think as long as we kept track of the parentheses encountered (stack or int), then that should satisfy the gap that we've got.

knutwannheden commented 3 weeks ago

Yes, that is probably the way we need to handle this. I think I remember we also have some other things missing like method references.

shanman190 commented 3 weeks ago

Yeah, method references in both Groovy 2.x/3.x (.&method) as well as the 4.x (::method), I think both have historically been absent.