sbt / zinc

Scala incremental compiler library, used by sbt and other build tools
Apache License 2.0

Overcompilation for dependencies of classes using (not defining) macros #1333

Closed jackkoenig closed 3 months ago

jackkoenig commented 7 months ago

steps

Apply the following patch to the zinc git repository:

diff --git a/zinc/src/test/scala/sbt/inc/IncrementalCompilerSpec.scala b/zinc/src/test/scala/sbt/inc/IncrementalCompilerSpec.scala
index dc7a552fa..25a2fdbad 100644
--- a/zinc/src/test/scala/sbt/inc/IncrementalCompilerSpec.scala
+++ b/zinc/src/test/scala/sbt/inc/IncrementalCompilerSpec.scala
@@ -147,6 +147,44 @@ class IncrementalCompilerSpec extends BaseCompilerSpec {
       }
   }

+  it should "not trigger recompile dependencies of a macro use when the macro use recompiles" in withTmpDir {
+    tmp =>
+      def compilerSetupHelper(ps: ProjectSetup) = {
+        val setup1 = ps.copy(scalacOptions = ps.scalacOptions :+ "-language:experimental.macros")
+        setup1.createCompiler()
+      }
+      val p1 = VirtualSubproject(tmp.toPath / "p1")
+      val p2 = VirtualSubproject(tmp.toPath / "p2").dependsOn(p1)
+      val c1 = compilerSetupHelper(p1.setup)
+      val c2 = compilerSetupHelper(p2.setup)
+      try {
+        val s1 = "object A { def f(c: scala.reflect.macros.blackbox.Context) = c.literalUnit }"
+        val s2 = "object B { def x: Int = 3 }"
+        val s2b = "object B { def x: Int = 4 }"
+        val s3 =
+          """|object C {
+             |  def f: Unit = macro A.f
+             |  def g: Int = B.x + 2
+             |}""".stripMargin
+        val s4 = "object D { def f: Int = C.g - 2 }"
+
+        val f1 = StringVirtualFile("A.scala", s1)
+        val f2 = StringVirtualFile("B.scala", s2)
+        val f2b = StringVirtualFile("B.scala", s2b)
+        val f3 = StringVirtualFile("C.scala", s3)
+        val f4 = StringVirtualFile("D.scala", s4)
+
+        c1.compile(f1)
+        c2.compile(f2, f3, f4)
+
+        val result = c2.compile(f2b, f3, f4)
+        assert(lastClasses(result.analysis.asInstanceOf[Analysis]) == Set("B"))
+      } finally {
+        c1.close()
+        c2.close()
+      }
+  }
+
   it should "track dependencies from nested inner Java classes" in withTmpDir { tmp =>
     val project = VirtualSubproject(tmp.toPath / "p1")
     val comp = project.setup.createCompiler()

Then run the test:

$ sbt
> zinc / testOnly sbt.inc.IncrementalCompilerSpec -- -z "not trigger recompile dependencies of a macro use when the macro use recompiles"

problem

The test will fail with:

[info] - should not trigger recompile dependencies of a macro use when the macro use recompiles *** FAILED ***                           
[info]   Set("D") did not equal Set("B") (IncrementalCompilerSpec.scala:196)
[info]   Analysis:
[info]   Set(missingInLeft: ["B"], missingInRight: ["D"])

The expected value of just B being recompiled is the ideal result with no overcompilation.

expectation

If you run this test on v1.9.6 or v1.10.0-M1, it passes. It fails on v1.10.0-M2 and head of develop.

notes

I bisected this down to @dwijnand's PR https://github.com/sbt/zinc/pull/1282. More specifically, commit c7bcc0659c1c4f47826355c6911371b494928ec0.

What is a little weird is that if you run this test on that commit, you get that B, C, and D are all recompiled, which is what I would call the "overcompilation" result; at least it's correct. Starting with commit 55fa0917535a29f108385bfab495dc7033c5f9e7, it started giving the current result, D, which I think is wrong--obviously B needs to be recompiled since it changed.

Friendseeker commented 7 months ago

Thanks for the bug report!

it started giving the current result, D, which I think is wrong--obviously B needs to be recompiled since it changed.

My understanding might be incorrect, but I think lastClasses only returns compilations in the last cycle, so it can be the case that B is recompiled in one of the previous cycles.


As for the overcompilation issue, addressing it will require some sophisticated analysis like https://github.com/com-lihaoyi/mill/pull/2417. Even though D does not actually depend on any macros, there is currently no known easy heuristic that would let Zinc recognize that fact.

dwijnand commented 7 months ago

My understanding might be incorrect, but I think lastClasses only returns compilations in the last cycle, so it can be the case that B is recompiled in one of the previous cycles.

Indeed.

Even though D does not actually depend on any macros, there is currently no known easy heuristic that would let Zinc recognize that fact.

The change in my PR was intended to handle changes in the macro implementation, whereas here the change is at the macro definition site (I'm not even sure what we call it), or rather in the same class as the macro def. So perhaps we have the toggles to distinguish those cases, but I'm not certain.
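For concreteness, a minimal sketch of the two sites being discussed (all names are illustrative, not from the issue):

```scala
// MacroImpl.scala -- the macro *implementation*
import scala.reflect.macros.blackbox.Context

object MacroImpl {
  def impl(c: Context): c.Tree = {
    import c.universe._
    q"()" // expand to the unit literal
  }
}

// HasMacro.scala -- the macro *def* (the "definition site"), in a separate class
object HasMacro {
  def unit: Unit = macro MacroImpl.impl
}
```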

jackkoenig commented 7 months ago

My understanding might be incorrect, but I think lastClasses only returns compilations in the last cycle, so it can be the case that B is recompiled in one of the previous cycles.

Indeed.

Is there a different utility for getting all classes compiled in the compiler run?

Even though D does not actually depend on any macros, there is currently no known easy heuristic that would let Zinc recognize that fact.

The change in my PR was intended to handle changes in the macro implementation, whereas here the change is at the macro definition site (I'm not even sure what we call it), or rather in the same class as the macro def. So perhaps we have the toggles to distinguish those cases, but I'm not certain.

When I was first minimizing this issue, I was thinking that this change implied that any use of macros to derive typeclasses (e.g. for serialization) would be impacted, but I think I was wrong. This is how I now understand the new functionality:

Given:
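A hypothetical three-file layout matching the file names below (a sketch with illustrative contents; only the file names come from the discussion):

```scala
// Upstream.scala -- ordinary code that the macro def's class depends on
object Upstream {
  def helper: Int = 1
}

// HasMacro.scala -- holds the macro def (and here also its implementation),
// and depends on Upstream
import scala.reflect.macros.blackbox.Context

object HasMacro {
  def impl(c: Context): c.Tree = {
    import c.universe._
    q"Upstream.helper"
  }
  def invoke: Int = macro impl
}

// CallsMacro.scala -- expands the macro, so it is a separate compilation unit
object CallsMacro {
  def x: Int = HasMacro.invoke
}
```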

If I change anything in Upstream.scala (e.g. adding a println), then:

  1. HasMacro.scala and everything that depends on anything in that file will need to be recompiled.
  2. CallsMacro.scala may need to be recompiled, but downstream dependencies will NOT be recompiled.

Am I understanding correctly?

Thanks!

Friendseeker commented 7 months ago

Is there a different utility for getting all classes compiled in the compiler run?

Maybe classes?

  1. HasMacro.scala and everything that depends on anything in that file will need to be recompiled.
  2. CallsMacro.scala may need to be recompiled, but downstream dependencies will NOT be recompiled.

I think your understanding is correct. For HasMacro.scala, Zinc assumes an upstream change causes the macro defs' behaviour to change, hence Zinc invalidates HasMacro.scala and all classes depending on it. (To be pedantic, Zinc invalidates the specific class in HasMacro.scala containing the macro defs, but this subtle detail does not affect the discussion.)

As for CallsMacro.scala, downstream dependencies will not be recompiled since recompiling CallsMacro.scala itself is sufficient to ensure correctness.


By the way, is it common in practice to have the macro def and the macro implementation in separate classes? This information will help in deciding the severity of the issue.

jackkoenig commented 5 months ago

Took a while, but finally got around to pulling all of our uses of the macro keyword upstream (still not sure what to call these, macro invocations? Dale said "definition site" but that really sounds to me like it would be the same thing as the macro implementation 🤷‍♂️). In doing so, I confirmed that the incremental compilation "problem" I described is fixed.

As such I would argue that this is working as intended, but I do suspect that a lot of macro users will be bitten by the same issue, so maybe the fix is to document the best practices here. We basically took any classes that had def macroInvocation = macro macroImplementation and pulled that method into a trait in a new file with minimal dependencies. Which leads me to:
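The refactor described here might look roughly like this (names are illustrative; MacroImpls.scala, not shown, is assumed to hold the macro implementation):

```scala
// Before: Widget.scala mixed heavy regular code with a macro def, so changes
// to Widget's upstream dependencies invalidated every caller of the macro.

// MacroDefs.scala -- new file with minimal dependencies; only the macro def moved here
trait MacroDefs {
  def macroInvocation: Unit = macro MacroImpls.macroImplementation
}

// Widget.scala -- regular code now just mixes the trait back in
object Widget extends MacroDefs {
  def regularMethod: Int = 42 // edits here no longer invalidate macro callers
}
```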

By the way, is it common in practice to have macro def and macro implementation in separate classes? This information will help in deciding severity of the issue.

You're required by Scalac to have the macro implementation (def macroImplementation(c: Context): c.Tree = ???) in a separate compilation unit from the invocation (def macroInvocation = macro macroImplementation), but (in my experience at least) it seems pretty common to have invocations interspersed with regular code. Maybe my code just has too many macros and this won't affect most Zinc users. That being said, I still think documenting the best practices here with some examples would go a long way!

In any case, I think it's fine to close this issue, but will defer to the maintainers.

Many thanks for your explanations!

jackkoenig commented 5 months ago

You're required by Scalac to have the macro implementation (def macroImplementation(c: Context): c.Tree = ???) in a separate compilation unit from the invocation (def macroInvocation = macro macroImplementation),

Correction: apparently this is not the case. I thought it was for the entirety of the ~10 years I've been writing Scala lol. It seems that calls to macroInvocation need to be in a different compilation unit from macroImplementation. These calls can, but generally probably should not, be in the same compilation unit as def macroInvocation. Interesting!
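In other words, the corrected rule seems to be (a sketch with illustrative names):

```scala
// Macros.scala -- the implementation and the macro def can share a compilation unit
import scala.reflect.macros.blackbox.Context

object Macros {
  def macroImplementation(c: Context): c.Tree = {
    import c.universe._
    q"()"
  }
  def macroInvocation: Unit = macro macroImplementation
}

// User.scala -- but any *call* to macroInvocation must sit in a separate
// compilation unit from the one defining it
object User {
  def run(): Unit = Macros.macroInvocation
}
```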