Open OndrejSpanel opened 2 months ago
Did you start seeing this behavior with some specific version sbt (if so could you identify the first version it does this)?
Do you use macros, lots of implicits, or package objects? Normally Zinc won't invalidate if the change is limited to a local variable (a variable inside of a function or a method).
The complete build log obtained using
Level.Debug
has over 50 000 lines.
Would you like to share the log? There's still a chance that we can extract something useful out of the log. Ideally also share the entire scala project.
The log hopefully contains reason for invalidation somewhere. This is indeed a peculiar issue as changes such as “after I renamed a local variable from add to toAdd” should not have any impact at all. I still need to double check but I believe Zinc does not grant local variable in a function a name, so none of the name hashes should change.
Also does the log always start with
[debug] compilation cycle 1 [info] compiling 2 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
If that is the case, chances are the 2 files are always the same 2 files.
If that is the case, chances are the 2 files are always the same 2 files.
Yes, it always starts this way. The issue is 100% reproducible. For me the logs contain too much information, it is hard for me to digest it and to learn what is going on from it.
I can share you the log privately, if you suggest a suitable means to do so. As for the project, I would be very reluctant to share it, unless there is some NDA about it.
If that is the case, chances are the 2 files are always the same 2 files.
Yes, it always starts this way. The issue is 100% reproducible. For me the logs contain too much information, it is hard for me to digest it and to learn what is going on from it.
I can share you the log privately, if you suggest a suitable means to do so. As for the project, I would be very reluctant to share it, unless there is some NDA about it.
I see. For the log, if you use discord my discord ID is vertex1003. You can send a google drive link containing the log via discord DM, and after I downloaded the log I will reply to you and you can then immediately disable the google drive link.
If you use other messaging app, tell me the app name and your ID and I will register an account and add you there.
Btw may I share segment of the log relevant to the issue to other Scala tooling contributors for sake of debugging this issue?
@OndrejSpanel Could you try:
Compile / incOptions ~= { _.withTransitiveSteps(10) }
I noticed that you ran out of transitive steps (default is 3 I think), which explains cycle 4 is super big.
@OndrejSpanel Could you try:
Compile / incOptions ~= { _.withTransitiveSteps(10) }
I noticed that you ran out of transitive steps (default is 3 I think), which explains cycle 4 is super big.
Going slightly offtopic but should we consider raising the default transitive step to something higher than 3? I was reading source code of Intellij Scala Plugin and noticed their Scala Language Server sets transitive step to 5.
At the rate that it expanded from 2 to 36 files, I am still guessing it's something to do with macro invalidation, but if this fixes the problem maybe we could consider upping to 6.
I have added Compile / incOptions ~= { _.withTransitiveStep(10) }
. With this setting I get:
[debug] compilation cycle 1
[info] compiling 2 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 2
[info] compiling 1 Scala source to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 3
[info] compiling 15 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
This is much better, but still quite a lot for a renamed local variable.
For the log, if you use discord
I have the logs prepared, discord friendship request sent (username ospanel).
With { _.withTransitiveStep(30) } the logs are the same. For repro I rename the same variable between add
and toAdd
. Sometimes I get similar, but a different logs:
[debug] compilation cycle 1
[info] compiling 2 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 2
[info] compiling 36 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 3
[info] compiling 50 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
As for macros, I use several libraries which use them quite extensively, namely borer, circe, quicklens and airframe.surface.
With { _.withTransitiveStep(10).withRelationsDebug(true).withApiDebug(true) }
there is:
Detected API change because net.gamatron.json.Schema contains a macro definition
This object uses airframe.surface, namely my "light" fork of it (https://github.com/OpenGrabeso/light-surface/).
I am trying to create a repro using surface
. Can someone help me what to look for, when I see this in the SBT debug output:
[debug] [diff] Detected API change because net.gamatron.json.Schema contains a macro definition. [debug] [inv] [debug] [inv] Changes: [debug] [inv] API Changes: Set(APIChangeDueToMacroDefinition(net.gamatron.json.Schema)) [debug] Invalidating (transitively) by inheritance from net.gamatron.json.Schema... [debug] Initial set of included nodes: net.gamatron.json.Schema [debug] Invalidated by transitive inheritance dependency: Set(net.gamatron.json.Schema) [debug] The following member ref dependencies of net.gamatron.json.Schema are invalidated: [debug] aaa.bbb.ccc.Ccccc [debug] aaa.bbb.ccc.Dddd [debug] aaa.bbb.ccc.ddd.EEEE
I am trying to include various constructs those classes use from Schema, but do not see the issue in my repro so far.
Some things Schema includes, which the classes mentioned as "member ref dependencies" use:
// macro declaration
def enumerateSubclassesImpl[T: Type](using Quotes): Expr[List[Surface]] = ???
inline def enumerateSubclasses[T]: List[Surface] = ${ enumerateSubclassesImpl[T] }
inline def listDerivedClasses[T]: List[TypeDesc] = enumerateSubclasses[T].map(TypeDesc.apply(_))
// annotation declaration
type hiddenType = hidden
// type declarations
case class TypeDesc(surface: Surface)
What are "member ref dependencies" and when are they invalidated? I am not even sure why is net.gamatron.json.Schema
invalidated on my change, as it does not seem to be used in the file I am changing.
Almost all classes listed in the first step as "by member reference" use listDerivedClasses
macro, but not all - some are just simple sealed trait
, with no references to Schema
at all.
What are "member ref dependencies" and when are they invalidated?
Member ref dependencies are basically direct reference from a class to another.
class A
class B {
val a = new A;
}
Then B has a member ref dependency to A as B refers to A.
As for why they are invalidated, there's lots of ways they can be invalidated. One common way is that if a class has macros (either contains a macro call site or macro implementation), then all of its member ref dependents are invalidated, which do appear like what is taking place judging by the appearance of APIChangeDueToMacroDefinition
in the log.
I am not even sure why is net.gamatron.json.Schema invalidated on my change, as it does not seem to be used in the file I am changing.
I am looking into it. The debug log should contain the reason but the log is a bit hard to read... Which I guess is an area Zinc can work on.
If you think there is any experiment I could run on my project or any other information you want, just ask. I am very glad you care about the issue and work on solving it. Thanks a lot.
I noticed that you ran out of transitive steps (default is 3 I think), which explains cycle 4 is super big.
Going slightly offtopic but should we consider raising the default transitive step to something higher than 3? I was reading source code of Intellij Scala Plugin and noticed their Scala Language Server sets transitive step to 5.
I confirm _.withTransitiveStep(5)
is enough for my project to avoid ~500 files being compiled in the compilation cycle 3, I see only 15-50, same as with _.withTransitiveStep(10)
.
One more observation: I do not even have to rename a local variable. Just editing a comment in (almost) any source is enough.
I see very similar messages as before, only there is 1 Scala source compiled in the compilation cycle 1, not 2:
[debug] compilation cycle 1
[info] compiling 1 Scala source to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 2
[info] compiling 1 Scala source to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 3
[info] compiling 15 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] Change APIChangeDueToMacroDefinition(net.gamatron.json.Schema) invalidates 31 classes due to The net.gamatron.json.Schema source file declares a macro.
[debug] > by transitive inheritance: Set(net.gamatron.json.Schema)
If the package object gets shaken, it's game over. So my suggestion would be to split the source code as much possible for anything that might invalidate the package object, or maybe just avoid package object altogether. I think Scala 3 was trying to deprecate it at some point? - https://docs.scala-lang.org/scala3/reference/dropped-features/package-objects.html
Thanks for the tip. I will check what package objects could be involved, I certainly have some present. (Or if the logs tell more about what package object in particular is involved, it will certainly help me). And to be sure I understand: by "shaked" you mean "invalidated"? Still I do not understand why changing a comment or a local variable name shakes anything at all.
When I replaced my most prominent package object
by expanding it to top-level definitions instead, the result is even worse, on a comment change I get:
[debug] compilation cycle 1
[info] compiling 19 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 2
[info] compiling 46 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 3
[info] compiling 56 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
Thanks for the tip. I will check what package objects could be involved, I certainly have some present. (Or if the logs tell more about what package object in particular is involved, it will certainly help me).
When changing a local variable name in a file contains macro (either implementation or call site), zinc performs very aggressive invalidation since macro expands at callsite.
I still need to think more about what is going on. I believe that Zinc's macro invalidation strategy is that a when a source file containing macro is invalidated, all downstream files (all member ref dependents of the macro containing file) would be invalidated. But I would assume that net.gamatron.json.Schema
is an upstream class. It is probably not supposed to be invalidated yet it is somehow invalidated...
And to be sure I understand: by "shaked" you mean "invalidated"? Still I do not understand why changing a comment or a local variable name shakes anything at all.
Shake means invalidated.
to split the source code as much possible for anything that might invalidate the package object,
Can you give me some examples of what to avoid?
upstream class.
Downstream / upstream - I am afraid I am not sure yet what it means. In this case Schema
is a utility class containing a macro and other things. It should not depend on anything else other than surface library, I think.
In this repo I try to replicate how is it used in the real project - https://github.com/OndrejSpanel/incremental-repro - so far I was not able to reproduce the issue, though. I will continue to add more code, but any hints on what might be the problematic construct would be helpful.
Downstream / upstream - I am afraid I am not sure yet what it means.
Upstream means the class is relatively higher in the dependency tree. Like if class A depends on class B, then B is upstream relative to A.
In this case Schema is a utility class containing a macro and other things. It should not depend on anything else other than surface library, I think.
Thanks for the info! This is really important information that helps to narrow down possibilities.
In this repo I try to replicate how is it used in the real project - https://github.com/OndrejSpanel/incremental-repro - so far I was not able to reproduce the issue, though. I will continue to add more code, but any hints on what might be the problematic construct would be helpful.
I will try to see if I can talk to Scala 3 compiler devs. I don't have any idea on what might be the problematic construct but they might have an idea on what constructs may cause the issue.
I have created a standalone repro - https://github.com/OndrejSpanel/IncRepro. It is about 15 scala source files, referencing each other in various ways, including opaque types and exports, sometimes using unused imports, with some package objects present.
To see the issue:
compile
object WC
compile
againThe output I see after the second compile:
[IJ]compile
[info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[info] done compiling
[info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[info] done compiling
[info] compiling 2 Scala sources to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[info] done compiling
[success] Total time: 1 s, completed 19. 9. 2024 1:18:44
(I hope it will be useful, I spent more than 10 hours reducing it to this shape.)
If there is any progress with this issue, please let me know. Even if it is not resolved, just knowing what kinds of constructs trigger this would immensely help me, as I could refactor my sources accordingly. The long compile times on trivial changes are killing me.
If there is any progress with this issue, please let me know. Even if it is not resolved, just knowing what kinds of constructs trigger this would immensely help me, as I could refactor my sources accordingly. The long compile times on trivial changes are killing me.
Sorry for lack of communication in the last few days. I plan to spend this entire weekend looking into this issue and will share any intel found as comment.
Didn't find the root cause yet but did some narrowing down the issue and found out that
has incorrect result. Just by changing a comment, removedProducts
returned 17 entries causing mass invalidation.
More specifically previous.allProducts.toVector
contains 43 entries, and for 17 of them, !equivS.equiv(previous.product(p), stamps.product(p))
returns true.
It is also very weird that despite I am also using https://github.com/OndrejSpanel/IncRepro, I am somehow getting different logs than https://github.com/sbt/zinc/issues/1396#issuecomment-2359615278. For me the log is
[debug] compilation cycle 1
[info] compiling 15 Scala sources to /Users/jiahuitan/IncRepro/target/scala-3.5.0/classes ...
Sorry for lack of communication in the last few days. I plan to spend this entire weekend looking into this issue and will share any intel found as comment.
ok. Reminder to everyone that
If working on Zinc is fun, please by all means keep doing what you're doing, but please do not feel obligated on anything.
[info] compiling 15 Scala sources
I see this when I modify the project using IntelliJ IDE with Compiler based error highlighting. In this mode the IDE interacts with the build system and skews results very much. I have learned hard way that to learn anything useful about the SBT build process, the IDE should not be open, or at least be in Built-in highlighting mode.
When I use notepad to edit the files, I consistently get the results I have shown:
PS C:\Dev\inc-repro> sbt compile [info] welcome to sbt 1.10.1 (Eclipse Adoptium Java 21.0.3) [info] loading global plugins from C:\Users\Ondra.sbt\1.0\plugins [info] loading project definition from C:\Dev\inc-repro\project [info] loading settings for project inc-repro from build.sbt ... [info] set current project to IncRepro (in build file:/C:/Dev/inc-repro/) [info] Executing in batch mode. For better performance use sbt's shell [info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ... [info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ... [info] compiling 2 Scala sources to C:\Dev\inc-repro\target\scala-3.5.0\classes ... [success] Total time: 4 s, completed 29. 9. 2024 0:56:23
I running this on Win11 x64, Java 21.
Sorry for lack of communication in the last few days. I plan to spend this entire weekend looking into this issue and will share any intel found as comment.
ok. Reminder to everyone that
- Zinc is an open source software with no warrantees. This means that it's a free sandwich, and any additional ham and cheese should ideally be added by their own user, or purchased via some consultancy who can.
- Friendseeker is a student who is studying Computer Science.
If working on Zinc is fun, please by all means keep doing what you're doing, but please do not feel obligated on anything.
Thanks Eugene for reminding me the facts. While Ondrej is being extremely helpful in terms of investigating the issue and being very nice in general, indeed it does not mean I have an obligation to resolve this ticket by doing a weekend crunch.
I will keep investigate this ticket, but I shall prioritize playing with my cat during the long weekend instead of directing all my energy towards this issue.
[info] compiling 15 Scala sources
I see this when I modify the project using IntelliJ IDE with Compiler based error highlighting. In this mode the IDE interacts with the build system and skews results very much. I have learned hard way that to learn anything useful about the SBT build process, the IDE should not be open, or at least be in Built-in highlighting mode.
When I use notepad to edit the files, I consistently get the results I have shown:
PS C:\Dev\inc-repro> sbt compile [info] welcome to sbt 1.10.1 (Eclipse Adoptium Java 21.0.3) [info] loading global plugins from C:\Users\Ondra.sbt\1.0\plugins [info] loading project definition from C:\Dev\inc-repro\project [info] loading settings for project inc-repro from build.sbt ... [info] set current project to IncRepro (in build file:/C:/Dev/inc-repro/) [info] Executing in batch mode. For better performance use sbt's shell [info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ... [info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ... [info] compiling 2 Scala sources to C:\Dev\inc-repro\target\scala-3.5.0\classes ... [success] Total time: 4 s, completed 29. 9. 2024 0:56:23
I running this on Win11 x64, Java 21.
Thanks. Oh my there are a lot of holes. One is that Compiler based error highlighting ruins class file reproducibility. I used an online file hash generator and found out the class file hashes indeed are different across different runs...
I shall try using Built-in highlighting mode
and see if I can reproduce the same logs.
- any additional ham and cheese should ideally be added by their own user, or purchased via some consultancy who can.
For information: I have create a post about this on scala-users.
I had figured out what caused net.gamatron.json.Schema
to be invalidated. Here's the invalidation chain.
net.opengrabeso.json.Schema -> net.opengrabeso.json.ASchema
net.opengrabeso.json.ASchema -> net.opengrabeso.engine.tx
net.opengrabeso.engine.tx -> net.opengrabeso.engine.MTTx,
net.opengrabeso.engine.MT -> net.opengrabeso.engine.MTOpaque,
net.opengrabeso.engine.MTOpaque -> net.opengrabeso.p.OM,
net.opengrabeso.p.OM -> net.opengrabeso.p.OMShaders,
net.opengrabeso.p.OMShaders -> net.opengrabeso.p.state.HM,
net.opengrabeso.p.state.HM -> net.opengrabeso.p.World,
net.opengrabeso.p.World -> net.opengrabeso.p.s.SLog$.HDesc,
net.opengrabeso.p.s.SLog$.HDesc -> net.opengrabeso.p.state.package,
net.opengrabeso.p.state.package -> net.opengrabeso.p.State,
net.opengrabeso.p.State -> net.opengrabeso.p.state.WC,
Basically Zinc has a mechanism called "third order invalidation" introduced by Dale in https://github.com/sbt/zinc/pull/1282. Zinc tracks list of all (transitive) classes for which a macro (in this case net.opengrabeso.json.Schema
) depends on in a crude manner, and if any of these changes, Zinc invalidates the macro net.opengrabeso.json.Schema
as any change to any of these classes may affect the behaviour of the macro.
@OndrejSpanel I don't have access to your entire project, but if possible, breaking any part of the chain should ease the issue.
In long term, we need a better macro invalidation (due to upstream change) mechanism. Bytecode based call graph analysis is one promising candidate. The issue with bytecode based analysis though is that it takes significant effort to implement it. Despite its obvious benefits (much refined third order invalidation & potentially better testQuick
), I don't see it being done anytime soon.
Btw welcomes any other idea that third order invalidation can be improved. The best case scenario is if we can come up with some macro invalidation strategy that is almost as good as bytecode analysis but takes significantly less effort to implement. Although no one had ever came up with such strategy, few people were looking into this issue anyways.
if possible, breaking any part of the chain should ease the issue
And it really does. When I move the macro enumerateSubclasses
into a separate object SchemaMacros
with minimal dependencies, I no longer see the issue.
Thanks for the analysis—I really appreciate it! (I hope you had a great time with your cat.) I'd like to express my gratitude in a more tangible way. I've sent you an invite, so you'll be able to find my contact information if you're interested.
I had figured out what caused
net.gamatron.json.Schema
to be invalidated. Here's the invalidation chain.
Can you describe how can I analyze logs so that I am able to find such invalidation chain on my own? While breaking this chain helps in the repro, there must be some other quite different type of chain in the real project, as even without any macros at all I get following behaviour on trivial code changes:
[IJ]jvmCommon/compile
[info] compiling 2 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.1\classes ...
[info] compiling 35 Scala sources and 9 Java sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.1\classes ...
[info] compiling 70 Scala sources and 9 Java sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.1\classes ...
This is indeed a peculiar issue as changes such as “after I renamed a local variable from add to toAdd” should not have any impact at all
Even without a macro I can see in the real project that even trivial changes (changing comment, renaming local variable) cause compilation of at least two Scala source files. Now I could spend another day creating another repro, but perhaps understanding logs and being able to identify the invalidation chains would be more productive?
What makes understanding logs hard for me is that there is a lot of information dumped by sbt / zinc about files / classes which I assume were checked for invalidation, but not invalidated eventually. I consider creating some tool which would parse the logs and show just the invalidation information from them.
Can you describe how can I analyze logs so that I am able to find such invalidation chain on my own?
When withRelationsDebug(true)
is set, Zinc prints the relations between classes, for which we can extract the chain from. However, in practice, I found it to very hard to work with as the relation graph can be large and becomes difficult to trace.
While breaking this chain helps in the repro, there must be some other quite different type of chain in the real project, as even without any macros at all I get following behaviour on trivial code changes
Eugene suggested package object might be causing mass invalidation. Here's the relevant log
[debug] [inv] Package object invalidations: net.gamatron.procas.game.state.package
Try to remove net.gamatron.procas.game.state
package object if possible (e.g. replacing it with regular object). This might address the issue.
but perhaps understanding logs and being able to identify the invalidation chains would be more productive?
I do feel like relying on user to know library internal & logs that are tightly coupled with library internal indicates some deeper problem. Fundamentally there's this common sentiment that Scala has poor tooling, and the particular problem we are investigating is merely a special case of it.
While I currently have no intention of doing any bounty & receiving any form of monetary compensation, I do feel like some form of incentive structure is the proper way forward. Ideally Scala Center/Lightbend can hire more people to work on Scala tooling full time (but that's easier said than done). Li Haoyi style bounty also seemed to be quite effective.
Part of the issue in the real project is https://github.com/sbt/zinc/issues/867#issuecomment-667700482:
My experience seems to be that name hashing does not prevent the invalidation from spreading though. In the example I described above in the sbt project, it did two cycles even though only Foo.scala actually changed and no other file depended on Foo.scala.
My Java files are annotations only (because of RetentionPolicy.RUNTIME
). Once any of it is invalidated (which always happens at the moment), they seem to invalidate all classes they are used with.
Once I disabled pipelining, the behavior is much more reasonable, I usually get only 2 compilation cycles, not 3 or more. (Even 2 are too many on a trivial change, I am working on a reproducer, but the progress is slow).
I am not sure why any of the Java files was invalidated in the first place, and so far I am unable to examine the invalidation chains, therefore I cannot really tell.
My Java files are annotations only (because of RetentionPolicy.RUNTIME). Once any of it is invalidated (which always happens at the moment), they seem to invalidate all classes they are used with.
I believe this is due to https://github.com/sbt/zinc/issues/630
Once I disabled pipelining, the behavior is much more reasonable, I usually get only 2 compilation cycles, not 3 or more. (Even 2 are too many on a trivial change, I am working on a reproducer, but the progress is slow).
Yeah unfortunately pipelining can have some surprising interaction with incremental compilation.
I am not sure why any of the Java files was invalidated in the first place, and so far I am unable to examine the invalidation chains, therefore I cannot really tell.
The best thing we can do is to come up with a minimal reproduction right now. Even if we can examine the chain, it can be likely that the chain does not reveal very much. Frequently Zinc's invalidation is not quite as orthodox as explicitly following a chain due to name hashing optimization & various special cases to override name hashing.
steps
I am afraid I need help to reproduce the issue in a smaller project. My project is SBT based, Scala 3, several modules, JS + JVM, with about 500 Scala source files.
problem
When I make a small change in my project, like renaming a local variable in a function, a lot of files (>400) is compiled.
expectation
On any change which is internal to a single function and does not affect its interface (parameters or return value) only one file should be compiled.
notes
It is possible something in my sources is contributing to the problem, but given current tools, it is quite hard to learn what exactly is the issue. The complete build log obtained using
Level.Debug
has over 50 000 lines.I attach an excerpt obtained using
ThisBuild / logLevel := Level.Debug
from my sbt build which was triggered after I renamed a local variable fromadd
totoAdd
: