scala / scala-dev

Scala 2 team issues. Not for user-facing bugs or directly actionable user-facing improvements. For build/test/infra and for longer-term planning and idea tracking. Our bug tracker is at https://github.com/scala/bug/issues
Apache License 2.0
130 stars 15 forks source link

Enable classloader caching for macros and plugins in 2.13 ? #548

Open smarter opened 6 years ago

smarter commented 6 years ago

In https://github.com/scala/scala/pull/6412 the caching that was added in 2.12.5 was turned off by default:

This PR switches the default in order to be conservative in a 2.12 minor release

Can we reenable this for 2.13 now? The benefits look pretty good, from https://github.com/scala/scala/pull/6314:

In short, this change seems to yield compile-time performance improvements ranging from 16% to 35% for projects using compiler plugins.

/cc @adriaanm @retronym @jvican

jvican commented 6 years ago

The blocker for this change is cache invalidation. In https://github.com/retronym/scala/pull/27, I propose a generic way of getting rid of this (arguably complex) problem in the compiler. However, I'm skeptical we'll be able to enable this feature by default without the help/intervention of build tools.

The general idea is to have a --hash-classpath compiler argument that the users of the compiler can pass to the compiler. This hash classpath would contain a hash per compilation entry telling the compiler whether an entry has changed or not. Based on this information, the compiler could invalidate the caches for those entries in an opaque way.

This is a powerful and generic way of solving this problem. In all the cases I know, the clients of the compiler already have the information about what things changed or not in the classpath (or in the build DAG), and this information is the source of truth.

With this approach, we can support classloader caching "by default" in all Scala codebases with some tweaks to Zinc (while Bazel can instead pass the hashes themselves to the compiler). Zinc already reasons about changes to classpath because it needs to invalidate changes in external projects, so it can trivially pass this information to Scala's 2.13 compiler bridge.

In the linked ticket, I also propose an akin mechanism to invalidate symbols that changed in the symbol table, but that's partially unrelated to this topic as it has more to do with the invalidation mechanism of resident compilation.

jvican commented 6 years ago

I feel I haven't done a good job at fully selling all the benefits of this approach. To me, the most important one is the removal of duplication.

If we have a big classpath for a build of a 100 modules and we implement the invalidation logic inside the compiler, the compiler will need to read the timestamps of every classpath entry in every compilation of a project. That is a lot of work. By letting the build tool tell the compiler which things changed and which didn't, we solve this problem as the state that keeps track of the invalidation comes from outside the compiler and is global to the build tool knowledge.

If I have such hook, bloop and other build tools can implement an efficient way of reading timestamps or computing checksums of compiler inputs in a lazy way (context: https://github.com/scalacenter/bloop/issues/32).

Here's a better way a build tool could implement this than the status quo:

  1. Have a global oracle in the build tool running on an IO thread pool that watches changes in compiler inputs (like classpath entries).
  2. When the global oracle detects a change, it lazily reads its timestamp or computes it checksum and exposes it to the rest of the build tool that needs this information.

When does the build tools needs this information? It turns out that it does in several cases:

  1. Detect a change in the file system. The file watcher can get spurious OS notifications when no content has been changed. To get reliable notifications, file watchers like https://github.com/gmethvin/directory-watcher/ hash the contents of the watches files (I have no idea why timestamps are not enough here).
  2. Detect changes in sources and classpath in Zinc. Zinc currently computes checksums for both classpath entries and sources.
  3. Detect changes in classpath inside the compiler.

All these use cases happen on a per-project basis, so to reiterate having a global oracle that provides this information to the use sites directly is a big win because most of the classpath entries are shared in the projects of a build.

eatkins commented 6 years ago

@jvican and I talked about this in https://gitter.im/sbt/zinc-contrib. I have an imminent PR to sbt and io that does something quite similar to what is proposed in the above comment.

SethTisue commented 5 years ago

@retronym are you working on this for 2.13.0-RC1?

retronym commented 5 years ago

Not for RC1, unfortunately. I do some more foundational parts merged or queued which make it safer/easier to use, namely: