sbt / zinc

Scala incremental compiler library, used by sbt and other build tools
Apache License 2.0
333 stars 118 forks source link

Improve performance of Analysis deserialization. #984

Open retronym opened 3 years ago

retronym commented 3 years ago

We have been looking at ways to improve the performance of analysis deserialization. In large builds this can be a major factor in the responsiveness of the build when Zinc determines that little or no actual compilation is needed.

A quick way to profile this is to checkout and compiler scala/scala, the run the following:

sbt> consoleProject
...
scala> def timed[T](f: => T) = { val start = System.nanoTime; try f finally { println( ((System.nanoTime - start) / 1000 / 1000) + " ms") } }
timed: [T](f: => T)T

scala> for (i <- 1 to 1000) { println(timed(sbt.internal.inc.FileAnalysisStore.binary(file("/Users/jz/code/scala/target/compiler/zinc/inc_compile.zip")).get)) }
400 ms
Optional[CompileResult(analysis: Analysis: 1607 Scala sources, 7 Java sources, 2414 classes, 1073 external source dependencies, 1 binary dependency, setup: MiniSetup(output: SingleOutput(/tmp/dummy), options: MiniOptions(classpathHash: [Lxsbti.compile.FileHash;@7d93d001, scalacOptions: [Ljava.lang.String;@1417548d, javacOptions: [Ljava.lang.String;@797cdb4a), compilerVersion: 2.13.6, order: JavaThenScala, storeApis: true, extra: [Lxsbti.T2;@2db4ba0), hasModified: false)]
468 ms
Optional[CompileResult(analysis: Analysis: 1607 Scala sources, 7 Java sources, 2414 classes, 1073 external source dependencies, 1 binary dependency, setup: MiniSetup(output: SingleOutput(/tmp/dummy), options: MiniOptions(classpathHash: [Lxsbti.compile.FileHash;@612b9022, scalacOptions: [Ljava.lang.String;@4fb613b6, javacOptions: [Ljava.lang.String;@500f3781), compilerVersion: 2.13.6, order: JavaThenScala, storeApis: true, extra: [Lxsbti.T2;@8d1f06b), hasModified: false)]
467 ms
Optional[CompileResult(analysis: Analysis: 1607 Scala sources, 7 Java sources, 2414 classes, 1073 external source dependencies, 1 binary dependency, setup: MiniSetup(output: SingleOutput(/tmp/dummy), options: MiniOptions(classpathHash: [Lxsbti.compile.FileHash;@3546dbd3, scalacOptions: [Ljava.lang.String;@4a92556e, javacOptions: [Ljava.lang.String;@6308c082), compilerVersion: 2.13.6, order: JavaThenScala, storeApis: true, extra: [Lxsbti.T2;@64dd2620), hasModified: false)]
468 ms
...

Attach async-profiler:

$ git clone jvm-profiling-tools/async-profiler && cd async-profiler
$ make clean all
./profiler.sh -d 20 --reverse -f out.html --minwidth 0.5 $PID
image

Opportunities

Inefficient creation of Array

javaList.asScala.iterator.map(f).toArray forgoes the chance to allocate the resulting array with the correct size and needs an intermediate buffer. Try something like:

   implicit class EfficientTraverse[T](seq: JList[T]) {
-    def toZincArray[R: scala.reflect.ClassTag](f: T => R): Array[R] =
-      seq.asScala.iterator.map(f).toArray
+    def toZincArray[R: scala.reflect.ClassTag](f: T => R): Array[R] = {
+      seq.stream().map(x => f(x)).toArray(new Array[R](_))
+    }
   }

Dead code creates wasted Vector

@@ -695,9 +699,11 @@ final class ProtobufReaders(mapper: ReadMapper, currentVersion: Schema.Version)
       val name = usedName.getName.intern()
       val useScopes = util.EnumSet.noneOf(classOf[UseScope])
       val len = usedName.getScopesCount
-      val scopes = for {
+      for {
         i <- 0 to len - 1
-      } yield useScopes.add(fromUseScope(usedName.getScopes(i), usedName.getScopesValue(i)))
+      } {
+        useScopes.add(fromUseScope(usedName.getScopes(i), usedName.getScopesValue(i)))
+      }
       UsedName.make(name, useScopes)

String interning isn't free

Interning likely-duplicated strings as we read them from the protobuf trades off higher CPU to achieve lower footprint. Maybe it is worth thinking of introducing a name table in the start of the protobuf and replacing fields of type string with indices into this table?

Eager Relation Building is costly

All maps that are deserialized are converted to a Relation[A,B], a pair of maps, one with the forward relations another with the inverse. The inverse map is of type is a Map[B, Set[A]].

The usedNames map appears to be the largest. Is the inverse relation actually used? If not, refactor Zinc to avoid building this up!

The other cost here is building immutable Map/Sets, rather than just upcasting mutable one to collection.{Map,Set}. This may well be worth it if clients of MRelationsNameHashing use its operations to construct new versions of it; these would structurally share with the base version. But its worth analysing how these are used.

It looks like the value in the forward relation in UsedName is not actually accessed as a Set[UsedName], a sequential collection would suffice.

    private def filteredDependencies(dependent: Set[String]): Set[String] = {
      dependent.filter {
        case from if isScalaClass(from) =>
          val affectedNames = usedNames.forward(from).filter(modifiedNames.isModified)

Furthermore, the list of used names for a file is only consulted if there is an memberRef relation between the file with a changed API to the candidate file for invalidation. This is an opportunity for laziness.

Lazy parsing?

If the used names data for many compilation units is actually not needed, can we avoid reading it from disk and parsing it in the first place? This would require an on-disk format with random access into the map of used names for a given unit.

Protobuf's documentation notes:

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.

Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.

I could imagine an incremental change to the current format included an entry in inc_compile.zip for each compilation unit.

Moar Hashing?

Another #brainstorm: what if we compressed the used-names section by hashes of the names? i.e. rather than recording that A.scala -> ["println", "Predef", ...], we instead record that is references ["println".hashCode, "Predef".hascode, ...]? Saving ints rather than Strings would reduce the size of the persisted form, avoid UTF-8 decoding and String interning on deserialization.

In case of hash collisions we would get over-compilation (although it would not propagate far.)

retronym commented 3 years ago

This test evaluates how much of the analysis files on-disk size is consumed by the used names, and how much of the loading time is attributable to that data.

package sbt.inc.binary

import sbt.internal.inc.FileAnalysisStore
import xsbti.compile.AnalysisContents

import java.io.File
import java.nio.file.{ Files, Path }
import java.util.concurrent.TimeUnit

object UsedNameTest {
  def main(args: Array[String]): Unit = {
    val inFile = new File("/Users/jz/code/scala/target/library/zinc/inc_compile.zip")
    val store = FileAnalysisStore.binary(
      inFile
    )
    val contents = store.get().get()
    val analysis = contents.getAnalysis.asInstanceOf[sbt.internal.inc.Analysis]
    val outFile = new File("/tmp/inc_compile.zip")
    val outStore = FileAnalysisStore.binary(outFile)
    outStore.set(
      AnalysisContents.create(
        analysis.copy(relations = analysis.relations.withoutUsedNames),
        contents.getMiniSetup
      )
    )
    printSize(inFile.toPath)
    printSize(outFile.toPath)
    def timedLoad(path: Path): Unit = {
      val now = System.nanoTime()
      FileAnalysisStore.binary(path.toFile).get.get().getAnalysis
      val end = System.nanoTime()
      println(path + " loaded in " + TimeUnit.NANOSECONDS.toMillis((end - now)) + "ms")
    }
    (1 to 32).foreach(_ => timedLoad(inFile.toPath))
    (1 to 32).foreach(_ => timedLoad(outFile.toPath))
  }

  private def printSize(path: Path): Unit = {
    println(path + " size = " + +Files.size(path))
  }
}

This shows about 15-20% of the space (a bit less that I guesstimated) and 30-35% of the time.