scala / scala-dev

Scala 2 team issues. Not for user-facing bugs or directly actionable user-facing improvements. For build/test/infra and for longer-term planning and idea tracking. Our bug tracker is at https://github.com/scala/bug/issues
Apache License 2.0
130 stars 15 forks source link

Simple looking code triggers expensive LUB that performs expensive existential subtyping. #578

Open retronym opened 5 years ago

retronym commented 5 years ago

One of the compiler performance issues I found in a codebase involved a big session of type inference + LUBbing in code creating a Map[A, Seq[Class[_ <: B]], in which there was enough of a lattice in the actual subtypes of B to make LUB computation expensive.

I distilled this into a benchmark corpus:

https://github.com/scala/compiler-benchmark/pull/83

That takes about 2s to compile. The original example was 10x that.

Here’s how the profiler sees things:

https://scala-ci.typesafe.com/view/scala-bench/job/compiler-benchmark/2101/artifact/target/profile-jfr/flame-graph-cpu.svg https://scala-ci.typesafe.com/view/scala-bench/job/compiler-benchmark/2101/artifact/target/profile-jfr/flame-graph-cpu-reverse.svg

I didn’t go as far as characterizing the big-O complexity wrt different N in the example. My working assumption is that primary N is the list of the Seq.apply argument list, and I imagine its O(N^2).

The constant factor multiplied by that complexity is existential subtyping, which creates temporary types by substituting type vars in place of existential quantifiers. These create some pressure on the uniques cache of types.

I once tried the Dotty approach of excluding types we know to be temporary from the uniques cache: https://github.com/scala/scala/compare/2.13.x...retronym:faster/strong-selective-unique

We still have to allocate to create the type, but by avoiding keeping the weak reference from the uniques map to the short-lived type, we make it eligible for GC almost immediately. I never saw enough benefit from this in big benchmarks to make me follow through with it, but maybe with a focussed benchmark like this one we could measure the improvement and justify the change.

retronym commented 5 years ago

@mkeskells noticed that the performance is "less bad" in 2.13.x.

I expanded the example a little to: https://gist.github.com/6d4cfa02b0b170e4a6c3a73db75167cc

And then git bisected between 2.12.7 and 2.13.x. At each step, I ran:

$ (export scalaref=$(git --git-dir /code/scala/.git log -1 --format=%H); scala-ref-version $scalaref; scalac-ref $scalaref -version; time scalac-ref $scalaref /Users/jz/code/compiler-benchmark/corpus/map-seq-inference/latest/test.scala -Ystop-after:typer)
2.13.0-pre-17633b2-SNAPSHOT
Scala compiler version 2.13.0-20170321-232145-17633b2 -- Copyright 2002-2016, LAMP/EPFL and Lightbend, Inc.

real    0m12.179s
user    0m21.182s
sys 0m0.810s

$ (export scalaref=$(git --git-dir /code/scala/.git log -1 --format=%H); scala-ref-version $scalaref; scalac-ref $scalaref -version; time scalac-ref $scalaref /Users/jz/code/compiler-benchmark/corpus/map-seq-inference/latest/test.scala -Ystop-after:typer)
2.13.0-pre-ef63cb9-SNAPSHOT
Scala compiler version 2.13.0-20170314-134807-ef63cb9 -- Copyright 2002-2016, LAMP/EPFL and Lightbend, Inc.

real    0m26.383s
user    0m37.135s
sys 0m1.754s

Taking "bad" == "fast" and "good" == "slow".

Cold performance changed from 25s to 12s with this library change:

commit d205a63cd1ea812c262df0d8304123265ccc35d1
Author: Stefan Zeiger <szeiger@novocode.com>
Date:   Fri Dec 16 19:42:45 2016 +0100

    Remove parallel collections from scala-library

    They will be moved into a separate module as part of the ongoing
    modularization of the Scala standard library.

    Tests specific to parallel collections will be reinstated in the new
    scala-parallel-collections project as JUnit tests or pure ScalaCheck
    tests (run directly from sbt).

    Some test cases here contain useful tests for both, parallel collections
    and other features of the standard library. They are split up into
    separate JUnit tests, which leads to some new JUnit tests in this
    project, too.

:040000 040000 c1c83cb024c3b18eb4a275e8e7657f860dc3a9b2 b5537cfedb16c352edbc43c534ea96122b703dea M  src
:040000 040000 3489db10d67975d99332c86c58ca70809f72676d d067998b198b1e91e1fea4eba730d04ec7d8cbd4 M  test