scala / scala3

The Scala 3 compiler, also known as Dotty.
https://dotty.epfl.ch
Apache License 2.0
5.86k stars 1.06k forks source link

Slow compilation with chained dependent match types #14903

Closed prolativ closed 2 years ago

prolativ commented 2 years ago

Compiler version

3.1.2

Minimized code

I encountered the problem while trying to migrate a part of akka-http (forked from parboiled2) to Scala 3.

import annotation.unchecked.uncheckedVariance

sealed trait HList
sealed trait HNil extends HList
case object HNil extends HNil
case class ::[+H, +T <: HList](head: H, tail: T) extends HList

type Concat[X <: HList, Y <: HList] <: HList = X match
  case HNil   => Y
  case h :: t => h :: Concat[t, Y]

/**
  * Decompose L into Prefix ++ Suffix if possible
*/
type StripSuffix[L <: HList, Suffix <: HList] <: Option[HList] = L match
  case Suffix => Some[HNil]
  case h :: t => StripSuffix[t, Suffix] match
    case Some[x] => Some[h :: x]
    case _ => None.type
  case _      => None.type

/**
  * type-level implementation of this logic:
  *   Out =
  *     R                      if T has a tail of type L
  *     (L dropRight T) ++ R   if L has a tail of type T
*/
sealed trait TailSwitch[L <: HList, T <: HList, R <: HList]:
  type Out <: HList

object TailSwitch:
  type TS[L <: HList, T <: HList, R <: HList] <: HList =
    StripSuffix[T, L] match
      case Some[_] => R
      case _ => StripSuffix[L, T] match
        case Some[x] => Concat[x, R]

  implicit def tailSwitch[L <: HList, T <: HList, R <: HList]: (TailSwitch[L, T, R] {
    type Out = TS[L, T, R]
  }) = new TailSwitch[L, T, R] { type Out = TS[L, T, R] }

/**
 * Rule popping I from stack and pushing back O
*/
sealed class Rule[-I <: HList, +O <: HList]:
  def ~[I2 <: HList, O2 <: HList](that: Rule[I2, O2])(implicit
      i: TailSwitch[I2, O @uncheckedVariance, I @uncheckedVariance],
      o: TailSwitch[O @uncheckedVariance, I2, O2]
  ): Rule[i.Out, o.Out] = ???

object Test:
  def dot = new Rule[HNil, HNil] {}
  def num = new Rule[HNil, Byte :: HNil] {}
  def pattern = num ~ dot ~ num ~ dot ~ num ~ dot ~ num

Output

This does compile although the compilation is very slow and making patter even slightly more complex makes the compilation time much longer. It turns out that the compilation time strongly depends on how the tailSwitch instance is defined. If it's just an implicit def or given then the compilation takes about 3.5 minutes. For inline given it's 2 minutes. Luckily there seems to be a workaround for the problem, which is using transparent inline given - then the code compiles in less than a second.

Expectation

The compilation should be as fast as for transparent inline given for all other cases as the types are given very explicitly here and transparent doesn't seem to narrow the type.

jrudolph commented 2 years ago

A simple guess would be that the result of the first step is a Rule[I, O] where both I and O are still the TS type (applying the match type, did prove that it exists, but it wasn't reduced to its final form). Now, we start with Rule[TS[...], TS[...] going into the second step which already has to do double the work, basically redoing the first step + doing the next step. That type result will be something like Rule[TS[TS[...], TS[...], TS[...]], TS[TS[...], TS[...], TS[...]]] explaining the exponential explosion.

The root cause would be how implicits and type matches combine in this case.

Would that make sense?

prolativ commented 2 years ago

I have the same intuition. I would say the problem here is that match types are reduced too lazily.

jrudolph commented 2 years ago

Here's a simpler example, it's not even about the wrapping type class but it seems to depend just on the number of patterns in the match:

  trait Wrapper[T] {
    type Out
  }

  type Func[T] =
    T match {
      case String => Long
      case Long   => Int
      case Int    => Float
      case Float => Double
      case Double => Unit
      case Unit => String
    }

  implicit def infer[A]: Wrapper[One[A]] { type Out = Func[A] } = ???

  trait One[A] {
    def use(implicit w: Wrapper[One[A]]): One[w.Out]
  }

  val x: One[Long] = null
  x.use.use.use.use.use.use.use
odersky commented 2 years ago

@jrudolph I think that makes sense. We do cache match type reduction, but we have to be very conservative in order to be sound, which means that we have to invalidate caches in many situations. It seems that applies here. /cc @OlivierBlanvillain maybe he can shed some more light on the matter.

But it's good that transparent inline solves the problem.

jrudolph commented 2 years ago

Thanks, @odersky.

Speaking of caches, WeakHashSet related code turns up with > 7 % in profiling of this case. In particular, the queue.poll call seems to be expensive. I don't know if this is indicative of bad cache usage or just a property of poll. Is it really worth it to even use a ReferenceQueue in this case? Wouldn't it be easier to clean up GC'd entries while accessing the table?

odersky commented 2 years ago

The WeakHashSet is the set of all cached types. That's a different cache actually. A high profiling count could point to a problem in the implementation of that cache. Maybe @smarter could take a look at it, he converted our type tables to weak hash sets. It's also an indication that we generate a lot of different types. Would be interesting to find out what they are.

odersky commented 2 years ago

Indeed, I notice that reduction caches are not effective in this case. I see no re-use at all, and millions of footprint mismatches.

In fact it's something else. We generate millions of fresh match types, all of the form:

?1.Out match {
  case String => Long
  case Long => Int
  case Int => Float
  case Float => Double
  case Double => Unit
  case Unit => String
}

Since every match type is fresh, it needs to be reduced from scratch, so the cache is populated but never looked up. The problem is that the skolem type ?1 is not cached and that means every type containing it is not cached either.

odersky commented 2 years ago

Luckily there seems to be a workaround for the problem, which is using transparent inline given - then the code compiles in less than a second.

In fact I could not reproduce that. I hangs for me also with transparent inline.

odersky commented 2 years ago

It seems that caching skolem types solves the problem. But I note that the original code has an error. See the test file in the PR.

smarter commented 2 years ago

Is it really worth it to even use a ReferenceQueue in this case?

I don't know, but that design was imported wholesale from Scala 2 where it survived many years of optimization work, and using it in Scala 3 did not lead to slowdowns compared to a regular HashMap, cf https://github.com/lampepfl/dotty/pull/12935

jrudolph commented 2 years ago

But I note that the original code has an error

It compiles fine for me on 3.1.1 and 3.1.2 when using

implicit transparent inline def tailSwitch...
odersky commented 2 years ago

I get a hanged compiler when I run that code from main. And with the changes in the PR I get:

-- [E007] Type Mismatch Error: i14903.scala:54:16 ------------------------------
54 |  def pattern = num ~ dot ~ num ~ dot ~ num ~ dot ~ num // error
   |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |Found:    Rule[?1.Out, ?2.Out]
   |Required: Rule[HNil, 
   |  TailSwitch[Byte :: Byte :: Byte :: HNil, HNil, Byte :: HNil]{
   |    Out = (Byte :: Byte :: Byte :: Byte :: HNil)
   |  }#Out
   |]
   |
   |where:    ?1 is an unknown value of type TailSwitch[HNil, ?3.Out, ?4.Out]{Out = TailSwitch.TS[HNil, ?3.Out, ?4.Out]}
   |          ?2 is an unknown value of type TailSwitch[?3.Out, HNil, Byte :: HNil]{
   |  Out = TailSwitch.TS[?3.Out, HNil, Byte :: HNil]
   |}
   |
   |
   |Note: a match type could not be fully reduced:
   |
   |  trying to reduce  StripSuffix[?5.Out, HNil]
   |  failed since selector  ?5.Out
   |  does not match  case h :: t => StripSuffix[t, HNil] match {
   |  case Some[x] => Some[h :: x]
   |  case Any => None.type
   |} <: Option[HList]
   |  and cannot be shown to be disjoint from it either.
   |  Therefore, reduction cannot advance to the remaining case
   |
   |    case _ => None.type
   |
   | longer explanation available when compiling with `-explain`

Correction: It did not hang in main (with transparent inline), it just took a minute or so. And I get the same error as above when it terminates.

prolativ commented 2 years ago

Indeed. This looks like a regression between main and 3.1.2 then.

odersky commented 2 years ago

@prolativ Can we find out what was the commit that regressed? And, is the error legit or wrong?

jrudolph commented 2 years ago

3.1.3-RC1-bin-20220404-ad2553d-NIGHTLY is the first nightly that doesn't compile quickly with transparent inline, so it should be in a26a2c3..ad2553d. Maybe #13780?

odersky commented 2 years ago

I guess #13780 is a likely candidate. The question still remains whether the error is legit or not.

smarter commented 2 years ago

For anyone looking into this, note that the error message doesn't give enough information because it talks about skolems without giving their underlying type, this happened before and in https://github.com/lampepfl/dotty/issues/13491#issuecomment-918127667 I gave a way to workaround this:

Replacing i" by ex" in the errors printed in MatchTypeTrace#explainEntry, we can get more information

(I didn't make a PR for that because it can lead to the same skolem being explained multiple times so something a bit more clever is needed)

smarter commented 2 years ago

In d1957e58392466223f6a17625794ad5249557cff, the test case for #13491 was changed as follow:

--- tests/pos/13491.scala
+++ tests/pos/13491.scala
@@ -86,7 +86,8 @@ object Rule {
   type RuleN[+L <: HList]   = Rule[HNil, L]

   def rule[I <: HList, O <: HList](r: Rule[I, O]): Rule[I, O] = ???
-  implicit def valueMap[T](m: Map[String, T])(implicit h: HListable[T]): RuleN[h.Out] = ???
+
+  implicit def valueMap[T, Out0 <: HList](m: Map[String, T])(implicit h: HListable[T] { type Out = Out0 }): RuleN[Out0] = ???
 }

 object Test {

If I apply the same kind of change to the test case in this issue, it works too (and it doesn't lead to slow compilation on master anymore since skolems aren't involved):

diff --git tests/neg/i14903.scala tests/neg/i14903.scala
index cd61a12e858..cfd9fd6d546 100644
--- tests/neg/i14903.scala
+++ tests/pos/i14903.scala
@@ -43,10 +43,10 @@ object TailSwitch:
  * Rule popping I from stack and pushing back O
 */
 sealed class Rule[-I <: HList, +O <: HList]:
-  def ~[I2 <: HList, O2 <: HList](that: Rule[I2, O2])(implicit
-      i: TailSwitch[I2, O @uncheckedVariance, I @uncheckedVariance],
-      o: TailSwitch[O @uncheckedVariance, I2, O2]
-  ): Rule[i.Out, o.Out] = ???
+  def ~[I2 <: HList, O2 <: HList, Out0 <: HList, Out1 <: HList](that: Rule[I2, O2])(implicit
+      i: TailSwitch[I2, O @uncheckedVariance, I @uncheckedVariance] { type Out = Out0 },
+      o: TailSwitch[O @uncheckedVariance, I2, O2] { type Out = Out1 }
+  ): Rule[Out0, Out1] = ???

 object Test:
   def dot = new Rule[HNil, HNil] {}
smarter commented 2 years ago

@jrudolph does the proposed change to ~ in my previous comment work for parboiled2?

jrudolph commented 2 years ago

@jrudolph does the proposed change to ~ in my previous comment work for parboiled2?

I can definitely try. Would it have to stay that way or just to check that it would be a suitable workaround for the issue for the time being?

smarter commented 2 years ago

Seems like it would have to be that way, unless there's a chance the compiler could accept this code without breaking soundness /cc @OlivierBlanvillain

jrudolph commented 2 years ago

Seems like it would have to be that way, unless there's a chance the compiler could accept this code without breaking soundness /cc @OlivierBlanvillain

Doing it in that one place doesn't seem to be enough. I would imagine that it will break things at other places as well as this kind of pattern is used in many more places. Which kind of usages are fine and which are going to break with 3.1.3?

smarter commented 2 years ago

Which kind of usages are fine and which are going to break with 3.1.3?

Unclear so far, I've opened https://github.com/lampepfl/dotty/pull/14987 as an alternative fix which if backported would avoid breaking anything hopefully.