scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
230 stars 21 forks source link

PartialFunction's orElse and andThen aren't stack-safe and should be constant-time operations #12385

Open jbracker opened 3 years ago

jbracker commented 3 years ago

reproduction steps

using Scala 2.13.3,

object Main {
   def main(args: Array[String]): Unit = {
      val pfs: IndexedSeq[PartialFunction[Int, Int]] = (1 to 100000).map(_ => { case a: Int => a })

      val pfIter: Iterator[PartialFunction[Int, Int]] = pfs.iterator
      var result: PartialFunction[Int, Int] = PartialFunction.empty
      while(pfIter.hasNext) {
         result = result orElse pfIter.next()
      }

      println(result(42))
   }
}

problem

This causes a StackOverflowError:

Exception in thread "main" java.lang.StackOverflowError
    at scala.PartialFunction.orElse$(PartialFunction.scala:101)
    at scala.runtime.AbstractPartialFunction.orElse(AbstractPartialFunction.scala:27)
    at scala.PartialFunction$OrElse.orElse(PartialFunction.scala:246)
    at scala.PartialFunction$OrElse.orElse(PartialFunction.scala:234)
    at scala.PartialFunction$OrElse.orElse(PartialFunction.scala:246)
    at scala.PartialFunction$OrElse.orElse(PartialFunction.scala:234)
    at scala.PartialFunction$OrElse.orElse(PartialFunction.scala:246)
    at scala.PartialFunction$OrElse.orElse(PartialFunction.scala:234)
    ...

I think it should not cause a StackOverflowError.

PartialFunction.andThen has the same issue, which you can see by running:

object Main2 {
   def main(args: Array[String]): Unit = {
      val pfs: IndexedSeq[PartialFunction[Int, Int]] = (1 to 100000).map(_ => { case a: Int => a })

      val pfIter: Iterator[PartialFunction[Int, Int]] = pfs.iterator
      var result: PartialFunction[Int, Int] = PartialFunction.empty
      while(pfIter.hasNext) {
         result = result andThen pfIter.next()
      }

      println(result(42))
   }
}

I think this could easily be fixed by using an implementation similar to the following:

  final class StackSafeOrElsePartialFunction[A,B](private val pfs: Vector[PartialFunction[A, B]]) extends PartialFunction[A, B] {
      override def apply(a: A): B =
         pfs.find(_.isDefinedAt(a)).fold(PartialFunction.empty[A,B](a))(_.apply(a))
      override def isDefinedAt(a: A): Boolean =
         pfs.exists(_.isDefinedAt(a))
      override def orElse[A1 <: A, B1 >: B](that: PartialFunction[A1, B1]): PartialFunction[A1, B1] = {
         that match {
            case thatOrElse: StackSafeOrElsePartialFunction[A1, B1] =>
               new StackSafeOrElsePartialFunction(this.pfs ++ thatOrElse.pfs)
            case _ =>
               new StackSafeOrElsePartialFunction(this.pfs :+ that)
         }
      }
   }

Though I am unsure about its performance so that should be considered. Also this does not solve the issue with andThen.

jbracker commented 3 years ago

Also: orElse has a Runtime if O(n) in the number of functions already added previously. Which means the above main Function is in O(n^2). This is very unexpected to me. I would expect orElse to be in O(1).

SethTisue commented 3 years ago

I find the quadratic-time issue especially disturbing, but the stack safety problem is of course concerning as well.

Technically they could be considered separate issues, and a PR containing only a stack safety fix would be accepted, but I think the issues would ideally be addressed together, actually, and that any solution should involve considering both orElse and andThen together. (The stack safety issue arises because it's natural to use recursion to traverse the linked chains, but switching to a proper data structure such as Vector would eliminate the recursion, thus making the stack safety issue disappear.)

The underlying problem here is that we're representing these chains as singly linked lists, which don't permit efficiently adding items at either end. Adopting a Vector based solution, as @jbracker suggests, seems plausible.

@scala/collections this isn't strictly a collections issue, but perhaps some of y'all might be interested in this anyway since it's standard library and because a fix involves considering the underlying data-structure issue

SethTisue commented 3 years ago

Anyone interested in digging into this might like to look at https://github.com/scala/scala/pull/7263 which illuminates the difference between Combined and AndThen. (Without more thinking/digging it isn't clear to me whether Combined needs to be considered in the context of this ticket, too.)

johnynek commented 3 years ago

we pretty massively geeked out on this in cats for andThen/compose case on Function1:

https://github.com/typelevel/cats/blob/main/core/src/main/scala/cats/data/AndThen.scala

That solves the stack safety and is O(1) to call andThen/compose.

Probably something similar can be done for PartialFunction (and that could probably be brought into the standard library rather than fixing it in cats).