scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

Stack overflow in scala.util.hashing.MurmurHash3.productHash with deeply nested case classes #12996

Closed winitzki closed 6 months ago

winitzki commented 6 months ago

When a data structure contains deeply nested case classes (at least 5000 levels deep) there is a stack overflow while computing hashCode(). The stack overflow happens in scala.util.hashing.MurmurHash3.productHash.

Because of this, deeply nested case classes cannot be used as keys in a Map or as elements of a Set.

Reproduction steps

Scala version: 2.12, 2.13, 3.3, 3.4

// murmurhash3_bug.scala
object Bug extends App {
  import scala.util.control.TailCalls._
  import scala.collection.mutable

  sealed trait Tree
  final case class Leaf() extends Tree
  final case class Branch(left: Tree, right: Tree) extends Tree

  def deepTree(n: Int): Tree = { // Create an arbitrarily deeply nested tree, avoiding stack overflow.
    def deep: Int => TailRec[Tree] = {
      case 0 =>
        done(Leaf())
      case n =>
        for {
          x <- tailcall(deep(n - 1))
          y <- tailcall(deep(0)) // Whatever.
        } yield Branch(x, y)
    }
    deep(n).result
  }

  val cache = mutable.Map[Tree, Int]()

  val n = 100000
  val delta = n / 50
  (0 to n by delta).foreach { i =>
    val tree = deepTree(i)
    println(s"Size $i, tree computed")
    cache.put(tree, i)
    println(s"Size $i, tree cached")
  }
}

Run this program:

$ scala-cli -S 3.4.1 murmurhash3_bug.scala
Compiling project (Scala 3.4.1, JVM (17))
Compiled project (Scala 3.4.1, JVM (17))
Size 0, tree computed
Size 0, tree cached
Size 2000, tree computed
Size 2000, tree cached
Size 4000, tree computed
Exception in thread "main" java.lang.StackOverflowError
    at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:343)
    at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:158)
    at Bug$Branch._1(murmurhash3_bug.scala:9)
    at Bug$Branch.productElement(murmurhash3_bug.scala:9)
    at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:76)
    at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:343)
    at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:158)
    at Bug$Branch.hashCode(murmurhash3_bug.scala:9)
    at scala.runtime.Statics.anyHash(Statics.java:127)
    at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:76)
    at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:343)
    at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:158)
    at Bug$Branch.hashCode(murmurhash3_bug.scala:9)
    at scala.runtime.Statics.anyHash(Statics.java:127)
    at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:76)
    at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:343)

With other Scala versions, a stack overflow occurs at different nesting levels, but it always does occur.

Problem

A stack overflow occurs when putting a deeply nested data structure into a hashmap or into a set.

Expected behavior is that a data structure can be used as key of Map or as element of Set, regardless of nesting depth.

Expected behavior is that Object#hashCode() does not throw exceptions regardless of the type of the object.

sjrd commented 6 months ago

IMO this is a case where you need to increase the maximum stack size of the JVM to accommodate the depths in your workload.

It is not possible to implement hashCode() for nested case classes without using the stack for the recursion.

winitzki commented 6 months ago

Is there any way to avoid using Object#hashCode() or to implement Map and Set in a different way?

sjrd commented 6 months ago

You can box your keys in your own class that implements hashCode() however you like.