scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

Map.from (and technically Set.from) are unsound #12745

Closed noresttherein closed 1 year ago

noresttherein commented 1 year ago

Reproduction steps

I am 99% sure you know about it, but because Set makes an attempt to guard against it, and Map doesn't, I thought I'd mention it. I could not find any mention of it in the database though, and it's been a couple of days since I reported the last bug, so...

Scala version: 2.13.x, 3.x

    val iter :Iterable[(Any, String)] = List(1, 2, 3, 4, 5).map(i => i -> i.toString) to SortedMap.sortedMapFactory
    val set  :Map[Any, String] = Map.from(iter)
    set.contains("oops")

Problem

Obvious:

Exception in thread "main" java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Integer (java.lang.String and java.lang.Integer are in module java.base of loader 'bootstrap')
    at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:99)
    at scala.math.Ordering$Int$.compare(Ordering.scala:376)
    at scala.collection.immutable.RedBlackTree$.lookup(RedBlackTree.scala:40)
    at scala.collection.immutable.RedBlackTree$.get(RedBlackTree.scala:33)
    at scala.collection.immutable.TreeMap.get(TreeMap.scala:128)
    at scala.collection.MapOps.contains(Map.scala:281)
    at scala.collection.MapOps.contains$(Map.scala:281)
    at scala.collection.AbstractMap.contains(Map.scala:405)
    at Playground$.delayedEndpoint$Playground$1(Playground.scala:16)
    at Playground$delayedInit$body.apply(Playground.scala:13)
    at scala.Function0.apply$mcV$sp(Function0.scala:42)
    at scala.Function0.apply$mcV$sp$(Function0.scala:42)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
    at scala.App.$anonfun$main$1(App.scala:98)
    at scala.App.$anonfun$main$1$adapted(App.scala:98)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:575)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:573)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
    at scala.App.main(App.scala:98)
    at scala.App.main$(App.scala:96)
    at Playground$.main(Playground.scala:13)
    at Playground.main(Playground.scala)

In Set you have:

  def from[E](it: collection.IterableOnce[E]): Set[E] =
    it match {
      // We want `SortedSet` (and subclasses, such as `BitSet`) to
      // rebuild themselves to avoid element type widening issues
      case _: SortedSet[E]         => (newBuilder[E] ++= it).result()
      case _ if it.knownSize == 0  => empty[E]
      case s: Set[E]               => s
      case _                       => (newBuilder[E] ++= it).result()
    }

But Map misses that:

  def from[K, V](it: collection.IterableOnce[(K, V)]): Map[K, V] =
    it match {
      case it: Iterable[_] if it.isEmpty => empty[K, V]
      case m: Map[K, V] => m
      case _ => (newBuilder[K, V] ++= it).result()
    }

Furthermore, even Set is unsound, because it guards itself only against implementations from the standard library; designing it as invariant and then treating it as covariant in the central method to the collection architecture seems a somewhat risky decision. For example, I have (or rather, had, because it is not portable to Scala 3) a NatSet - naturally sorted Set. Thanks to a small trick, it is able to use the same API as Set, and I did not wish to make it a SortedSet because in the original use case I relied on the fact that ordering was natural (i.e., increasing). It will fail the same way Map does in the above example.

I am aware that performance is important, but the same effect can be achieved with positive filtering, rather than negative, i.e. returning only HashSet and HashMap with widened types. Furthermore, it would be possible - though quite ugly - to wrap the argument to from in a proxy which catches ClassCastException and replaces the underlying set with a HashSet/HashMap if needed. Technically, underlying field doesn't even have to be @volatile (assuming all fields in all objects inside a HashSet are final in the byte code): we can leave that synchronization for after an exception is actually caught.

As a side note, warranting perhaps a separate bug report,

    val iter :Iterable[(Any, String)] = List(1, 2, 3, 4, 5).map(i => i -> i.toString) to SortedMap

(without an expclicit call to sortedMapFactory) does not compile for me.

SethTisue commented 1 year ago

@scala/collections

som-snytt commented 1 year ago

I started taking a look, as I know nothing about it.

som-snytt commented 1 year ago

@noresttherein they don't rate-limit bug reporting.

I am 99% sure you know about it

I can't speak for the royal you, but

I know nothing about it.

(Just noticed I typoed that on my first attempt.)

som-snytt commented 1 year ago

The lie is that Map[K, V] is an Iterable[(K, V)]; that "launders" (in the sense of money laundering or whitewashing) the variance.

Even though Map is invariant in K, maybe it is Iterable[(K2, V)]. (I have not experimented but maybe the collections collective have thought about it.)

Map.from ought to warn.

"Practically", I'll try the exclusion pointed out in OP. Maybe there are additional strategies for guarding against a subverted key type; for example, the ordering can be taken to reify the key type.