TreeSet equals asymmetry when compared against Set or HashSet

scabug commented 10 years ago

When comparing TreeSets with default Set or HashSet when one contains Int and the other Long, the result is asymmetric.

Bills-MacBook-Pro-2:~ bv$ scala
Welcome to Scala version 2.11.0 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import scala.collection.immutable.TreeSet
import scala.collection.immutable.TreeSet

scala> import scala.collection.immutable.HashSet
import scala.collection.immutable.HashSet

// Comparing Set & HashSet is symmetric
scala> Set(1) == HashSet(1L)
res0: Boolean = true

scala> HashSet(1L) == Set(1)   
res1: Boolean = true

// Comparing Set and TreeSet is asymmetric
scala> Set(1) == TreeSet(1L)
res2: Boolean = false

scala> TreeSet(1L) == Set(1) 
res3: Boolean = true

// Comparing Set and HashSet is still symmetric with Int and Long switched
scala> Set(1L) == HashSet(1)
res4: Boolean = true

scala> HashSet(1) == Set(1L)
res5: Boolean = true

// Comparing Set and TreeSet is still asymmetric with Int and Long switched
scala> Set(1L) == TreeSet(1)
res6: Boolean = false

scala> TreeSet(1) == Set(1L)
res7: Boolean = true

// HashSet and TreeSet have the same asymmetry
scala> HashSet(1L) == TreeSet(1)
res8: Boolean = false

scala> TreeSet(1) == HashSet(1L)
res9: Boolean = true

scabug commented 10 years ago

Imported From: https://issues.scala-lang.org/browse/SI-8851?orig=1 Reporter: @bvenners Affected Versions: 2.10.4, 2.11.2

scabug commented 10 years ago

@retronym said:

scala> (Int.box(1)) == (Long.box(1L))
res5: Boolean = true

scala> Set[Int](1).asInstanceOf[Set[Long]].apply(1L)
res6: Boolean = true

scala> collection.immutable.TreeSet[Long](1L).subsetOf(Set[Int](1).asInstanceOf[Set[Long]])
res7: Boolean = true

scala> collection.immutable.TreeSet[Long](1L).equals(Set[Int](1))
res8: Boolean = true

Here's the nasty cast that leads us down the garden path:

trait GenSetLike {

  /** Compares this set with another object for equality.
   *
   *  '''Note:''' This operation contains an unchecked cast: if `that`
   *        is a set, it will assume with an unchecked cast
   *        that it has the same element type as this set.
   *        Any subsequent ClassCastException is treated as a `false` result.
   *  @param that the other object
   *  @return     `true` if `that` is a set which contains the same elements
   *              as this set.
   */
  override def equals(that: Any): Boolean = that match {
    case that: GenSet[_] =>
      (this eq that) ||
      (that canEqual this) &&
      (this.size == that.size) &&
      (try this subsetOf that.asInstanceOf[GenSet[A]]
       catch { case ex: ClassCastException => false })
    case _ =>
      false
  }

I'm not sure what to do about this one, but I'll assign to @Ichoran who is rarely short of good ideas!

scabug commented 10 years ago

@Ichoran said: I guess I'd better check out what the performance penalty is for switching to doing a contains on every element, and how bad it is to let the exception be thrown but try to recover. I hate to put difficult-to-anticipate 1000x slowdowns.

scabug commented 10 years ago

@paulp said: This seems bad enough, before you start comparing different kinds of sets.

scala> Set(1) == Set(1L)
res0: Boolean = true

scala> TreeSet(1) == TreeSet(1L)
res1: Boolean = false

scabug commented 10 years ago

@Ichoran said: I may well be short on good ideas in this case. We have the same problem with mutable.TreeSet, and mutable and immutable BitSet:

scala> Set(1) == i.TreeSet(1L)
res0: Boolean = false

scala> Set(1L) == i.BitSet(1)
res1: Boolean = false

scala> Set(1L) == m.BitSet(1)
res2: Boolean = false

scala> Set(1L) == m.TreeSet(1)
res3: Boolean = false

scala> m.TreeSet(1) == Set(1L)
res4: Boolean = true

The "solutions" I can think of so far are all of the ugly hacky variety. For instance, in practice BitSet and TreeSet iterate in order. So if you have two non-BT sets, the old method works (as no other sets, AFAIK, contain a math.Ordering or directly assume Int); if one is a BT set, then you call subsetOf in the other direction; and if both are they are ordered so you can compare in O(n) by iterating along both (and might want to special-case further). But although this would work, it is hardly an attractive solution.

SethTisue commented 1 year ago

someone in @scala/collections want to weigh in on whether this ticket should remain open?

lrytz commented 12 months ago

let's merge, but that doesn't mean we're slamming the door shut forever about what else might happen in the future

In Set(1L) == TreeSet(1) the catch { case _: ClassCastException => false } fires because Ordering.Int is called with a java.lang.Long.

sjrd commented 12 months ago

Which is a bigger problem, btw: it doesn't work at all in Scala.js, since ClassCastException is UB. Even on the JVM, it may cause really surprising behavior if one writes an ordering for a particular instantiation of a generic class. In general, if you catch a ClassCastException, you're already throwing your hands up in the air.

lrytz commented 12 months ago

Yes, but it seems that's the tradeoff we ended up with for the moment. Also the devil we know... Maybe things will look different after unfreezing the standard library (SIP-51). Long discussion at https://github.com/scala/bug/issues/12228.

scala / bug

TreeSet equals asymmetry when compared against Set or HashSet #8851