scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
232 stars 21 forks source link

TreeSet equality gives unexpected result based on compare only #10741

Open dariodariodario opened 6 years ago

dariodariodario commented 6 years ago

I stumbled upon this and it left me quite surprised:

import org.scalatest.FlatSpec
import scala.collection.immutable.TreeSet

class TreeSetSpec extends FlatSpec{

  case class A(val x : Int, val y : String) extends Ordered[A]{
    override def compare(that: A): Int = this.x.compareTo(that.x)
  }

  "TreeSet with different elements" should "be different" in {

    val a1 = A(1, "Hello")
    val a2 = A(1, "bye")

    val t1 = TreeSet[A](a1)
    val t2 = TreeSet[A](a2)

    assert(t1 != t2)
  }
}

this test won't pass and the two treesets will evaluate as equal. My understanding was that the compare of the ordered trait was only for sorting the Set, while element equality should rely on equals of the type parameter

lrytz commented 6 years ago

Same issue in https://github.com/scala/collection-strawman. It boils down to

scala> TreeSet(a1) contains a2
res1: Boolean = true

where

  def contains[A: Ordering](tree: Tree[A, _], x: A): Boolean = lookup(tree, x) ne null

  def lookup[A, B](tree: Tree[A, B], x: A)(implicit ordering: Ordering[A]): Tree[A, B] = if (tree eq null) null else {
    val cmp = ordering.compare(x, tree.key)
    if (cmp < 0) lookup(tree.left, x)
    else if (cmp > 0) lookup(tree.right, x)
    else tree
  }
dariodariodario commented 6 years ago

One colleague noted that https://www.scala-lang.org/api/2.11.8/#scala.math.Ordered says:

It is important that the equals method for an instance of Ordered[A] be consistent with the compare method. However, due to limitations inherent in the type erasure semantics, there is no reasonable way to provide a default implementation of equality for instances of Ordered[A]. Therefore, if you need to be able to use equality on an instance of Ordered[A] you must provide it yourself either when inheriting or instantiating.

that said, it seems me that it violates the Set contract, where two sets should be different if they don't contain the same elements... so I think Ordered is breaking it?

Jasper-M commented 6 years ago

FWIW, Java agrees with the Scala implementation on this:

scala> import java.util.TreeSet
import java.util.TreeSet

scala> val set = new TreeSet[A]
set: java.util.TreeSet[A] = []

scala> set.add(A(1, "hello"))
res0: Boolean = true

scala> set
res1: java.util.TreeSet[A] = [A(1,hello)]

scala> set.add(A(1, "bye"))
res2: Boolean = false

scala> set
res3: java.util.TreeSet[A] = [A(1,hello)]

scala> val set0 = new TreeSet[A]
set0: java.util.TreeSet[A] = []

scala> set0.add(A(1, "bye"))
res4: Boolean = true

scala> set == set0
res5: Boolean = true

that said, it seems me that it violates the Set contract, where two sets should be different if they don't contain the same elements... so I think Ordered is breaking it?

I rather think the Ordered contract is there in order not to break the Set contract.

The scaladoc could/should be a lot clearer about this, like the javadoc of java.util.TreeSet.

lrytz commented 6 years ago

Interesting :-) The javadoc mentions it:

a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal

But later on method contains they forgot to refine the doc:

returns true if and only if this set contains an element e such that (o==null ? e==null : o.equals(e))

dariodariodario commented 6 years ago

@Jasper-M the fact that Java has this behaviour doesn't mean it is the best one...

Jasper-M commented 6 years ago

the fact that Java has this behaviour doesn't mean it is the best one...

I would never dare to imply such a thing 😅 However I think it does show that this behavior is not just a bug.

dariodariodario commented 6 years ago

🤔 ...

Jasper-M commented 6 years ago

And it shows that there might be a good reason for it. What should a TreeSet do if you insert 2 elements that compare says are equal, but that equals says are different? Then you have an indeterministic ordering?

dariodariodario commented 6 years ago

IMHO compare should just mandate an ordering between two elements, which does not imply that they must be equal. To give you an example of a similar situation: if in SQL you put "order by" in a query, you will specify a field. Many different rows will have that same field, and the resulting order for them will not be specified and it does not care about the content of the row. I find that a more sensible behaviour and it is quite accepted.

joroKr21 commented 6 years ago

If both sets are not equal, then taking the union should contain both elements. But now we can no longer maintain the invariant that smaller elements are to the left and larger elements to the right. A better abstraction for this use case might be a MultiSet or a TreeMap from the order key to a set of different elements with the same key.

What should be the semantics for two elements that are the same wrt compareTo, but different wrt to equals? To me that is equivalent to a partial order, because we can't decide either way. But TreeSet requires a total order. So just ignoring equals seems reasonable to me.

Jasper-M commented 6 years ago

I think currently the preferred way to do this is having a Seq[A] (pick your favorite Seq implementation), and do seq.sortBy(_.x). The strawman-contrib (an extension library to the core collections I guess?) does contain a SortedMultiSet.

Ichoran commented 6 years ago

When you use a custom ordering on a TreeSet you are supplying an equivalence class at the same time. If you then choose to use a different equivalence class to compare the elements (whether it be the default equals, or some other ordering), you of course will get different answers.

If you think it's confusing that you use two different kinds of equals (one from your ordering and the default one) and they disagree, well, yes. It's not so obvious that you're choosing to do this.

But imagine if this weren't how things happened. Your ability to reason about how sets work would be heavily impaired.

For instance, you would think that this should hold:

if (!set.contains(x)) (set + x).size == set.size + 1
else (set + x).size == set.size

Right? Either you don't have the element, so you add it and the set is one bigger; or you do have the element, so it replaces the old one, and the set stays the same size.

Not so if contains agrees with equals instead of the ordering.

Okay, so let's say contains has to obey the ordering but set equality doesn't.

Now you'd expect:

{ xs == ys } ==
{ ys.forall(xs contains _) && xs.forall(ys contains _) }

That is, two sets are equal if and only if they each contain all the elements of the other one. But if you change set equality so it's always default equality instead of the equivalence class, this relation (which is the usual mathematical definition of what it means for two sets to be equal) is no longer true.

So: if you have a custom ordering, you have to obey it, with its induced equivalence classes and all.

szeiger commented 5 years ago

Scheduling for RC1 as a documentation issue.