typelevel / cats

Lightweight, modular, and extensible library for functional programming.
https://typelevel.org/cats/
Other
5.25k stars 1.21k forks source link

Eq[Set[A]] instance is not lawful #1359

Open TomasMikula opened 8 years ago

TomasMikula commented 8 years ago

The Eq[Set[A]] instance given here is not lawful. It doesn't satisfy the substitution property given here that says

The problem is that Sets expose some intrinsic order of elements (observable via methods like iterator, foldLeft, ...). For some implementations of sets, e.g. ListSet and HashSet, it is possible to construct two instances that contain the same elements (and are thus considered equal by the provided Eq), but iterate over them in different order (so it is possible to define a function f that falsifies the substitution property above). As a result, the provided Eq instance is not equality, just equivalence.

I'm not sure what should be done about it, if anything. It would be easy to declare Set from stdlib broken, but how would we handle the issue with an idealized implementation of Set?

Some possibilities are:

  1. Have a type class Equiv for equivalence (then Eq extends Equiv) and only provide Equiv[Set], not Eq[Set]. (We might then also need another typeclass for order which represents order on the equivalence classes.)
  2. Our idealized Set would not have any API that exposes the intrinsic order. This means, for example, that we lose the Foldable[Set] instance, since folds expose order of elements. We could, however, have for example foldMap with an additional constraint that the monoid be commutative:

    def foldMap[A, B](fa: Set[A])(f: A => B)(implicit B: CommutativeMonoid[B]): B

    Then the order would remain unobservable. Is there something like CommutativeFoldable?

Another issue is that this problem is not caught by current tests, since ScalaCheck will never (IMO) generate a function f: Set[A] => Set[A] whose return value depends on the intrinsic order of the argument. As currently written, OrderLaws do not let me pass in my own Arbitrary[A => A].

TomasMikula commented 8 years ago

I would probably like to experiment with option 2. above to see how much in practice I would miss Set operations that expose the intrinsic order.

EDIT: For one, it would be impossible to define a serialization codec for such Set, since serialization necessarily needs some order.

kailuowang commented 8 years ago

Another option is to move it to alleycats to reunion with some other unlawful set instances.

non commented 8 years ago

Good catch!

I have sort of conflicting feelings about this. Moving PartialOrder[Set] to alleycats is not a bad idea.

I'm reluctant to split equality and equivalence at this stage, although it's certainly a useful distinction. What do others think?

eed3si9n commented 8 years ago

It should be clarified that in ∀ a, b, f: a=b ⇒ f(a)=f(b), f denotes a pure function without any side effects. Many of Set's method are just not pure. I don't see that as the fault of Eq.

non commented 8 years ago

@eed3si9n The observation here is that we can potentially end up with inconsistent results:

def inconsistent(xs: Set[A], ys: Set[A]): (Boolean, Boolean) =
  (xs == ys, xs.toList == ys.toList)

I'm assuming it's possible to construct sets which contain the same elements but which have a different internal order exposed to .toList, .iterator, etc. If that's not possible then we have nothing to worry about.

non commented 8 years ago

Here's an example:

val x = 1L
val y = (1L << 32) | 1L
val z = (1L << 33) | 1L

val a = Set(x, y, z)
val aa = a.toList // List(1, 4294967297, 8589934593)
val b = Set(z, y, x)
val bb = b.toList // List(8589934593, 4294967297, 1)

(a == b, aa == bb) // (true, false)
eed3si9n commented 8 years ago

My claim is that toList is non-determistic, and thus not pure, similar to Random is not pure.

non commented 8 years ago

But for any particular value of Set[_] it is deterministic and pure, right? It's just that sets that we think of as the same can have unpredictable differences exposed via that method. I don't think non-deterministic (or impure) is the right way to describe what's going on there.

Some methods (i.e. .hashCode, eq, etc) are not considered part of the public API (because they expose JVM internals). I don't think I would argue that a method like .toList qualifies here.

TomasMikula commented 8 years ago

I'm reluctant to split equality and equivalence at this stage, although it's certainly a useful distinction. What do others think?

Yeah, for equality we can assume there to be a unique typeclass instance. However, there can be multiple well-defined equivalences and we don't have a good way in Scala to ensure the correct one is used in each place. This issue helped me understand the motivation for quotient types.

eed3si9n commented 8 years ago

But for any particular value of Set[_] it is deterministic and pure, right?

I don't think there's any guarantee on what ordering toList is going to return even for a particular value. Set[_] is just a trait, so anyone can implement anything. Even with the default implementation, there's the sensitivity to the hashcode, which may or may not match up with Eq semantics:

scala> def foo: Set[Object] = {
         val o1 = new Object { override def toString = "o1" }
         val o2 = new Object { override def toString = "o2" }
         val o3 = new Object { override def toString = "o3" }
         val o4 = new Object { override def toString = "o4" }
         val o5 = new Object { override def toString = "o5" }
         Set(o1, o2, o3, o4, o5)
       }
foo: Set[Object]

scala> foo
res0: Set[Object] = Set(o4, o2, o5, o1, o3)

scala> foo
res1: Set[Object] = Set(o1, o4, o5, o3, o2)

The example was adopted from https://issues.scala-lang.org/browse/SI-8184

TomasMikula commented 8 years ago

I don't think there's any guarantee on what ordering toList is going to return even for a particular value.

It is unspecified, but deterministic. In your example, I would consider foo an impure function, not toList/toString.

TomasMikula commented 8 years ago

That said, even though toList is completely deterministic, I'm not dismissing the idea of treating it as non-deterministic (impure). Then we would have to revoke the Foldable[Set] instance, though.

non commented 8 years ago

Since Foldable is basically lawless anyway I'd rather leave Foldable[Set] alone. The way I see it we have two basic options:

(1) Move Eq[Set[A]] to alleycats (or equivalent) since it is not law-abiding. (2) Declare that Eq[Set[A]] is in the same category as Monoid[Double] and allow it with a caution.

I don't have strong feelings here. I think (2) might be OK. Since the type itself relies on universal equality and hash code, many people already avoid it.

It's also worth noting that instances for Map[K, V] likely have the same kinds of issues.

TomasMikula commented 8 years ago

Since Foldable is basically lawless anyway I'd rather leave Foldable[Set] alone.

You're right. I was thinking that Foldable[Set] exposes the order of elements, and thus could yield two different results for "equal" sets, but that again is a problem with equality, not Foldable.

I basically agree with you on the options for how to resolve this issue with scala.collection.immutable.Set.

I'm trying to think how would we address the issue with an idealized implementation of set (if only I had a project to experiment with it 😃 ). The problem with "allow with caution" is that "caution" doesn't propagate through the levels of abstraction (if it's not captured in the types). When I have some time, I will experiment with the equality vs. equivalence distinction.

non commented 8 years ago

For awhile I've been imagining a new import (maybe cats.strictImplicits._) that would leave out things like Monoid[Double], and maybe some of the other instances people don't like. One solution to this would be to leave things like Eq[Set] out of strictImplicits.

I haven't proposed this yet since it dovetails with a potentially controversial take on the meaning of laws and law-checking that I've been meaning to write and share here.

TomasMikula commented 8 years ago

Here is a write-up of where I got with the equivalence-equality distinction: Equivalence versus Equality. To me it looks quite promising.

ceedubs commented 8 years ago

@TomasMikula thanks for bringing this up. It's something that I never would have thought about.

Personally I think that iteration order wouldn't be sufficient reason to get rid of Eq instances for Set. I think that if people are working in terms of Set then 1) they probably don't care about ordering, and 2) equivalence is probably what they want if they are checking === on two sets.

To me the more compelling reason to get rid of Eq instances for Set is that it depends on (non-parametric) universal equality. This means that we don't restrict Eq instances for Set to those that are meaningful. For example there is no canonical Eq[String => Int], but we'll happily provide an Eq[Set[String => Int]] even though it will probably never do anything useful. Furthermore, since universal equality is non-parametric it's fairly easy to create situations where the symmetry law doesn't hold.

Having said all of that, I'm still kind of neutral on whether we provide Eq instances for Set. It's not well-defined (when it comes to ordering) and its dependence on universal equality and hashcode are pretty terrible from an FP perspective. But it also shows up quite a bit in scala code, and as far as I know there aren't great functional alternatives that have some of the same performance characteristics. At the end of the day, if I type mySetA === mySetB then these instances are probably doing exactly what I intended them to do and with slightly more type safety (I can't compare Set[String] toSet[URI]` without a compile error).

TLDR: I think that this is only one of at least 2 problems with Eq instances for Set, but I could still be convinced either way of whether cats includes these instances by default.

TomasMikula commented 8 years ago

I think we all agree that Eq[Set[_]] has to live somewhere, because it is just too useful. We also all tend to not have a very strong opinion about where it should live, so I guess that means no action until there is a strong opinion.

My other question, how would we deal with the issue with and idealized implementation of Set, I explored in the writeup I linked to in my previous comment and am content with the findings for now.

Final issue is that this was not caught by the tests. To that end, I think OrderLaws should also take an Arbitrary[A => A] (or perhaps Arbitrary[A => Int]) as an argument, instead of synthesizing one from Arbitrary[A]. Then users who test their Eq instance could provide their own Arbitrary[A => Int]. This is probably the only actionable item for now.

as far as I know there aren't great functional alternatives that have some of the same performance characteristics

hasheq is a fairly mechanical rewrite of HashSet and HashMap from the Scala library to typeclass style. If I haven't messed up somewhere, the only performance penalty should therefore be passing around the typeclass instances. Not sure how much of an impact that has. hasheq could certainly use some benchmarks.

johnynek commented 8 years ago

so, one thing that worries me, is if we have Eq[Set[A]] and we can get that for all A, I can use Contravariant to get Eq[A] (Eq[Set[A]].by { a: A => Set(a) }).

So, I don't like there being a warning-free backdoor for universal equality in the standard instances.

I can see moving it to an area that does not come with the usual "everything". I would like to see an unsafe package or something (rather than a whole different repo).

https://github.com/TomasMikula/hasheq looks really interesting. I would love to see something like that become a typelevel project and interop nicely with cats (or even become part of cats).