Open TomasMikula opened 8 years ago
I would probably like to experiment with option 2. above to see how much in practice I would miss Set
operations that expose the intrinsic order.
EDIT: For one, it would be impossible to define a serialization codec for such Set, since serialization necessarily needs some order.
Another option is to move it to alleycats to reunion with some other unlawful set instances.
Good catch!
I have sort of conflicting feelings about this. Moving PartialOrder[Set]
to alleycats is not a bad idea.
I'm reluctant to split equality and equivalence at this stage, although it's certainly a useful distinction. What do others think?
It should be clarified that in ∀ a, b, f: a=b ⇒ f(a)=f(b)
, f
denotes a pure function without any side effects. Many of Set
's method are just not pure. I don't see that as the fault of Eq
.
@eed3si9n The observation here is that we can potentially end up with inconsistent results:
def inconsistent(xs: Set[A], ys: Set[A]): (Boolean, Boolean) =
(xs == ys, xs.toList == ys.toList)
I'm assuming it's possible to construct sets which contain the same elements but which have a different internal order exposed to .toList
, .iterator
, etc. If that's not possible then we have nothing to worry about.
Here's an example:
val x = 1L
val y = (1L << 32) | 1L
val z = (1L << 33) | 1L
val a = Set(x, y, z)
val aa = a.toList // List(1, 4294967297, 8589934593)
val b = Set(z, y, x)
val bb = b.toList // List(8589934593, 4294967297, 1)
(a == b, aa == bb) // (true, false)
My claim is that toList
is non-determistic, and thus not pure, similar to Random is not pure.
But for any particular value of Set[_]
it is deterministic and pure, right? It's just that sets that we think of as the same can have unpredictable differences exposed via that method. I don't think non-deterministic (or impure) is the right way to describe what's going on there.
Some methods (i.e. .hashCode
, eq
, etc) are not considered part of the public API (because they expose JVM internals). I don't think I would argue that a method like .toList
qualifies here.
I'm reluctant to split equality and equivalence at this stage, although it's certainly a useful distinction. What do others think?
Yeah, for equality we can assume there to be a unique typeclass instance. However, there can be multiple well-defined equivalences and we don't have a good way in Scala to ensure the correct one is used in each place. This issue helped me understand the motivation for quotient types.
But for any particular value of Set[_] it is deterministic and pure, right?
I don't think there's any guarantee on what ordering toList
is going to return even for a particular value. Set[_]
is just a trait, so anyone can implement anything. Even with the default implementation, there's the sensitivity to the hashcode, which may or may not match up with Eq
semantics:
scala> def foo: Set[Object] = {
val o1 = new Object { override def toString = "o1" }
val o2 = new Object { override def toString = "o2" }
val o3 = new Object { override def toString = "o3" }
val o4 = new Object { override def toString = "o4" }
val o5 = new Object { override def toString = "o5" }
Set(o1, o2, o3, o4, o5)
}
foo: Set[Object]
scala> foo
res0: Set[Object] = Set(o4, o2, o5, o1, o3)
scala> foo
res1: Set[Object] = Set(o1, o4, o5, o3, o2)
The example was adopted from https://issues.scala-lang.org/browse/SI-8184
I don't think there's any guarantee on what ordering
toList
is going to return even for a particular value.
It is unspecified, but deterministic. In your example, I would consider foo
an impure function, not toList
/toString
.
That said, even though toList
is completely deterministic, I'm not dismissing the idea of treating it as non-deterministic (impure). Then we would have to revoke the Foldable[Set]
instance, though.
Since Foldable
is basically lawless anyway I'd rather leave Foldable[Set]
alone. The way I see it we have two basic options:
(1) Move Eq[Set[A]]
to alleycats (or equivalent) since it is not law-abiding.
(2) Declare that Eq[Set[A]]
is in the same category as Monoid[Double]
and allow it with a caution.
I don't have strong feelings here. I think (2) might be OK. Since the type itself relies on universal equality and hash code, many people already avoid it.
It's also worth noting that instances for Map[K, V]
likely have the same kinds of issues.
Since
Foldable
is basically lawless anyway I'd rather leaveFoldable[Set]
alone.
You're right. I was thinking that Foldable[Set]
exposes the order of elements, and thus could yield two different results for "equal" sets, but that again is a problem with equality, not Foldable
.
I basically agree with you on the options for how to resolve this issue with scala.collection.immutable.Set
.
I'm trying to think how would we address the issue with an idealized implementation of set (if only I had a project to experiment with it 😃 ). The problem with "allow with caution" is that "caution" doesn't propagate through the levels of abstraction (if it's not captured in the types). When I have some time, I will experiment with the equality vs. equivalence distinction.
For awhile I've been imagining a new import (maybe cats.strictImplicits._
) that would leave out things like Monoid[Double]
, and maybe some of the other instances people don't like. One solution to this would be to leave things like Eq[Set]
out of strictImplicits
.
I haven't proposed this yet since it dovetails with a potentially controversial take on the meaning of laws and law-checking that I've been meaning to write and share here.
Here is a write-up of where I got with the equivalence-equality distinction: Equivalence versus Equality. To me it looks quite promising.
@TomasMikula thanks for bringing this up. It's something that I never would have thought about.
Personally I think that iteration order wouldn't be sufficient reason to get rid of Eq
instances for Set
. I think that if people are working in terms of Set
then 1) they probably don't care about ordering, and 2) equivalence is probably what they want if they are checking ===
on two sets.
To me the more compelling reason to get rid of Eq
instances for Set
is that it depends on (non-parametric) universal equality. This means that we don't restrict Eq
instances for Set
to those that are meaningful. For example there is no canonical Eq[String => Int]
, but we'll happily provide an Eq[Set[String => Int]]
even though it will probably never do anything useful. Furthermore, since universal equality is non-parametric it's fairly easy to create situations where the symmetry law doesn't hold.
Having said all of that, I'm still kind of neutral on whether we provide Eq
instances for Set
. It's not well-defined (when it comes to ordering) and its dependence on universal equality and hashcode are pretty terrible from an FP perspective. But it also shows up quite a bit in scala code, and as far as I know there aren't great functional alternatives that have some of the same performance characteristics. At the end of the day, if I type mySetA === mySetB
then these instances are probably doing exactly what I intended them to do and with slightly more type safety (I can't compare Set[String] to
Set[URI]` without a compile error).
TLDR: I think that this is only one of at least 2 problems with Eq
instances for Set
, but I could still be convinced either way of whether cats includes these instances by default.
I think we all agree that Eq[Set[_]]
has to live somewhere, because it is just too useful. We also all tend to not have a very strong opinion about where it should live, so I guess that means no action until there is a strong opinion.
My other question, how would we deal with the issue with and idealized implementation of Set, I explored in the writeup I linked to in my previous comment and am content with the findings for now.
Final issue is that this was not caught by the tests. To that end, I think OrderLaws
should also take an Arbitrary[A => A]
(or perhaps Arbitrary[A => Int]
) as an argument, instead of synthesizing one from Arbitrary[A]
. Then users who test their Eq
instance could provide their own Arbitrary[A => Int]
. This is probably the only actionable item for now.
as far as I know there aren't great functional alternatives that have some of the same performance characteristics
hasheq is a fairly mechanical rewrite of HashSet
and HashMap
from the Scala library to typeclass style. If I haven't messed up somewhere, the only performance penalty should therefore be passing around the typeclass instances. Not sure how much of an impact that has. hasheq
could certainly use some benchmarks.
so, one thing that worries me, is if we have Eq[Set[A]]
and we can get that for all A
, I can use Contravariant to get Eq[A]
(Eq[Set[A]].by { a: A => Set(a) }
).
So, I don't like there being a warning-free backdoor for universal equality in the standard instances.
I can see moving it to an area that does not come with the usual "everything". I would like to see an unsafe
package or something (rather than a whole different repo).
https://github.com/TomasMikula/hasheq looks really interesting. I would love to see something like that become a typelevel project and interop nicely with cats (or even become part of cats).
The
Eq[Set[A]]
instance given here is not lawful. It doesn't satisfy the substitution property given here that saysThe problem is that
Set
s expose some intrinsic order of elements (observable via methods likeiterator
,foldLeft
, ...). For some implementations of sets, e.g.ListSet
andHashSet
, it is possible to construct two instances that contain the same elements (and are thus considered equal by the providedEq
), but iterate over them in different order (so it is possible to define a functionf
that falsifies the substitution property above). As a result, the providedEq
instance is not equality, just equivalence.I'm not sure what should be done about it, if anything. It would be easy to declare
Set
from stdlib broken, but how would we handle the issue with an idealized implementation ofSet
?Some possibilities are:
Equiv
for equivalence (thenEq extends Equiv
) and only provideEquiv[Set]
, notEq[Set]
. (We might then also need another typeclass for order which represents order on the equivalence classes.)Our idealized
Set
would not have any API that exposes the intrinsic order. This means, for example, that we lose theFoldable[Set]
instance, since folds expose order of elements. We could, however, have for examplefoldMap
with an additional constraint that the monoid be commutative:Then the order would remain unobservable. Is there something like
CommutativeFoldable
?Another issue is that this problem is not caught by current tests, since ScalaCheck will never (IMO) generate a function
f: Set[A] => Set[A]
whose return value depends on the intrinsic order of the argument. As currently written,OrderLaws
do not let me pass in my ownArbitrary[A => A]
.