Open konsumlamm opened 2 years ago
Hrmmmm..... Which operations don't respect ==
?
Hrmmmm..... Which operations don't respect
==
?
I'm not sure what you mean, the issue is that entries with equal keys (but different values) may be in any order.
The usual expectation is that if
f :: A -> B
x, y :: A
x == y = True
then
f x == f y = True
We say that f
respects ==
.
Typically, "safe" API functions should respect ==
. If there's a relatively small and well defined set of operations that don't, that can be acceptable for the sake of practicality (see HashMap
, for example, where two maps may compare equal even though converting them both to lists may produce differently ordered lists and they may be traversed in different orders). But if it turns out that the most essential priority queue operations don't respect ==
, then that ==
will break everyone's intuition. In that case, it should preferably be changed, or if there's no way to do anything sensible then the Eq
instance should likely be removed.
We should start by looking at how insertion and view interact with ==
, since those are the most important operations.
Similarly for binary operations. Given
a == a' = True
b == b' = True
users will want to be able to rely on
merge a b == merge a' b' = True
My bet is that the only ways to make sensible and abstract Eq
instances for the Prio
queues are to consider the results of minView
(hence toList
) and/or merge
non-deterministic. Considering only minView
non-deterministic is easy:
a == b = sort (toList a) == sort (toList b)
I believe we currently consider only merge
non-deterministic in our Eq
instance, which leads to very difficult/brittle code. So I think we should to the simple thing instead, if we want an Eq
instance.
So I did a bunch of (10000) QuickCheck tests and it seems that insert
respects ==
. I think before changing the Eq
instance to (==) `on` (sort . toList)
, we should be sure that the current implementation is a problem.
Try merges?
With a better testing method, I actually found an example where insert
doesn't respect ==
:
ghci> import qualified Data.PQueue.Prio.Min as P
ghci> a = P.fromList [(0, 0), (0, 1), (-1, 0), (-1, 0)]
ghci> b = P.fromList [(-1, 0), (0, 0), (0, 1), (-1, 0)]
ghci> a == b
True
ghci> P.insert 0 0 a == P.insert 0 0 b
False
Ouch. That's really nasty. I propose either:
Define a == b = (==) `on` sort . toList
(or equivalent), and document that minView
should be considered non-deterministic relative to ==
.
Remove the Eq
instance for Prio
queues.
Can you think of any situation where it's essential to have an Eq
instance for these things?
Can you think of any situation where it's essential to have an
Eq
instance for these things?
Hmm, not really, though I feel a bit uneasy about removing an instance, as that would be a breaking change (which would be fine with a major version bump ig). I took a look at how other languages (C++, Java, Rust, Scala) handle this problem and it seems none of those implement custom equality for priority queues.
Those languages are presumably using pointer equality. I believe the correct thing is the sorting ==
, and documentation that these queues are "nondeterministic". That is, all the extraction and traversal functions may access values associated with a given key in any order.
At least Rust and C++ don't, Scala 3 doesn't either afaict, Java always has a default implementation, so that's kinda unfair.
So you prefer to remove the instance rather than implement the sorting one?
I definitely prefer removing the instance over requiring Ord
on the values, but I'm not sure that's my favourite option.
The alternative would be to fix the instance. We could compare the sizes and then iterate through the first argument and for each key-value pair look up if it also exists in the second argument. That would be slower, but it should work.
That would give a pretty awful user experience, I think. We'd have p == q
potentially much slower than compare p q == EQ
.
The Ord
instance should be compatible anyway, so if we change the behaviour of the Eq
instance, we also need to change the Ord
instance.
Right, but you won't object to Ord
sorting, which will make equal-via-compare
faster than ==
Right. I just benched a (naive) implementation of my proposal and it's a lot slower. So I'm fine with removing the instance, though perhaps let's first add a deprecation comment (unfortunately, instances cannot be deprecated).
Why don't we just do the sorting thing? It has clear semantics, which are almost obvious from the Ord
constraint.
I explained why I don't like that option in #106.
So you don't want a strong constraint because it won't work for some things, and you'd rather remove the instance and have it work for nothing. I want the strong constraint. Whom should we ask to settle this disagreement?
If we use the Ord a
constraint, as I desire, we can certainly also include
slowSameMinPQueue :: (Ord k, Eq a) => MinPQueue k a -> MinPQueue k a -> Bool
And we can even provide a newtype to use it:
newtype SlowEqMinPQueue k a = SlowEqMinPQueue { unSlowEqMinPQueue :: MinPQueue k a }
instance (Ord k, Eq a) => Eq (SlowEqMinPQueue k a) where
(==) = slowSameMinPQueue
This can be used as a DerivingVia
target in some cases.
newtype Floop k a = Floop (Either Int (MinPQueue k a))
deriving Eq via (Either Int (SlowEqMinPQueue k a))
PMin.fromAscList [(1, 'a'), (1, 'b')] == PMin.fromAscList [(1, 'b'), (1, 'a')]
in my opinion these aren't equal queues. I believe a priority queue encodes a queue with a priority, if the priorities are all the same I'd expect it to behave like a queue, (also known as a list).
if you'd enforce these as equal you'd end up with something akin to a poset.
I wanted to disagree at first, but now I think you're actually making a good point. If two priority queues produce (when calling toList
or repeatedly calling minView
) the same elements, but in a different order, they're not really equal. Unfortunately, if two priority queues are equal (by today's implementation), inserting an element may change that (see https://github.com/lspitzner/pqueue/issues/78#issuecomment-1101731260). The "problem" is that inserting has no guarantees where the element is inserted with respect to equal keys. However, I think that's not a big problem, as long as we document it, e.g. HashMap
has that problem too.
@jappeace that's not the traditional definition. It's possible to implement such a structure (see the stable-heap
package, or implement a priority queue using something like Hinze-Paterson monoidal finger trees), but that isn't how binomial queues or most other priority queue implementations work. The priority queue abstraction per se says nothing about which entry you get when, and we don't guarantee one in particular.
@konsumlamm that's why I think it's best to view extraction and traversal operations as nondeterministic. They can take equals to non-equals.
The only way to test "equal for all purposes, deterministically" is plain structural equality. But that's flipping useless.
I started a discussion about this on Discourse: https://discourse.haskell.org/t/equality-for-priority-queues/6355.
That's been good so far. It hasn't changed my extreme negative opinion of the quadratic instance, but I have warmed somewhat to the idea of removing the instance altogether (and adding separate functions to test equality and compare).
Two equal queues may compare as unequal, since duplicate keys are allowed and the
Eq
implementation takes only care of comparing the keys in order.Example:
This should be documented, as I don't think this can be fixed without making the implementation significantly slower.