lspitzner / pqueue

Haskell priority queue package
Other
15 stars 13 forks source link

Two equal priority queues with keys can compare as unequal #78

Open konsumlamm opened 2 years ago

konsumlamm commented 2 years ago

Two equal queues may compare as unequal, since duplicate keys are allowed and the Eq implementation takes only care of comparing the keys in order.

Example:

ghci> import qualified Data.PQueue.Min as PMin
ghci> PMin.fromAscList [(1, 'a'), (1, 'b')] == PMin.fromAscList [(1, 'b'), (1, 'a')]
False

This should be documented, as I don't think this can be fixed without making the implementation significantly slower.

treeowl commented 2 years ago

Hrmmmm..... Which operations don't respect ==?

konsumlamm commented 2 years ago

Hrmmmm..... Which operations don't respect ==?

I'm not sure what you mean, the issue is that entries with equal keys (but different values) may be in any order.

treeowl commented 2 years ago

The usual expectation is that if

f :: A -> B
x, y :: A
x == y = True

then

f x == f y = True

We say that f respects ==.

Typically, "safe" API functions should respect ==. If there's a relatively small and well defined set of operations that don't, that can be acceptable for the sake of practicality (see HashMap, for example, where two maps may compare equal even though converting them both to lists may produce differently ordered lists and they may be traversed in different orders). But if it turns out that the most essential priority queue operations don't respect ==, then that == will break everyone's intuition. In that case, it should preferably be changed, or if there's no way to do anything sensible then the Eq instance should likely be removed.

We should start by looking at how insertion and view interact with ==, since those are the most important operations.

treeowl commented 2 years ago

Similarly for binary operations. Given

a == a' = True
b == b' = True

users will want to be able to rely on

merge a b == merge a' b' = True

My bet is that the only ways to make sensible and abstract Eq instances for the Prio queues are to consider the results of minView (hence toList) and/or merge non-deterministic. Considering only minView non-deterministic is easy:

a == b = sort (toList a) == sort (toList b)

I believe we currently consider only merge non-deterministic in our Eq instance, which leads to very difficult/brittle code. So I think we should to the simple thing instead, if we want an Eq instance.

konsumlamm commented 2 years ago

So I did a bunch of (10000) QuickCheck tests and it seems that insert respects ==. I think before changing the Eq instance to (==) `on` (sort . toList), we should be sure that the current implementation is a problem.

treeowl commented 2 years ago

Try merges?

konsumlamm commented 2 years ago

With a better testing method, I actually found an example where insert doesn't respect ==:

ghci> import qualified Data.PQueue.Prio.Min as P
ghci> a = P.fromList [(0, 0), (0, 1), (-1, 0), (-1, 0)]
ghci> b = P.fromList [(-1, 0), (0, 0), (0, 1), (-1, 0)]
ghci> a == b
True
ghci> P.insert 0 0 a == P.insert 0 0 b
False
treeowl commented 2 years ago

Ouch. That's really nasty. I propose either:

  1. Define a == b = (==) `on` sort . toList (or equivalent), and document that minView should be considered non-deterministic relative to ==.

  2. Remove the Eq instance for Prio queues.

Can you think of any situation where it's essential to have an Eq instance for these things?

konsumlamm commented 2 years ago

Can you think of any situation where it's essential to have an Eq instance for these things?

Hmm, not really, though I feel a bit uneasy about removing an instance, as that would be a breaking change (which would be fine with a major version bump ig). I took a look at how other languages (C++, Java, Rust, Scala) handle this problem and it seems none of those implement custom equality for priority queues.

treeowl commented 1 year ago

Those languages are presumably using pointer equality. I believe the correct thing is the sorting ==, and documentation that these queues are "nondeterministic". That is, all the extraction and traversal functions may access values associated with a given key in any order.

konsumlamm commented 1 year ago

At least Rust and C++ don't, Scala 3 doesn't either afaict, Java always has a default implementation, so that's kinda unfair.

treeowl commented 1 year ago

So you prefer to remove the instance rather than implement the sorting one?

konsumlamm commented 1 year ago

I definitely prefer removing the instance over requiring Ord on the values, but I'm not sure that's my favourite option.

The alternative would be to fix the instance. We could compare the sizes and then iterate through the first argument and for each key-value pair look up if it also exists in the second argument. That would be slower, but it should work.

treeowl commented 1 year ago

That would give a pretty awful user experience, I think. We'd have p == q potentially much slower than compare p q == EQ.

konsumlamm commented 1 year ago

The Ord instance should be compatible anyway, so if we change the behaviour of the Eq instance, we also need to change the Ord instance.

treeowl commented 1 year ago

Right, but you won't object to Ord sorting, which will make equal-via-compare faster than ==

konsumlamm commented 1 year ago

Right. I just benched a (naive) implementation of my proposal and it's a lot slower. So I'm fine with removing the instance, though perhaps let's first add a deprecation comment (unfortunately, instances cannot be deprecated).

treeowl commented 1 year ago

Why don't we just do the sorting thing? It has clear semantics, which are almost obvious from the Ord constraint.

konsumlamm commented 1 year ago

I explained why I don't like that option in #106.

treeowl commented 1 year ago

So you don't want a strong constraint because it won't work for some things, and you'd rather remove the instance and have it work for nothing. I want the strong constraint. Whom should we ask to settle this disagreement?

treeowl commented 1 year ago

If we use the Ord a constraint, as I desire, we can certainly also include

slowSameMinPQueue :: (Ord k, Eq a) => MinPQueue k a -> MinPQueue k a -> Bool

And we can even provide a newtype to use it:

newtype SlowEqMinPQueue k a = SlowEqMinPQueue { unSlowEqMinPQueue :: MinPQueue k a }
instance (Ord k, Eq a) => Eq (SlowEqMinPQueue k a) where
  (==) = slowSameMinPQueue

This can be used as a DerivingVia target in some cases.

newtype Floop k a = Floop (Either Int (MinPQueue k a))
  deriving Eq via (Either Int (SlowEqMinPQueue k a))
jappeace commented 1 year ago
PMin.fromAscList [(1, 'a'), (1, 'b')] == PMin.fromAscList [(1, 'b'), (1, 'a')]

in my opinion these aren't equal queues. I believe a priority queue encodes a queue with a priority, if the priorities are all the same I'd expect it to behave like a queue, (also known as a list).

if you'd enforce these as equal you'd end up with something akin to a poset.

konsumlamm commented 1 year ago

I wanted to disagree at first, but now I think you're actually making a good point. If two priority queues produce (when calling toList or repeatedly calling minView) the same elements, but in a different order, they're not really equal. Unfortunately, if two priority queues are equal (by today's implementation), inserting an element may change that (see https://github.com/lspitzner/pqueue/issues/78#issuecomment-1101731260). The "problem" is that inserting has no guarantees where the element is inserted with respect to equal keys. However, I think that's not a big problem, as long as we document it, e.g. HashMap has that problem too.

treeowl commented 1 year ago

@jappeace that's not the traditional definition. It's possible to implement such a structure (see the stable-heap package, or implement a priority queue using something like Hinze-Paterson monoidal finger trees), but that isn't how binomial queues or most other priority queue implementations work. The priority queue abstraction per se says nothing about which entry you get when, and we don't guarantee one in particular.

treeowl commented 1 year ago

@konsumlamm that's why I think it's best to view extraction and traversal operations as nondeterministic. They can take equals to non-equals.

treeowl commented 1 year ago

The only way to test "equal for all purposes, deterministically" is plain structural equality. But that's flipping useless.

konsumlamm commented 1 year ago

I started a discussion about this on Discourse: https://discourse.haskell.org/t/equality-for-priority-queues/6355.

treeowl commented 1 year ago

That's been good so far. It hasn't changed my extreme negative opinion of the quadratic instance, but I have warmed somewhat to the idea of removing the instance altogether (and adding separate functions to test equality and compare).