Open treeowl opened 1 year ago
I don't think that it's ever ok to make move
a no-op.
Ok, that's a little strong. It's not ok to make move
a no-op if the data structure can contain a thunk (so it's ok for Int#
…).
Why is that? Maybe you have a linear x
variable that is captured by this thunk. You can't know: closures are existentially quantified over everything they capture. To make sure that you've consumed x
, you have to force the thunk. As soon as your structure is moved, you can't guarantee that you will.
Now, if you keep the data structure fully strict, with no thunks inside, then you're probably good. I don't have good mathematical tools, to think through this yet, though. Always proceed with care :slightly_smiling_face:
The structure I'm talking about holds only unrestricted values, so your concern about strictness isn't relevant.
Thinking about it again, I believe move
actually is possible for a pairing heap. It reassociates the heap into what amounts to a sorted list, which is the "ground state", so to speak.
The structure I'm talking about holds only unrestricted values, so your concern about strictness isn't relevant.
It is though, because of the linked structure of the pairing heap, independently of the data held in the nodes.
Thinking about it again, I believe move actually is possible for a pairing heap. It reassociates the heap into what amounts to a sorted list, which is the "ground state", so to speak.
I'm not sure what implementation of move
you have in mind, actually.
A pairing heap looks roughly like
data PQ a where
Empty :: PQ a
Node :: !a -> [PQ a] -> PQ a
with the invariants that the queues in the list are nonempty and the element in each node is no greater than the elements in its children. I'm suggesting something like
fromAscList :: [Ur a] %1-> Ur (PQ a)
fromAscList [] = Empty
fromAscList (Ur a : as)
| Ur q <- fromAscList as
, not (null q)
= Ur (Node a [q])
| otherwise
= Ur (Node a [])
move q = fromAscList (toAscList q)
The key thing is that each node has at most one child, so rebuilding when deleting the minimum is cheap. Cheaper than it needs to be, even, but I don't know how to take advantage of that.
This implementation of move is unproblematic. You can implement it client-side too. You could also simply implement move
as a simple recursion through the PQ
concrete definition (the one you get from the Generic
implementation). (though now, I'm not sure what you were speaking of Unsafe.coerce
in the original post).
Is your question philosophical? Like “is it a good idea to have move
even though the performance characteristics are not good, as it feels so easy to use“? If so, it's a similar point that I brought up at #206 .
The unsafe coercion was a mistake. Making move
do the generic thing would be very very wrong from a performance standpoint. The whole point of linear types in this context is to prevent users from misusing these queues. The amortized bound for minView
is $O(\log n)$ but the worst case bound is $O(n)$. The point of linearity is to avoid multiple $O(n)$ views from nearly identical configurations. move
(as I know how to implement it) is $O(n \log n)$, but it produces a queue in a safe configuration from which all operation sequences will be within the amortized bounds. So I suppose there are different philosophical questions for this move
versus the (wrong) generic move
for the structure.
Aha! Very interesting.
So the idea is, in an amortized structure, you do a bunch of cheap operations, but eventually you have to do an expensive operation. You certainly don't want to do the expensive operation many times, so you want an implementation of move
forces the expensive operation to happen, so that the shared data structure is in a state where the next operations are going to be cheap for a while.
This is an alternative to Okasaki's style laziness, where you the expensive operation is shared/memoised. Probably easier to implement too. It also reminds me of this limitation of laziness I tweeted about last year.
Yes, that's the idea. Okasaki-style structures (e.g., lazy-spined binomial queues as in the pqueue
package I co-maintain) are great when you use persistence heavily. But in many non-persistent or lightly persistent settings those can be overkill. The sorting functions in Data.Sequemce
use pairing heaps, for example, because @lowasser (and layer @oisdk) found they performed well in that application.
(well in their defence pairing heap are a very unfair data structure: they are dead simple, and perform well in many applications)
Is there anything left on this issue, that you'd want to achieve? Or should we close?
I think there may be room for some documentation somewhere? I don't know if that belongs here, though, or in the GHC docs. In case you're curious, this is the linear pairing heap implementation I hacked up.
I think (I'm pretty sure in fact) I wrote a comment here, and never finished. It got lost. Sorry. I'm seeing this again now.
What kind of documentation do you imagine? We can make it happen, I'm just not clear on what it could look like.
In some cases, a pure amortized data structure may only be efficient when used in a single threaded fashion. This suggests a linear interface
Would it be appropriate for something like
PQ a
to be an instance ofMoveable
? If so,move
would be a no-op (move = Unsafe.coerce Ur
), and the user would be fully responsible for making sure performance was okay in context. Or would it be better to only have aperflyRiskyMove
in the module to make it clear that there's a performance risk?The
Dupable
instance is much more obviously reasonable: it would fully reassociate the queue into a "safe" configuration that has no credit. Do we want to provide a module giving an example of a structure like this?