tweag / linear-base

Standard library for linear types in Haskell.
MIT License
335 stars 37 forks source link

Movable for amortized structures #445

Open treeowl opened 1 year ago

treeowl commented 1 year ago

In some cases, a pure amortized data structure may only be efficient when used in a single threaded fashion. This suggests a linear interface

-- Pairing heaps
data PQ a = ...

insert :: Ord a => a -> PQ a %1-> PQ a
merge :: Ord a => PQ a %1-> PQ a %1-> PQ a
extractMin :: Ord a => PQ a %1-> Maybe (Ur a, PQ a)

Would it be appropriate for something like PQ a to be an instance of Moveable? If so, move would be a no-op (move = Unsafe.coerce Ur), and the user would be fully responsible for making sure performance was okay in context. Or would it be better to only have a perflyRiskyMove in the module to make it clear that there's a performance risk?

The Dupable instance is much more obviously reasonable: it would fully reassociate the queue into a "safe" configuration that has no credit. Do we want to provide a module giving an example of a structure like this?

aspiwack commented 1 year ago

I don't think that it's ever ok to make move a no-op.

Ok, that's a little strong. It's not ok to make move a no-op if the data structure can contain a thunk (so it's ok for Int#…).

Why is that? Maybe you have a linear x variable that is captured by this thunk. You can't know: closures are existentially quantified over everything they capture. To make sure that you've consumed x, you have to force the thunk. As soon as your structure is moved, you can't guarantee that you will.

Now, if you keep the data structure fully strict, with no thunks inside, then you're probably good. I don't have good mathematical tools, to think through this yet, though. Always proceed with care :slightly_smiling_face:

treeowl commented 1 year ago

The structure I'm talking about holds only unrestricted values, so your concern about strictness isn't relevant.

Thinking about it again, I believe move actually is possible for a pairing heap. It reassociates the heap into what amounts to a sorted list, which is the "ground state", so to speak.

aspiwack commented 1 year ago

The structure I'm talking about holds only unrestricted values, so your concern about strictness isn't relevant.

It is though, because of the linked structure of the pairing heap, independently of the data held in the nodes.

Thinking about it again, I believe move actually is possible for a pairing heap. It reassociates the heap into what amounts to a sorted list, which is the "ground state", so to speak.

I'm not sure what implementation of move you have in mind, actually.

treeowl commented 1 year ago

A pairing heap looks roughly like

data PQ a where
  Empty :: PQ a
  Node :: !a -> [PQ a] -> PQ a

with the invariants that the queues in the list are nonempty and the element in each node is no greater than the elements in its children. I'm suggesting something like

fromAscList :: [Ur a] %1-> Ur (PQ a)
fromAscList [] = Empty
fromAscList (Ur a : as)
  | Ur q <- fromAscList as
  , not (null q)
  = Ur (Node a [q])
  | otherwise
  = Ur (Node a [])

move q = fromAscList (toAscList q)

The key thing is that each node has at most one child, so rebuilding when deleting the minimum is cheap. Cheaper than it needs to be, even, but I don't know how to take advantage of that.

aspiwack commented 1 year ago

This implementation of move is unproblematic. You can implement it client-side too. You could also simply implement move as a simple recursion through the PQ concrete definition (the one you get from the Generic implementation). (though now, I'm not sure what you were speaking of Unsafe.coerce in the original post).

Is your question philosophical? Like “is it a good idea to have move even though the performance characteristics are not good, as it feels so easy to use“? If so, it's a similar point that I brought up at #206 .

treeowl commented 1 year ago

The unsafe coercion was a mistake. Making move do the generic thing would be very very wrong from a performance standpoint. The whole point of linear types in this context is to prevent users from misusing these queues. The amortized bound for minView is $O(\log n)$ but the worst case bound is $O(n)$. The point of linearity is to avoid multiple $O(n)$ views from nearly identical configurations. move (as I know how to implement it) is $O(n \log n)$, but it produces a queue in a safe configuration from which all operation sequences will be within the amortized bounds. So I suppose there are different philosophical questions for this move versus the (wrong) generic move for the structure.

aspiwack commented 1 year ago

Aha! Very interesting.

So the idea is, in an amortized structure, you do a bunch of cheap operations, but eventually you have to do an expensive operation. You certainly don't want to do the expensive operation many times, so you want an implementation of move forces the expensive operation to happen, so that the shared data structure is in a state where the next operations are going to be cheap for a while.

This is an alternative to Okasaki's style laziness, where you the expensive operation is shared/memoised. Probably easier to implement too. It also reminds me of this limitation of laziness I tweeted about last year.

treeowl commented 1 year ago

Yes, that's the idea. Okasaki-style structures (e.g., lazy-spined binomial queues as in the pqueue package I co-maintain) are great when you use persistence heavily. But in many non-persistent or lightly persistent settings those can be overkill. The sorting functions in Data.Sequemce use pairing heaps, for example, because @lowasser (and layer @oisdk) found they performed well in that application.

aspiwack commented 1 year ago

(well in their defence pairing heap are a very unfair data structure: they are dead simple, and perform well in many applications)

Is there anything left on this issue, that you'd want to achieve? Or should we close?

treeowl commented 1 year ago

I think there may be room for some documentation somewhere? I don't know if that belongs here, though, or in the GHC docs. In case you're curious, this is the linear pairing heap implementation I hacked up.

aspiwack commented 1 year ago

I think (I'm pretty sure in fact) I wrote a comment here, and never finished. It got lost. Sorry. I'm seeing this again now.

What kind of documentation do you imagine? We can make it happen, I'm just not clear on what it could look like.