jaspervdj / psqueues

Priority Search Queues in three different flavors for Haskell
https://hackage.haskell.org/package/psqueues
Other
64 stars 24 forks source link

A question about `link` in `insert` #53

Open mitchellwrosen opened 10 months ago

mitchellwrosen commented 10 months ago

Original question left below, but important edit at the top: upon a second reading I see now that a Bin's mask is actually related to the key stored at that node in some way (though exactly how I can't get say, it's too late here), not just related to the Bin's children, as I thought when I formed this question.

That's all to say, what's left below is a bit bogus and rambling šŸ˜…. It would be great to document some of the design of the internals, though!


Hi there,

Apologies for the low-level question; I'm reading through (and cribbing some of) the implementation of IntPSQ, and I've run across some logic I don't understand. I also haven't been able to find a reference paper about the internals of the data structure - maybe it's a home brew? :)

Here's the logic for inserting into singleton (a Tip):

https://github.com/jaspervdj/psqueues/blob/37c12f5cef9962ca92d4198a5a3ed400e8c64e73/src/Data/IntPSQ/Internal.hs#L235-L237

https://github.com/jaspervdj/psqueues/blob/37c12f5cef9962ca92d4198a5a3ed400e8c64e73/src/Data/IntPSQ/Internal.hs#L257-L262

What I don't understand: the purpose of the link on the right-hand sides.

Say we have a priority queue with one element in it, A. We want to insert an element B. Say B has lower priority. Our (conceptual) priority queue ought to look something like this, then, with A ready to be popped off, and B just underneath.

A
|
B

If I'm understanding the implementation properly, insert seems to do some minor computation to ultimately decide whether the actual shape of the data structure after insertion looks like this:

      Bin A
     /     \
Tip B       Nil

Or this:

      Bin A
     /     \
  Nil       Tip B

But I think they are equally good -- no thinking required -- so insert could just arbitrarily pick one (i.e. not call link).

Moreover, the actual mechanism that insert uses to decide which of the two above shapes to use seems a bit confused, or perhaps confusing (me, that is), and could use a comment explaining its logic, if it's actually intentional and serving a useful purpose, I think.

(Urg, my idealized example above is getting messier as I add in more and more features of the actual data structure...) Let's say that element with priority A has key 10 (binary 00001010) and element with priority B has key 12 (binary 00001100).

insert, by calling link to insert a single element into a singleton queue, seems to say:

  1. Were I to store keys 00001010 and 00001100 together as children of a Bin node, what mask ought I use?
  2. Ah, it's mask 00000100, with a 1 at the highest bit position that any two keys in either child differs at!
  3. With that mask in mind, which side would 00001100 be stored on?
  4. Ah, it's the right side, because it has a 1 at that bit position!
  5. Okay, I will make a Bin node like the second shape above, then, using the mask I computed.

The confusing part is why we begin by considering what mask to use to store both keys as children of a Bin node, when the Bin node we are constructing only has one key as a child (because there are only two keys total).

I think this would work equally well, and, perhaps most importantly, is conceptually simpler, as it's not essentially computing a mask that technically works, but is more appropriate for some other hypothetical Bin node that we aren't actually making.

  1. I am creating a Bin node with a single child whose key is 00001100.
  2. The mask we'll use has just the highest bit of that number, 0000100.
  3. It will be the right child because it has the mask bit set.

Or, in code:

    Tip k' p' x'
      | (p, k) < (p', k') -> Bin k  p  x  (f k') Nil t
      | otherwise         -> Bin k' p' x' (f k)  Nil (Tip k p x)
      where
        f = intFromNat . highestBitMask . natFromInt

Thanks for looking!

Edit: oh, I think the solution above doesn't handle storing key 0 properly. It'd have to go on the left of a Bin, I think.

jaspervdj commented 10 months ago

Hey! I'm a bit swamped so won't have time to answer the specific question, so I will try to look again later this week. I talked about the design a while ago and things haven't changed since, but it's rather high-level and I would prefer to have it in written form as well.

mitchellwrosen commented 10 months ago

No worries! I've had another brief look through the code today and I think I can now define the meaning of the mask bit m, which explains why insert performs the link that it does, rather than my suggestion at the end above (beginning, "I think this would work equally well...").

In Bin k p x m l r, the mask m looks like:

000000100000000

It denotes the leftmost bit at which any two keys in k + l + r differ. Well, kind of, it can get out-of-date due to deletions. So, it's like a left-bound: there can be no pair of keys in k + l + r that differ at a left-er than that one.

The reason I had initially thought its definition was rather only the leftmost bit at which any two keys in l + r differ (not including k) is the definition of the "valid mask" check:

https://github.com/jaspervdj/psqueues/blob/37c12f5cef9962ca92d4198a5a3ed400e8c64e73/src/Data/IntPSQ/Internal.hs#L704-L733

As you can see, the check for a valid mask on a Bin node ignores k, but I think it may be a better check if it did not.