jaspervdj / psqueues

Priority Search Queues in three different flavors for Haskell
https://hackage.haskell.org/package/psqueues
Other
64 stars 24 forks source link

Different representation? #51

Open treeowl opened 1 year ago

treeowl commented 1 year ago

I don't really understand the choice to use the search tree/semi-heap structure. The paper suggests it was chosen for compactness, and yet keys are duplicated all over. Suppose we start from a slightly different place: monoidal search trees:

data PSQ k p v
  = Nil
  | Bin !Int {-# UNPACK #-} !(Elem k p v)
           !p !(PSQ k p v) !(PSQ k p v)

Everything gets borrowed from Data.Map. This doesn't support $(O(1))$ findMin, so it would need to be augmented if that's required, but that's not obviously worse overall. As written above, searching for the minimum priority requires comparisons. One option would be to use different Bin constructors to indicate whether the minimal priority is in the node or in which of its children.

jaspervdj commented 1 year ago

You sort of nerd-sniped me with this and after thinking about this for a while I agree this may actually be a better representation for the OrdPSQ module, though I'm not entirely sure yet.

As your other issues and PRs pointed out, there's definitely room for a bunch of improvements there -- when we wrote this library, all of our attention basically went to supporting fast HashPSQs with millions of items, and the OrdPSQ was based on an existing implementation in GHC's RTS that was only used in case of hash collisions and as a model check for the hash variant (IIRC, this is almost 10 years ago).

treeowl commented 1 year ago

Oh, I'm not sure at all! It just seemed like an obvious unresolved question when I was reading the paper. I'm not at all an expert like the authors, so their way may be better for some reason that's very clear to them.

treeowl commented 1 year ago

I have thought about this a bit more. I think the problem with my suggestion is that it likely gets rather expensive if $O(1)$ access to the min-priority key and value are required. I can think of several ways to do it, none of them very good. Still, it might be worth seeing how those compare, or how a version without that $O(1)$ access might perform in practice.