lspitzner / pqueue

Haskell priority queue package
Other
15 stars 13 forks source link

Restore shape invariant #109

Closed treeowl closed 1 year ago

treeowl commented 1 year ago

Shrink the queue spine in extractBin when its size drops from a power of two.

Fixes #108

treeowl commented 1 year ago

This has extra checking, but seems simple. I have another version in progress that adds a constructor to MExtract that might be worth comparing; I find it rather more confusing, though, so it would only be worthwhile if it's significantly faster.

konsumlamm commented 1 year ago

Would it be possible to add a test for this?

treeowl commented 1 year ago

@konsumlamm I've added a test for plain queues. I'd like you to take a look before I copy that over for Prio queues. The whole testing thing is awkward because we don't export internals and Cabal doesn't allow two .cabal files in the same directory. containers works around that using symlinks, which cause complications testing on Windows. So ... lots of fun all around.

treeowl commented 1 year ago

Another approach would be to add a pqueue-internals package, as suggested long ago in another context by @stephen-smith. That's a peculiarly principled way to expose internals to both the test suite and the world—it's possible for the internals package to go up a major version when the main package only goes up a minor version, or even a patch level.

treeowl commented 1 year ago

If we want, we can actually enforce the shape invariant using a GADT. Here's a definition that merges that change with getting rid of Zero:

data BinomTree rk a = BinomTree !a !(rk a)
newtype One a = One a
data Succ rk a = Succ !(BinomTree rk a) !(rk a)

data BinomForest nil rk a where
  Nil :: BinomForest 'True rk a
  Skip :: BinomForest 'False (Succ rk) a -> BinomForest 'False rk a
  Cons :: !(BinomTree rk a) -> BinomForest nil (Succ rk) a -> BinomForest 'False rk a

data BinomHeap a where
  BinomEmpty :: BinomHeap a
  BinomCons :: !a -> BinomForest nil One a -> BinomHeap a
  BinomSkip :: BinomForest 'False One a -> BinomHeap a

I've made the somewhat unusual choice to force Skip and Cons to produce 'False rather than an arbitrary nullness to avoid needing to allocate copies of structures to change their types. Surprisingly few changes are necessary to support insert and minView. The "topped" queues with size and minimum stored separately will probably want to "unpack" the BinomHeap sum type for compactness, inlining its functions.

I would recommend deferring the switch to a GADT until we get rid of Zero; it's not really worth the overhaul on its own.

treeowl commented 1 year ago

@konsumlamm Feedback on how the test stuff should be structured would be appreciated so I can make progress.

konsumlamm commented 1 year ago

The test structure looks good to me. Perhaps it would be better to move the validity checking functions to the tests though, then we could also use .&&. to get better error messages. It would also be nice to test that the queues are valid for more than just minView (perhaps one could make a combinator that automatically adds a validity check), but that doesn't have to happen now.

If we want, we can actually enforce the shape invariant using a GADT.

That sounds good.

I've made the somewhat unusual choice to force Skip and Cons to produce 'False rather than an arbitrary nullness to avoid needing to allocate copies of structures to change their types.

Isn't nil supposed to indicate wether or not the value is Nil? Why would it make sense to have Skip or Cons be a BinomForest 'True rk a?

I would recommend deferring the switch to a GADT until we get rid of Zero; it's not really worth the overhaul on its own.

Isn't the change just to add an additional type parameter to BinomForest? Or am I missing something?

treeowl commented 1 year ago

The test structure looks good to me. Perhaps it would be better to move the validity checking functions to the tests though, then we could also use .&&. to get better error messages. It would also be nice to test that the queues are valid for more than just minView (perhaps one could make a combinator that automatically adds a validity check), but that doesn't have to happen now.

If we want, we can actually enforce the shape invariant using a GADT.

That sounds good.

I've made the somewhat unusual choice to force Skip and Cons to produce 'False rather than an arbitrary nullness to avoid needing to allocate copies of structures to change their types.

Isn't nil supposed to indicate wether or not the value is Nil? Why would it make sense to have Skip or Cons be a BinomForest 'True rk a?

I would recommend deferring the switch to a GADT until we get rid of Zero; it's not really worth the overhaul on its own.

Isn't the change just to add an additional type parameter to BinomForest? Or am I missing something?

We'd need to deal with an existential at the top, which requires the same sort of fuss as getting rid of Zero but with far less benefit. It would be somewhat less fuss if GHC knew how to unpack existentials, but I've made no headway in my efforts to convince GHC HQ to spend the time on that.

treeowl commented 1 year ago

The usual advice from @pigworker is to make a GADT type tell you what you need, rather than what you have, which would lead to nullable rather than nil. But then we'd have to write a lot of things like

case x of
  Skip f -> Skip f
  Cons t f -> Cons t f

and it's not very nice. Yeah, it should go away in Cmm, but terms are big and I don't trust it and it's gross.

treeowl commented 1 year ago

Then again ... maybe I'm wrong and we really want nullable for merging. I'll play around a bit more.

treeowl commented 1 year ago

Okay, trying to do it with nullable is a big mess, as I predicted. Doing it with nil (and dropping Zero) is actually working out pretty well, despite the slight weirdness. Probably the oddest thing is the need for the type of merge to include a type-level &&, but ... it works. When GHC implements first-class existentials, we won't even need that. There is quite a bit of extra code to deal with the possible rank-zero tree at the front; that's pretty annoying, and I don't yet know how it'll affect performance. Anyway, this all should be tracked in some other ticket.

treeowl commented 1 year ago

I just ran some benchmarks. Fortunately, k-way merge hasn't slowed down measurably, which I was slightly concerned about. Heapsort seems to have improved very slightly in general (not statistically significant). I'm not surprised that we don't see a bigger benefit there; the "tail" of Skips would have been mostly in normal form and mostly contiguous in memory, so walking it would've been very cheap, and in any case $log$ is a slow-growing function, and therefore also slow to shrink. I think we'd need to do something a bit weirder to measure a big impact, like building a huge queue, emptying it almost all the way, then turn on the timing and switching to alternating insertion with deletion. Anyway, it's good to get the bounds back to what they're theoretically supposed to be.