treeowl closed this pull request 1 year ago
@konsumlamm This is obviously not quite ready to actually merge, but before I make it so, I was hoping you'd give it a glance and see if it's something you'd be okay with if indeed it improves performance.
Note: along with reducing the total queue size by $n$ words, this reduces the number of words allocated on insertion by $2$ (amortized) and the number of words allocated on deletion by $O(\log n)$, thanks to the slightly more compact `Cons` nodes.
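The representation change behind these numbers can be sketched roughly as follows. This is a simplified model for illustration only, not the actual pqueue types; `TreeOld`, `ZeroNew`, and friends are hypothetical names:

```haskell
-- Hypothetical sketch (not the real pqueue internals): in the old
-- shape, every node stores its key and value at the root, and each
-- child chain ends in a nullary Zero.
data TreeOld rk k v = TreeOld k v (rk k v)      -- key, value, children
data SuccOld rk k v = SuccOld (TreeOld rk k v) (rk k v)
data ZeroOld k v = ZeroOld                      -- one dead word per element

-- New shape: the root keeps only the key, and the rank-zero child
-- slot (the "rightmost child") holds the value instead of a dummy,
-- saving one word per element.
data TreeNew rk k v = TreeNew k (rk k v)
data SuccNew rk k v = SuccNew (TreeNew rk k v) (rk k v)
newtype ZeroNew k v = ZeroNew v

-- A rank-0 element in each representation:
oldSingleton :: k -> v -> TreeOld ZeroOld k v
oldSingleton k v = TreeOld k v ZeroOld

newSingleton :: k -> v -> TreeNew ZeroNew k v
newSingleton k v = TreeNew k (ZeroNew v)
```

In the old shape the `ZeroOld` slot carries no information; in the new one that same slot stores the attached value, which is where the $n$-word saving comes from.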
@konsumlamm I believe this is ready to go.
I guess there's still the question of whether to let maps be lazy (better for performance in general) or whether to make them stricter (maintaining theoretical worst-case bounds). I'd lean toward the former in this instance, but I hate having to decide.
@konsumlamm, this still awaits your comments.
@konsumlamm It's been two weeks now.
I don't remember the numbers I got from benchmarks. What did they look like to you? The space savings aren't in the `Zero` value itself; they're in reclaiming the space we previously used to point to it (all over the place) and using that space to store attached values instead.
> I don't remember the numbers I got from benchmarks. What did they look like to you?

The benchmarks take 10-20% less time for me.

> The space savings aren't in the `Zero` value itself; they're in reclaiming the space we previously used to point to it (all over the place) and using that space to store attached values instead.
I see, so we save one word per tree. This is $\log n$ in total though, not $n$, as you said, right?
Ok, the folds seem to improve as well.
> I see, so we save one word per tree. This is $\log n$ in total though, not $n$, as you said, right?
Incorrect. You can count out the `Zero`s in a tree of a given size to get a better understanding. Or you can just notice that we have enough room to fit all $n$ of the values where we used to have nullary `Zero`s.
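To make the counting concrete, here's a tiny throwaway model (a hypothetical `Tree` type, not the library's): in the old representation every node's child chain terminates in one nullary `Zero`, so a tree with $n$ nodes carries exactly $n$ of them.

```haskell
-- Toy model: each node holds a key and a list of children; in the old
-- representation, every node's child chain ends in one nullary Zero.
data Tree = Node Int [Tree]

-- Number of elements in the tree.
size :: Tree -> Int
size (Node _ cs) = 1 + sum (map size cs)

-- Number of nullary Zeros: one per node, at the end of its child chain.
countZeros :: Tree -> Int
countZeros (Node _ cs) = 1 + sum (map countZeros cs)

-- A rank-2 binomial tree: 4 nodes, hence 4 Zeros.
rank2 :: Tree
rank2 = Node 1 [Node 2 [Node 3 []], Node 4 []]
```

So the `Zero` count matches the element count $n$, not the number of trees, which is only $O(\log n)$.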
Ah right, the `Zero`s are the leaves, not the roots; that's where my mistake was.
> I guess there's still the question of whether to let maps be lazy (better for performance in general) or whether to make them stricter (maintaining theoretical worst-case bounds). I'd lean toward the former in this instance, but I hate having to decide.
You mean lazy in the spine? What is the current situation? Does this PR change that?
What were off were my allocation calculations, which I believe I overstated. I'll have to recalculate those if anyone cares.
Lazy all through. If we want to be strict instead, I can do that, but the code will be more complicated and for the most part I expect slower.
So this basically reverts #103? Did you change your mind on #100? I'd be fine with lazy maps, if you say that's more efficient.
Well ... for now it keeps the plain (non-`Prio`) ones the same, as the strict constructors are better there. It more than reverts that for `Prio` queues, since it extends laziness into the trees (not just the spine). It all falls out of the representation change, which leads to different trade-offs.
I don't like having different laziness for `Prio` and non-`Prio` queues.
Can you explain why the new representation necessitates fully lazy maps? Why can't we just keep the old implementation?
I'm a bit worried that fully lazy maps might lead to situations where you build up a lot of computations which slow down a subsequent `deleteMin`, or will that not happen?
It doesn't necessitate fully lazy maps. Keeping them strict in the spine is easy, but not very useful for predictable performance. Keeping them entirely strict in the structure is quite possible too, but it's slightly trickier. That said, I just realized it should be somewhat less tricky with the `Nattish` maps than the old ones. We just need to stop the strict mapping at `Succy Zeroy`, and be lazy at `Zeroy`. It's still more code, but should be pretty readable. I'll give it a whirl and see how that goes.
> Keeping them strict in the spine is easy, but not very useful for predictable performance.
So why was that useful before, but not anymore?
The old representation had entirely strict trees, which held lazy values, so calculating the spine strictly calculated everything (except the values) strictly. The new representation uses the same fields for values as for lists of trees, so those fields must be lazy. To calculate everything but the values strictly, we must calculate strictly down to `Succy Zeroy`, then calculate lazily. I suspect this won't be a big deal. With the old auxiliary class approach, it would've required an extra "layer" of computational structure, but I don't think we'll need that now.
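As a toy illustration of "strict down to `Succy Zeroy`, then lazy" (simplified stand-in types, not the actual `Nattish` machinery): the strict fields of the `Succy` analogue force the structure, while the value slot inside the `Zeroy` analogue is left as a thunk.

```haskell
-- Toy stand-ins (not the real pqueue types): a rank-indexed chain
-- whose last slot holds the attached value.
data Zeroy v = Zeroy v                 -- lazy field: the value slot
data Succy rk v = Succy !Int !(rk v)   -- strict fields: the structure

class MapVal rk where
  mapVal :: (a -> b) -> rk a -> rk b

instance MapVal Zeroy where
  -- Lazy at Zeroy: the mapped value is left as a thunk.
  mapVal f (Zeroy v) = Zeroy (f v)

instance MapVal rk => MapVal (Succy rk) where
  -- Strict above: the bang on the recursive field forces the
  -- structure down to (and including) the last Succy.
  mapVal f (Succy k rest) = Succy k (mapVal f rest)
```

Forcing the result of `mapVal` then evaluates every `Succy` but never the value inside `Zeroy`; for instance, `mapVal (+1) (Succy 7 (Zeroy undefined))` can still be matched against `Succy k (Zeroy _)` without error.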
@konsumlamm I think maps should be back to the way they were now.
Sorry, I double confused myself. The allocation reduction is as big as I thought.
Store the value associated with each key as its rightmost child, which saves one word per element. As a result, the binomial trees must become lazy, which should be good for maps and lazy traversals. The downside is that we will need tag checks to know that we have realized `Succ` constructors. Benchmarks indicate this improves performance.
Closes #115.