Open treeowl opened 2 years ago
Can you elaborate a bit? What dictionaries exactly are you talking about and why could they get unnecessarily large?
I need to review the code to see if there's actually an issue or not. Instances for non-regular types are often recursive. It's possible, for example, to have foldMap
for T t
invoke foldMap
for T (Succ T)
. This means that each step in the recursion potentially risks building a whole Foldable
dictionary just to pluck out its foldMap
. Where that's an issue, it can be solved by either passing a function (instead of using a class constraint) or using a custom single-method class. Again: I don't know if we have an issue here; I just want to check.
I believe that the unordered folds and traversals actually have this problem, though I haven't pored over the Core to be sure. We can fix it easily using some extra classes like FoldrU
and FoldlU
.
Confirmed. When I compile with -ddump-simpl
and dig into the guts of foldrU
, it eventually leads me to foldr
for BinomForest
. Cleaned up just a little bit, it looks like this:
Data.PQueue.Internals.$fFoldableBinomForest_$cfoldr
:: forall (rk :: * -> *).
Foldable rk =>
forall a b. (a -> b -> b) -> b -> BinomForest rk a -> b
Data.PQueue.Internals.$fFoldableBinomForest_$cfoldr
= \ (@ (rk_XpVJ :: * -> *))
($dFoldable1_XpVL :: Foldable rk_XpVJ)
(@ a_apR9)
(@ b_apRa) ->
let {
$dFoldable2_sqPe :: Foldable (Succ rk_XpVJ)
$dFoldable2_sqPe
= Data.PQueue.Internals.$fFoldableSucc
@ rk_XpVJ $dFoldable1_XpVL } in
let {
lvl20_ssZO
:: (a_apR9 -> b_apRa -> b_apRa)
-> b_apRa -> BinomForest (Succ rk_XpVJ) a_apR9 -> b_apRa
lvl20_ssZO
= Data.PQueue.Internals.$fFoldableBinomForest_$cfoldr
@ (Succ rk_XpVJ) $dFoldable2_sqPe @ a_apR9 @ b_apRa } in
\ (ds_dqm2 :: a_apR9 -> b_apRa -> b_apRa)
(z_aoQJ :: b_apRa)
(ds1_dqm3 :: BinomForest rk_XpVJ a_apR9) ->
case ds1_dqm3 of {
Nil -> z_aoQJ;
Skip tss_aoQM -> lvl20_ssZO ds_dqm2 z_aoQJ tss_aoQM;
Cons dt_dqoQ dt1_dqoR tss_aoQQ ->
ds_dqm2
dt_dqoQ
(foldr
@ rk_XpVJ
$dFoldable1_XpVL
@ a_apR9
@ b_apRa
ds_dqm2
(lvl20_ssZO ds_dqm2 z_aoQJ tss_aoQQ)
dt1_dqoR)
}
end Rec }
The problem is that first let
expression. Each recursive call constructs a full Foldable
dictionary, with all its many methods, to pass on. That's 100% a waste of time and allocation, so we definitely don't want that.
@treeowl has this been fixed or is there anything left to do? As far as I can tell, all the folds now use the classes from Data.PQueue.Internals.Foldable
, which only have a single method each.
@treeowl just a reminder, could you check if there's anything left to do here?
Yes, I'll try. You're also welcome to give it a shot if you're faster.
We should check the Core for unordered traversals and (therefore) maps. Currently, NFData
only has one method, so that should be fine for now—though I'm starting to push to add a second one. I don't remember any other such situations.
I'm not seeing any further problems of this shape at the moment, per se. However, I wonder if we'd be better off caching the constructed closures. Something like
data FoldMaps m rk k a = FM
{ foo :: BinomTree rk k a -> m
, bar :: BinomForest rk k a -> m
, baz :: rk k a -> m
, more :: FoldMaps m (Succ rk) k a
}
The idea is to cut the number of closures from $O(n)$ to $O(\log n)$. This is not a new idea, nor my own, but I don't know how much it's been explored in this context.
A completely different approach would be to reimagine these operations from the ground up, in terms of repeated decrement. Decrement is a bit like minView(WithKey)
, but with the dramatically simplifying (and very dramatically optimizing) assumption that the first binomial tree holds the minimum. Allocations with this approach are likely at least as high as in the current code, but it should avoid the jumps to unknown addresses.
Things like traversals and (especially) folds can easily get out of hand with nonregular types. We should make sure they don't.