Note that we are actually using `Data.List.sum`, which is defined as `foldl (+) 0`. It uses a lazy left fold, but the strictness analyser should be able to determine in most cases that we are consuming the list strictly and avoid the need to build up thunks. However, I am really confused as to why we are faster for smaller lists and slower for larger ones. It might have to do with this strictness analysis kicking in?
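To make the laziness concrete, here is a minimal sketch of that definition and the thunk chain it builds when the analysis does not fire (`sumLazy` is a name made up for illustration):

```haskell
-- A lazy left fold, matching the reported definition of Data.List.sum.
sumLazy :: Num a => [a] -> a
sumLazy = foldl (+) 0

-- Without strictness analysis, the accumulator is never forced during
-- traversal, so sumLazy [1, 2, 3] builds the nested thunk
-- ((0 + 1) + 2) + 3 and only evaluates it once the result is demanded.
```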
Ah, I figured it out. I was on the right track. Looking at the Core for the `sum` function I defined, it looks like this:
```
-- RHS size: {terms: 25, types: 24, coercions: 8, joins: 1/1}
sum1_r96e
sum1_r96e
  = \ @ a_a4MN $dNum_a4MP eta_B1 ->
      joinrec {
        go_a6mW
        go_a6mW ds_a6mX eta1_X40
          = case ds_a6mX of {
              [] -> eta1_X40;
              : y_a6n2 ys_a6n3 ->
                case eta1_X40 `cast` <Co:2> of nt_s6nQ { __DEFAULT ->
                jump go_a6mW ys_a6n3 ((+ $dNum_a4MP nt_s6nQ y_a6n2) `cast` <Co:3>)
                }
            }; } in
      jump go_a6mW
        eta_B1 ((fromInteger $dNum_a4MP $fMonoidSum1) `cast` <Co:3>)
```
Specifically, this line:

```
jump go_a6mW ys_a6n3 ((+ $dNum_a4MP nt_s6nQ y_a6n2) `cast` <Co:3>)
```
This is building up thunks; in other words, it's lazy in the accumulator. The definition of `Data.List.sum` allows GHC to easily apply strictness optimisations to the left fold, so it has predictable performance. GHC is not able to perform the same strictness optimisations for our implementation with `foldMap`, perhaps because it cannot make the same guarantees (I don't know why).
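For contrast, the strict loop that GHC derives from `Data.List.sum` looks roughly like this when written by hand (a sketch with made-up names, not actual compiler output):

```haskell
{-# LANGUAGE BangPatterns #-}

-- The bang forces the accumulator on every iteration, so no thunk
-- chain can build up.
sumStrict :: Num a => [a] -> a
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs
```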
We can fix this with a strict definition of `foldMap`, like so:
```haskell
import qualified Data.Foldable as Foldable

foldMap' :: (Foldable t, Monoid m) => (a -> m) -> t a -> m
{-# INLINE foldMap' #-}
foldMap' f = Foldable.foldl' (\acc x -> acc `mappend` f x) mempty
```
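A strict `sum` can then be layered on top (a sketch; `Sum` and `getSum` come from `Data.Monoid`, and because `Sum` is a newtype, forcing the accumulator to WHNF forces the wrapped number):

```haskell
import Data.Monoid (Sum(..))

sum' :: (Foldable t, Num a) => t a -> a
sum' = getSum . foldMap' Sum
```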
Also, I changed the definition of `sum` to be polymorphic in `Num`, to match the behaviour of `Foldable` precisely. This really shouldn't make a difference, but it wasn't hard to do and is strictly correct.
Fixed the benchmarks. It seems that for lazy sums, we do better for smaller lists (the size cutoff consistently seems to fall between 10^3 and 10^4).
For strict sums, we always perform better by a noticeable (but not overwhelmingly large) margin.
Note: although we perform better in this respect, `Data.List.sum` might be more amenable to list fusion.
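For reference, the shape of comparison being run is roughly the following (a sketch using criterion; the actual benchmark suite isn't shown in this thread):

```haskell
import Criterion.Main
import Data.Monoid (Sum(..))

main :: IO ()
main = defaultMain
  [ bgroup ("n = 10^" ++ show e)
      [ bench "Data.List.sum" $ whnf sum xs
      , bench "foldMap Sum"   $ whnf (getSum . foldMap Sum) xs
      ]
  | e <- [2 .. 5 :: Int]
  , let xs = [1 .. 10 ^ e] :: [Int]
  ]
```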
Thanks. I'm not worrying too much about the performance differences between `ala Sum foldMap` and `sum` – this seems to be mostly about `foldMap` vs `sum`, not so much about `ala`.
But I do wonder if we should give so many examples with `foldMap` – maybe we should use `foldMap'` there and add a note regarding the performance implications.
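Such an example might then read like this (a sketch assuming the strict `foldMap'` from above; the `Control.Newtype` module name is an assumption):

```haskell
import Control.Newtype (ala)  -- assumed import; use this package's actual module
import Data.Monoid (Sum(..))

-- Wrap each element with Sum, fold strictly, and unwrap the result.
total :: Int
total = ala Sum foldMap' [1, 2, 3, 4]  -- evaluates to 10
```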
It looks like we do better for sufficiently small lists; somewhere between sizes of 10^3 and 10^4 we start losing out.