To measure performance of someFunc Haskell benchmarks often employ code like this:
map someFunc [1..1000] `deepseq` ()
Here we want to measure time of 1000 applications of someFunc, and in order to do so we deepseq the entire list map someFunc [1..1000]. However, in this scenario we are not really interested in materializing the list: if someFunc is relatively fast, allocations are likely to skew measurements. See https://github.com/Bodigrim/tasty-bench/issues/48#issuecomment-1606049088 for discussion and various hacky workarounds.
We can do better by making instance NFData [a] able to fuse by rewriting it via foldr. This has a nice side effect of avoiding manual recursion and having a more concise definition as well.
Before the patch
foo :: Int -> ()
foo n = [1..n] `deepseq` ()
compiles to
Rec {
$wgo3 :: [Int] -> (# #)
$wgo3
= \ (ds_s8hJ :: [Int]) ->
case ds_s8hJ of {
[] -> (##);
: y_i8gZ ys_i8h0 -> case y_i8gZ of { I# ipv_i8gR -> $wgo3 ys_i8h0 }
}
end Rec }
$wfoo :: Int# -> (# #)
$wfoo
= \ (ww_s8hT :: Int#) ->
case ># 1# ww_s8hT of {
__DEFAULT ->
letrec {
go3_a8hF :: Int# -> [Int]
go3_a8hF
= \ (x_a8hG :: Int#) ->
: (I# x_a8hG)
(case ==# x_a8hG ww_s8hT of {
__DEFAULT -> go3_a8hF (+# x_a8hG 1#);
1# -> []
}); } in
$wgo3 (go3_a8hF 1#);
1# -> (##)
}
foo :: Int -> ()
foo
= \ (n_s8hR :: Int) ->
case n_s8hR of { I# ww_s8hT ->
case $wfoo ww_s8hT of { (# #) -> () }
}
Here one can observe $wgo3 which is forcing a list (essentially this is rnf from instance NFData [a]) and go3_a8hF which produces it. Such code allocates boxed Int and (:) constructors.
After the patch:
foo :: Int -> ()
foo
= \ (n_a8g8 :: Int) ->
case n_a8g8 of { I# y_a8ha ->
case ># 1# y_a8ha of {
__DEFAULT ->
joinrec {
go3_a5N1 :: Int# -> ()
go3_a5N1 (x_a5N2 :: Int#)
= case ==# x_a5N2 y_a8ha of {
__DEFAULT -> jump go3_a5N1 (+# x_a5N2 1#);
1# -> ()
}; } in
jump go3_a5N1 1#;
1# -> ()
}
}
Here we do force evaluation of all elements of the list, but do not allocate them.
Fixes #76, CC @treeowl.
To measure performance of
someFunc
Haskell benchmarks often employ code like this:Here we want to measure time of 1000 applications of
someFunc
, and in order to do so wedeepseq
the entire listmap someFunc [1..1000]
. However, in this scenario we are not really interested in materializing the list: ifsomeFunc
is relatively fast, allocations are likely to skew measurements. See https://github.com/Bodigrim/tasty-bench/issues/48#issuecomment-1606049088 for discussion and various hacky workarounds.We can do better by making
instance NFData [a]
able to fuse by rewriting it viafoldr
. This has a nice side effect of avoiding manual recursion and having a more concise definition as well.Before the patch
compiles to
Here one can observe
$wgo3
which is forcing a list (essentially this isrnf
frominstance NFData [a]
) andgo3_a8hF
which produces it. Such code allocates boxedInt
and(:)
constructors.After the patch:
Here we do force evaluation of all elements of the list, but do not allocate them.