jberryman closed 8 years ago
Indeed, 2ns doesn't make any sense. I'm quite confused about why that benchmark can run in 2ns, so I'll spend some time trying to understand what's going on here. I'm clearly generating a strict ByteString and even forcing it. Weird.
So I wrote this small file as a test:
{-# LANGUAGE DeriveDataTypeable, FlexibleContexts, FlexibleInstances,
             LambdaCase, MultiParamTypeClasses #-}
module Main where

import Criterion.Main
import Lib

runBench :: IO ()
runBench = do
  defaultMain
    [ env (generateBalancedTree 22) $ \tree ->
        bgroup "serialization"
          [
          -- bench "binary" $ nf (serialize Binary) tree,
            bench "binary-io" $ nfIO (return (serialize Binary tree))
          ]
    ]

main :: IO ()
main = runBench
This is the related STG:
Main.$wrunBench
:: Lib.BinTree GHC.Types.Int
-> (# GHC.Base.String, [Criterion.Types.Benchmark] #) =
\r srt:SRT:[rA :-> Data.ByteString.Builder.toLazyByteString,
rB :-> Data.ByteString.Lazy.toStrict,
rP9 :-> Data.Binary.Class.$fBinaryInt, r2R3 :-> Main.main5,
r2R6 :-> Main.main6] [w_sfvK]
let {
lvl3_sfvL :: () =
\u srt:SRT:[rA :-> Data.ByteString.Builder.toLazyByteString,
rB :-> Data.ByteString.Lazy.toStrict,
rP9 :-> Data.Binary.Class.$fBinaryInt] []
let {
sat_sfvP :: Data.ByteString.Builder.Internal.Builder =
\u srt:SRT:[rP9 :-> Data.Binary.Class.$fBinaryInt] []
case Lib.$w$cput Data.Binary.Class.$fBinaryInt w_sfvK of _ {
(#,#) _ ww2_sfvO -> ww2_sfvO;
};
} in
case
Data.ByteString.Builder.toLazyByteString sat_sfvP
of
sat_sfvQ
{ __DEFAULT ->
case Data.ByteString.Lazy.toStrict sat_sfvQ of _ {
Data.ByteString.Internal.PS _ _ _ _ -> () [];
};
}; } in
let {
$wgo12_sfvW
:: GHC.Prim.Int#
-> GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, () #) =
sat-only \r srt:SRT:[] [ww_sfvX w1_sfvY]
case <=# [ww_sfvX 0#] of sat_sfvZ {
__DEFAULT ->
case tagToEnum# [sat_sfvZ] of _ {
GHC.Types.False ->
case seq# [lvl3_sfvL w1_sfvY] of _ {
(#,#) ipv_sfw2 _ ->
case -# [ww_sfvX 1#] of sat_sfw4 {
__DEFAULT -> $wgo12_sfvW sat_sfw4 ipv_sfw2;
};
};
GHC.Types.True -> (#,#) [w1_sfvY GHC.Tuple.()];
};
}; } in
let {
sat_sfw9 :: Criterion.Types.Benchmarkable =
\r srt:SRT:[] [eta1_sfw5 eta2_sfw6]
case eta1_sfw5 of _ {
GHC.Int.I64# ww1_sfw8 -> $wgo12_sfvW ww1_sfw8 eta2_sfw6;
}; } in
let {
sat_sfwa :: Criterion.Types.Benchmark =
NO_CCS Criterion.Types.Benchmark! [Main.main5 sat_sfw9]; } in
let {
sat_sfwb :: [Criterion.Types.Benchmark] =
NO_CCS :! [sat_sfwa GHC.Types.[]];
} in (#,#) [Main.main6 sat_sfwb];
The important part is this binding, which serializes the tree:
let {
lvl3_sfvL :: () =
\u srt:SRT:[rA :-> Data.ByteString.Builder.toLazyByteString,
rB :-> Data.ByteString.Lazy.toStrict,
rP9 :-> Data.Binary.Class.$fBinaryInt] []
let {
sat_sfvP :: Data.ByteString.Builder.Internal.Builder =
\u srt:SRT:[rP9 :-> Data.Binary.Class.$fBinaryInt] []
case Lib.$w$cput Data.Binary.Class.$fBinaryInt w_sfvK of _ {
(#,#) _ ww2_sfvO -> ww2_sfvO;
};
} in
case
Data.ByteString.Builder.toLazyByteString sat_sfvP
of
sat_sfvQ
{ __DEFAULT ->
case Data.ByteString.Lazy.toStrict sat_sfvQ of _ {
Data.ByteString.Internal.PS _ _ _ _ -> () [];
};
}; } in
Note that this is a local definition, an updatable thunk. So while the benchmark loop really does force it on every iteration ...
case seq# [lvl3_sfvL w1_sfvY] of _ {
(#,#) ipv_sfw2 _ ->
case -# [ww_sfvX 1#] of sat_sfw4 {
__DEFAULT -> $wgo12_sfvW sat_sfw4 ipv_sfw2;
};
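... nothing really happens after the first iteration: once forced, the thunk is updated with its result, so every later iteration just finds an already-evaluated value.
GHC is so good that it sometimes makes it very hard to benchmark Haskell programs.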
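To make the sharing concrete, here is a minimal sketch that reproduces the effect with nothing but base. It is not the benchmark from this issue: `expensive`, `counter`, `sharedRuns`, `applyLoop` and `appliedRuns` are hypothetical names made up for the illustration, and `expensive` only stands in for `serialize Binary tree`. The first loop re-runs one IO action of the shape `return (f x)`, the second applies the function to its argument on every iteration.

```haskell
import Control.Exception (evaluate)
import Control.Monad (replicateM_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Global counter recording how often the "expensive" work really runs.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}

-- Hypothetical stand-in for `serialize Binary tree`: bumps the counter
-- each time its result is actually computed.
expensive :: Int -> Int
expensive n = unsafePerformIO $ do
  modifyIORef' counter (+ 1)
  return (sum [1 .. n])
{-# NOINLINE expensive #-}

-- Same shape as `nfIO (return (serialize Binary tree))`: the action is
-- built once, so the thunk inside it is shared by every run of the action.
sharedRuns :: Int -> IO Int
sharedRuns iters = do
  writeIORef counter 0
  let action = return (expensive iters) :: IO Int
  replicateM_ iters (action >>= evaluate)
  readIORef counter

-- A loop that receives the function and its argument separately and
-- builds a fresh application `f x` on every iteration.
applyLoop :: (a -> b) -> a -> Int -> IO ()
applyLoop f x n
  | n <= 0 = return ()
  | otherwise = evaluate (f x) >> applyLoop f x (n - 1)
{-# NOINLINE applyLoop #-}

appliedRuns :: Int -> IO Int
appliedRuns iters = do
  writeIORef counter 0
  applyLoop expensive iters iters
  readIORef counter

main :: IO ()
main = do
  shared <- sharedRuns 5
  applied <- appliedRuns 5
  putStrLn ("shared thunk: work ran " ++ show shared ++ " time(s)")
  putStrLn ("function applied in loop: work ran " ++ show applied ++ " time(s)")
```

Built without aggressive optimization, the first count should be 1 and the second 5. The second loop depends on GHC not floating `f x` out of the recursion, so it is a sketch of the idea rather than a guarantee under every optimization flag.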
This is the Core for the same code:
let {
lvl3_sfuG :: ()
lvl3_sfuG =
case Data.ByteString.Lazy.toStrict
(Data.ByteString.Builder.toLazyByteString
(case Lib.$w$cput
@ Int binary-0.8.3.0:Data.Binary.Class.$fBinaryInt w_sftD
of _ { (# ww1_ienD, ww2_ienE #) ->
ww2_ienE
}))
of _
{ Data.ByteString.Internal.PS dt_ieuC dt1_ieuD dt2_ieuE dt3_ieuF ->
ghc-prim-0.5.0.0:GHC.Tuple.()
} } in
I know how to fix this and I think we need to patch criterion to avoid this in the future... Patch will be coming this evening.
Thanks for reporting, btw.
We're not serializing anything in 2 ns ;)
I'm sorry I don't have a PR for you, as I didn't care about packman and ripped it out and added some of my own benchmarks. But this sort of thing should yield meaningful numbers:
Thanks for putting this together; it was very helpful!