Open idontgetoutmuch opened 9 months ago
So with random-fu
I get
benchmarking Normal single sample monad bayes
time 171.3 μs (171.1 μs .. 171.6 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 171.8 μs (171.5 μs .. 172.2 μs)
std dev 1.028 μs (566.3 ns .. 1.746 μs)
but with mwc-random
I get
benchmarking Normal single sample monad bayes
time 304.9 μs (304.3 μs .. 305.5 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 305.1 μs (304.7 μs .. 306.0 μs)
std dev 2.089 μs (1.312 μs .. 3.365 μs)
@Shimuuar I don't know why this would be. The implementation in random-fu
is a bit opaque and uses a different algorithm (not the Doornik modified ziggurat method as far as I can see). I will try other distributions. This is just for information. No need to do anything.
Can you, instead of editing in place, copy the file Sampler/Strict.hs
to Sampler/StrictFu.hs
maybe? And then you have two sampler types, both of which you can import in the benchmark, and test against each other?
Well this is mysterious. When I run both benchmarks in one program I get
cabal build speed-bench
speed-bench --match prefix Normal
Removing: speed-samples.csv
Removing: raw.dat
benchmarking Normal single sample monad bayes
time 21.92 μs (21.85 μs .. 22.02 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 21.94 μs (21.90 μs .. 22.03 μs)
std dev 183.7 ns (126.4 ns .. 300.8 ns)
benchmarking Normal single sample monad bayes fu
time 112.4 μs (112.3 μs .. 112.5 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 112.4 μs (112.3 μs .. 112.6 μs)
std dev 412.1 ns (273.1 ns .. 674.6 ns)
I wonder what the benchmark is actually measuring.
@turion I didn't try your suggestions yet and went back to basics. I have
{-# LANGUAGE ImportQualifiedPost #-}
{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}
import Data.Random qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (mkStdGen, newIOGenM)
import Criterion.Main
( Benchmark,
bench,
defaultConfig,
defaultMainWith,
nfIO,
)
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (replicateM)
normalFu :: IO Double
normalFu = newIOGenM (mkStdGen 1729) >>= (RF.runRVar $ RF.normal 0.0 1.0)
normalMwc :: IO Double
normalMwc = newIOGenM (mkStdGen 1729) >>= MWC.normal 0.0 1.0
normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"
rawDAT :: FilePath
rawDAT = "raw.dat"
normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Normal fu" $ nfIO $ do
xs <- replicateM 1000 normalFu
return $ sum xs
, bench "Normal mwc" $ nfIO $ do
xs <- replicateM 1000 normalMwc
return $ sum xs
]
main :: IO ()
main = do
let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
defaultMainWith configNormal normalBenchmarks
and get
benchmarking Normal fu
time 172.2 μs (172.1 μs .. 172.4 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 172.6 μs (172.5 μs .. 173.1 μs)
std dev 828.2 ns (175.5 ns .. 1.717 μs)
benchmarking Normal mwc
time 294.5 μs (294.1 μs .. 295.2 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 294.6 μs (294.3 μs .. 295.1 μs)
std dev 1.295 μs (856.7 ns .. 1.894 μs)
So now I need to add something like a MonadDistribution
class (which looks like it is reversing the performance results somehow).
{-# LANGUAGE ImportQualifiedPost #-}
{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}
import Data.Random qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (IOGenM, mkStdGen, newIOGenM, StdGen)
import Criterion.Main
( Benchmark,
bench,
defaultConfig,
defaultMainWith,
nfIO,
)
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (replicateM)
import Control.Monad.Reader (MonadIO, ReaderT (..))
import Statistics.Distribution.Normal (normalDistr)
import Statistics.Distribution (ContDistr (quantile))
import System.Random.Stateful (StatefulGen, uniformDouble01M)
normalFu :: IO Double
normalFu = newIOGenM (mkStdGen 1729) >>= (RF.runRVar $ RF.normal 0.0 1.0)
normalMwc :: IO Double
normalMwc = newIOGenM (mkStdGen 1729) >>= MWC.normal 0.0 1.0
-- | Draw from a continuous distribution using the inverse cumulative density
-- function.
draw :: (ContDistr d, MonadDistribution m) => d -> m Double
draw d = fmap (quantile d) random
-- | Monads that can draw random variables.
class (Monad m) => MonadDistribution m where
-- | Draw from a uniform distribution.
random ::
-- | \(\sim \mathcal{U}(0, 1)\)
m Double
-- | Draw from a normal distribution.
normal ::
-- | mean μ
Double ->
-- | standard deviation σ
Double ->
-- | \(\sim \mathcal{N}(\mu, \sigma^2)\)
m Double
normal m s = draw (normalDistr m s)
-- | The sampling interpretation of a probabilistic program
-- Here m is typically IO or ST
newtype SamplerT g m a = SamplerT {runSamplerT :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)
newtype SamplerU g m a = SamplerU {runSamplerU :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)
-- | convenient type synonym to show specializations of SamplerT
-- to particular pairs of monad and RNG
type SamplerIO = SamplerT (IOGenM StdGen) IO
type SamplerIP = SamplerU (IOGenM StdGen) IO
instance StatefulGen g m => MonadDistribution (SamplerT g m) where
random = SamplerT (ReaderT $ RF.runRVar $ RF.stdUniform)
normal m s = SamplerT (ReaderT $ RF.runRVar $ RF.normal m s)
instance StatefulGen g m => MonadDistribution (SamplerU g m) where
random = SamplerU (ReaderT uniformDouble01M)
normal m s = SamplerU (ReaderT (MWC.normal m s))
-- | Run the sampler with a fixed random seed
sampleIOfixed :: SamplerIO a -> IO a
sampleIOfixed x = newIOGenM (mkStdGen 1729) >>= sampleWith x
sampleIOfixee :: SamplerIP a -> IO a
sampleIOfixee x = newIOGenM (mkStdGen 1729) >>= sampleWiti x
-- | Sample with a random number generator of your choice e.g. the one
-- from `System.Random`.
sampleWith :: SamplerT g m a -> g -> m a
sampleWith (SamplerT m) = runReaderT m
sampleWiti :: SamplerU g m a -> g -> m a
sampleWiti (SamplerU m) = runReaderT m
normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"
rawDAT :: FilePath
rawDAT = "raw.dat"
normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Normal fu" $ nfIO $ do
xs <- replicateM 1000 normalFu
return $ sum xs
, bench "Normal mwc" $ nfIO $ do
xs <- replicateM 1000 normalMwc
return $ sum xs
, bench "Normal fu via MonadDistribution" $ nfIO $ do
sampleIOfixed (do xs <- replicateM 1000 $ normal 0.0 1.0
return $ sum xs)
, bench "Normal mwc via MonadDistribution" $ nfIO $ do
sampleIOfixee (do xs <- replicateM 1000 $ normal 0.0 1.0
return $ sum xs)
]
main :: IO ()
main = do
let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
defaultMainWith configNormal normalBenchmarks
gives this
./FuVsMwc
benchmarking Normal fu
time 170.5 μs (170.3 μs .. 170.7 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 170.8 μs (170.7 μs .. 171.0 μs)
std dev 532.9 ns (350.6 ns .. 729.4 ns)
benchmarking Normal mwc
time 294.1 μs (293.9 μs .. 294.5 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 294.5 μs (294.2 μs .. 295.1 μs)
std dev 1.335 μs (754.2 ns .. 2.170 μs)
benchmarking Normal fu via MonadDistribution
time 173.3 μs (173.0 μs .. 173.8 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 173.6 μs (173.2 μs .. 174.1 μs)
std dev 1.425 μs (902.5 ns .. 2.208 μs)
benchmarking Normal mwc via MonadDistribution
time 302.1 μs (301.8 μs .. 302.6 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 303.1 μs (302.5 μs .. 304.7 μs)
std dev 3.053 μs (1.805 μs .. 5.216 μs)
So it seems adding the type class adds almost no overhead and the results remain consistent (with random-fu being faster than mwx for normal samples).
If you add a type class to the same module, the type class dictionary lookup will be optimized away by GHC. For it to have a performance impact, you have to put it into a separate module, as is done in the library.
The compilation unit for GHC is a single module. This means that modules/files are optimized independently. In consequence, you will not get realistic benchmarks if you put the type class code in the same file as the benchmark. A benchmark has to be written like any user code in order to be realistic, i.e. use the library.
Again, if you add INLINE
and SPECIALISE
pragmas, you should be able to get rid of the type class overhead again.
I did not check my statements against Core, so one should maybe look into the optimized core first to make sure that this is what's happening.
If you add a type class to the same module, the type class dictionary lookup will be optimized away by GHC. For it to have a performance impact, you have to put it into a separate module, as is done in the library.
The compilation unit for GHC is a single module. This means that modules/files are optimized independently. In consequence, you will not get realistic benchmarks if you put the type class code in the same file as the benchmark. A benchmark has to be written like any user code in order to be realistic, i.e. use the library.
Again, if you add
INLINE
andSPECIALISE
pragmas, you should be able to get rid of the type class overhead again.
I am sure you are right but the discrepancy seems to be caused by using ghc -isrc -imodels -XBlockArguments -XOverloadedStrings -fforce-recomp benchmark/Speed.hs
and cabal build speed-bench
.
WIth ghc -O1 -isrc -imodels -XBlockArguments -XOverloadedStrings -fforce-recomp benchmark/Speed.hs
I get consistency and mwc
beats random-fu
. Now maybe is the time to specialise and inline?
I tried
instance StatefulGen g m => MonadDistribution (SamplerT g m) where
random = uniform
normal m s = normalSpec m s
uniform :: (StatefulGen g m, RF.Distribution RF.StdUniform a) => SamplerT g m a
uniform = SamplerT (ReaderT $ RF.runRVar $ RF.stdUniform)
{-# SPECIALISE uniform :: SamplerIO Double #-}
normalSpec :: (StatefulGen g m, RF.Distribution RF.Normal a) => a -> a -> SamplerT g m a
normalSpec m s = SamplerT (ReaderT $ RF.runRVar $ RF.normal m s)
{-# SPECIALISE normalSpec :: Double -> Double -> SamplerIO Double #-}
but it made no difference.
{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}
import Data.Random qualified as RF
import Data.Random.Distribution.Uniform qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (IOGenM, mkStdGen, newIOGenM, StdGen)
import Criterion.Main
( Benchmark,
bench,
defaultConfig,
defaultMainWith,
nfIO,
)
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (replicateM)
import Control.Monad.Reader (MonadIO, ReaderT (..))
import Statistics.Distribution.Normal (normalDistr)
import Statistics.Distribution (ContDistr (quantile))
import System.Random.Stateful (StatefulGen, uniformDouble01M)
normalFu :: IO Double
normalFu = newIOGenM (mkStdGen 1729) >>= (RF.runRVar $ RF.normal 0.0 1.0)
normalMwc :: IO Double
normalMwc = newIOGenM (mkStdGen 1729) >>= MWC.normal 0.0 1.0
-- | Draw from a continuous distribution using the inverse cumulative density
-- function.
draw :: (ContDistr d, MonadDistribution m) => d -> m Double
draw d = fmap (quantile d) random
-- | Monads that can draw random variables.
class (Monad m) => MonadDistribution m where
-- | Draw from a uniform distribution.
random ::
-- | \(\sim \mathcal{U}(0, 1)\)
m Double
-- | Draw from a normal distribution.
normal ::
-- | mean μ
Double ->
-- | standard deviation σ
Double ->
-- | \(\sim \mathcal{N}(\mu, \sigma^2)\)
m Double
normal m s = draw (normalDistr m s)
-- | The sampling interpretation of a probabilistic program
-- Here m is typically IO or ST
newtype SamplerT g m a = SamplerT {runSamplerT :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)
newtype SamplerU g m a = SamplerU {runSamplerU :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)
-- | convenient type synonym to show specializations of SamplerT
-- to particular pairs of monad and RNG
type SamplerIO = SamplerT (IOGenM StdGen) IO
type SamplerIP = SamplerU (IOGenM StdGen) IO
instance StatefulGen g m => MonadDistribution (SamplerT g m) where
random = uniform
normal m s = normalSpec m s
uniform :: StatefulGen g m => SamplerT g m Double
uniform = SamplerT (ReaderT $ RF.runRVar $ RF.doubleStdUniform)
{-# SPECIALISE uniform :: SamplerIO Double #-}
normalSpec :: (StatefulGen g m, RF.Distribution RF.Normal a) => a -> a -> SamplerT g m a
normalSpec m s = SamplerT (ReaderT $ RF.runRVar $ RF.normal m s)
{-# SPECIALISE normalSpec :: Double -> Double -> SamplerIO Double #-}
instance StatefulGen g m => MonadDistribution (SamplerU g m) where
random = SamplerU (ReaderT uniformDouble01M)
normal m s = SamplerU (ReaderT (MWC.normal m s))
-- | Run the sampler with a fixed random seed
sampleIOfixed :: SamplerIO a -> IO a
sampleIOfixed x = newIOGenM (mkStdGen 1729) >>= sampleWith x
sampleIOfixee :: SamplerIP a -> IO a
sampleIOfixee x = newIOGenM (mkStdGen 1729) >>= sampleWiti x
-- | Sample with a random number generator of your choice e.g. the one
-- from `System.Random`.
sampleWith :: SamplerT g m a -> g -> m a
sampleWith (SamplerT m) = runReaderT m
sampleWiti :: SamplerU g m a -> g -> m a
sampleWiti (SamplerU m) = runReaderT m
normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"
rawDAT :: FilePath
rawDAT = "raw.dat"
normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Normal fu" $ nfIO $ do
xs <- replicateM 1000 normalFu
return $ sum xs
, bench "Normal mwc" $ nfIO $ do
xs <- replicateM 1000 normalMwc
return $ sum xs
, bench "Normal fu via MonadDistribution" $ nfIO $ do
sampleIOfixed (do xs <- replicateM 1000 $ normal 0.0 1.0
return $ sum xs)
, bench "Normal mwc via MonadDistribution" $ nfIO $ do
sampleIOfixee (do xs <- replicateM 1000 $ normal 0.0 1.0
return $ sum xs)
, bench "Uniform fu via MonadDistribution" $ nfIO $ do
sampleIOfixed (do xs <- replicateM 1000 $ random
return $ sum xs)
, bench "Uniform mwc via MonadDistribution" $ nfIO $ do
sampleIOfixee (do xs <- replicateM 1000 $ random
return $ sum xs)
]
main :: IO ()
main = do
let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
defaultMainWith configNormal normalBenchmarks
gives
naive-standalone
benchmarking Normal fu
time 123.8 μs (123.4 μs .. 124.2 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 123.8 μs (123.5 μs .. 124.2 μs)
std dev 1.272 μs (1.082 μs .. 1.524 μs)
benchmarking Normal mwc
time 22.09 μs (21.81 μs .. 22.49 μs)
0.998 R² (0.998 R² .. 0.999 R²)
mean 21.98 μs (21.82 μs .. 22.23 μs)
std dev 664.2 ns (500.6 ns .. 816.0 ns)
variance introduced by outliers: 33% (moderately inflated)
benchmarking Normal fu via MonadDistribution
time 121.6 μs (119.5 μs .. 123.2 μs)
0.998 R² (0.997 R² .. 0.999 R²)
mean 117.3 μs (116.0 μs .. 118.8 μs)
std dev 4.790 μs (4.188 μs .. 5.198 μs)
variance introduced by outliers: 41% (moderately inflated)
benchmarking Normal mwc via MonadDistribution
time 23.91 μs (23.54 μs .. 24.15 μs)
0.999 R² (0.998 R² .. 0.999 R²)
mean 23.48 μs (23.22 μs .. 23.74 μs)
std dev 869.4 ns (832.5 ns .. 915.3 ns)
variance introduced by outliers: 43% (moderately inflated)
benchmarking Uniform fu via MonadDistribution
time 95.03 μs (94.54 μs .. 95.31 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 91.65 μs (90.58 μs .. 92.75 μs)
std dev 3.602 μs (3.424 μs .. 3.777 μs)
variance introduced by outliers: 41% (moderately inflated)
benchmarking Uniform mwc via MonadDistribution
time 9.928 μs (9.809 μs .. 10.08 μs)
0.999 R² (0.998 R² .. 0.999 R²)
mean 10.02 μs (9.922 μs .. 10.12 μs)
std dev 346.9 ns (316.6 ns .. 364.0 ns)
variance introduced by outliers: 42% (moderately inflated)
Annoyingly I have had to comment out some of the existing benchmarks. I tried
but if I uncomment the other tests then
even though