A benchmark of random-fu (replacing mwc-random)

idontgetoutmuch commented 9 months ago

Annoyingly I have had to comment out some of the existing benchmarks. I tried

./Speed-Fu --match prefix Normal
benchmarking Normal single sample monad bayes
time                 172.3 μs   (172.1 μs .. 172.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 172.8 μs   (172.6 μs .. 173.1 μs)
std dev              807.4 ns   (512.3 ns .. 1.369 μs)

but if I uncomment the other tests then

./Speed-Fu --match prefix Normal
Error: none of the specified names matches a benchmark

even though

./Speed-Fu --list | grep Normal
Normal single sample monad bayes

idontgetoutmuch commented 9 months ago

So with random-fu I get

benchmarking Normal single sample monad bayes
time                 171.3 μs   (171.1 μs .. 171.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 171.8 μs   (171.5 μs .. 172.2 μs)
std dev              1.028 μs   (566.3 ns .. 1.746 μs)

but with mwc-random I get

benchmarking Normal single sample monad bayes
time                 304.9 μs   (304.3 μs .. 305.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 305.1 μs   (304.7 μs .. 306.0 μs)
std dev              2.089 μs   (1.312 μs .. 3.365 μs)

@Shimuuar I don't know why this would be. The implementation in random-fu is a bit opaque and uses a different algorithm (not the Doornik modified ziggurat method as far as I can see). I will try other distributions. This is just for information. No need to do anything.

turion commented 9 months ago

Can you, instead of editing in place, copy the file Sampler/Strict.hs to Sampler/StrictFu.hs maybe? And then you have two sampler types, both of which you can import in the benchmark, and test against each other?

idontgetoutmuch commented 9 months ago

Well this is mysterious. When I run both benchmarks in one program I get

cabal build speed-bench
speed-bench --match prefix Normal

Removing: speed-samples.csv
Removing: raw.dat
benchmarking Normal single sample monad bayes
time                 21.92 μs   (21.85 μs .. 22.02 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 21.94 μs   (21.90 μs .. 22.03 μs)
std dev              183.7 ns   (126.4 ns .. 300.8 ns)

benchmarking Normal single sample monad bayes fu
time                 112.4 μs   (112.3 μs .. 112.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 112.4 μs   (112.3 μs .. 112.6 μs)
std dev              412.1 ns   (273.1 ns .. 674.6 ns)

I wonder what the benchmark is actually measuring.

idontgetoutmuch commented 8 months ago

@turion I didn't try your suggestions yet and went back to basics. I have

{-# LANGUAGE ImportQualifiedPost #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import Data.Random qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (mkStdGen, newIOGenM)
import Criterion.Main
  ( Benchmark,
    bench,
    defaultConfig,
    defaultMainWith,
    nfIO,
  )
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (replicateM)

normalFu :: IO Double
normalFu = newIOGenM (mkStdGen 1729) >>= (RF.runRVar $ RF.normal 0.0 1.0)

normalMwc :: IO Double
normalMwc = newIOGenM (mkStdGen 1729) >>= MWC.normal 0.0 1.0

normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"

rawDAT :: FilePath
rawDAT = "raw.dat"

normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Normal fu" $ nfIO $ do
                       xs <- replicateM 1000 normalFu
                       return $ sum xs
                   , bench "Normal mwc" $ nfIO $ do
                       xs <- replicateM 1000 normalMwc
                       return $ sum xs
                   ]

main :: IO ()
main = do
  let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
  defaultMainWith configNormal normalBenchmarks

and get

benchmarking Normal fu
time                 172.2 μs   (172.1 μs .. 172.4 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 172.6 μs   (172.5 μs .. 173.1 μs)
std dev              828.2 ns   (175.5 ns .. 1.717 μs)

benchmarking Normal mwc
time                 294.5 μs   (294.1 μs .. 295.2 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 294.6 μs   (294.3 μs .. 295.1 μs)
std dev              1.295 μs   (856.7 ns .. 1.894 μs)

So now I need to add something like a MonadDistribution class (which looks like it is reversing the performance results somehow).

idontgetoutmuch commented 8 months ago

{-# LANGUAGE ImportQualifiedPost #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import Data.Random qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (IOGenM, mkStdGen, newIOGenM, StdGen)
import Criterion.Main
  ( Benchmark,
    bench,
    defaultConfig,
    defaultMainWith,
    nfIO,
  )
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (replicateM)
import Control.Monad.Reader (MonadIO, ReaderT (..))
import Statistics.Distribution.Normal (normalDistr)
import Statistics.Distribution (ContDistr (quantile))
import System.Random.Stateful (StatefulGen, uniformDouble01M)

normalFu :: IO Double
normalFu = newIOGenM (mkStdGen 1729) >>= (RF.runRVar $ RF.normal 0.0 1.0)

normalMwc :: IO Double
normalMwc = newIOGenM (mkStdGen 1729) >>= MWC.normal 0.0 1.0

-- | Draw from a continuous distribution using the inverse cumulative density
-- function.
draw :: (ContDistr d, MonadDistribution m) => d -> m Double
draw d = fmap (quantile d) random

-- | Monads that can draw random variables.
class (Monad m) => MonadDistribution m where
    -- | Draw from a uniform distribution.
  random ::
    -- | \(\sim \mathcal{U}(0, 1)\)
    m Double

  -- | Draw from a normal distribution.
  normal ::
    -- | mean μ
    Double ->
    -- | standard deviation σ
    Double ->
    -- | \(\sim \mathcal{N}(\mu, \sigma^2)\)
    m Double
  normal m s = draw (normalDistr m s)

-- | The sampling interpretation of a probabilistic program
-- Here m is typically IO or ST
newtype SamplerT g m a = SamplerT {runSamplerT :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)
newtype SamplerU g m a = SamplerU {runSamplerU :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)

-- | convenient type synonym to show specializations of SamplerT
-- to particular pairs of monad and RNG
type SamplerIO = SamplerT (IOGenM StdGen) IO
type SamplerIP = SamplerU (IOGenM StdGen) IO

instance StatefulGen g m => MonadDistribution (SamplerT g m) where
  random = SamplerT (ReaderT $ RF.runRVar $ RF.stdUniform)
  normal m s = SamplerT (ReaderT $ RF.runRVar $ RF.normal m s)

instance StatefulGen g m => MonadDistribution (SamplerU g m) where
  random = SamplerU (ReaderT uniformDouble01M)
  normal m s = SamplerU (ReaderT (MWC.normal m s))

-- | Run the sampler with a fixed random seed
sampleIOfixed :: SamplerIO a -> IO a
sampleIOfixed x = newIOGenM (mkStdGen 1729) >>= sampleWith x

sampleIOfixee :: SamplerIP a -> IO a
sampleIOfixee x = newIOGenM (mkStdGen 1729) >>= sampleWiti x

-- | Sample with a random number generator of your choice e.g. the one
-- from `System.Random`.
sampleWith :: SamplerT g m a -> g -> m a
sampleWith (SamplerT m) = runReaderT m

sampleWiti :: SamplerU g m a -> g -> m a
sampleWiti (SamplerU m) = runReaderT m

normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"

rawDAT :: FilePath
rawDAT = "raw.dat"

normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Normal fu" $ nfIO $ do
                       xs <- replicateM 1000 normalFu
                       return $ sum xs
                   , bench "Normal mwc" $ nfIO $ do
                       xs <- replicateM 1000 normalMwc
                       return $ sum xs
                   , bench "Normal fu via MonadDistribution" $ nfIO $ do
                       sampleIOfixed (do xs <- replicateM 1000 $ normal 0.0 1.0
                                         return $ sum xs)
                   , bench "Normal mwc via MonadDistribution" $ nfIO $ do
                       sampleIOfixee (do xs <- replicateM 1000 $ normal 0.0 1.0
                                         return $ sum xs)
                   ]

main :: IO ()
main = do
  let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
  defaultMainWith configNormal normalBenchmarks

gives this

./FuVsMwc 
benchmarking Normal fu
time                 170.5 μs   (170.3 μs .. 170.7 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 170.8 μs   (170.7 μs .. 171.0 μs)
std dev              532.9 ns   (350.6 ns .. 729.4 ns)

benchmarking Normal mwc
time                 294.1 μs   (293.9 μs .. 294.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 294.5 μs   (294.2 μs .. 295.1 μs)
std dev              1.335 μs   (754.2 ns .. 2.170 μs)

benchmarking Normal fu via MonadDistribution
time                 173.3 μs   (173.0 μs .. 173.8 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 173.6 μs   (173.2 μs .. 174.1 μs)
std dev              1.425 μs   (902.5 ns .. 2.208 μs)

benchmarking Normal mwc via MonadDistribution
time                 302.1 μs   (301.8 μs .. 302.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 303.1 μs   (302.5 μs .. 304.7 μs)
std dev              3.053 μs   (1.805 μs .. 5.216 μs)

So it seems adding the type class adds almost no overhead and the results remain consistent (with random-fu being faster than mwx for normal samples).

turion commented 8 months ago

If you add a type class to the same module, the type class dictionary lookup will be optimized away by GHC. For it to have a performance impact, you have to put it into a separate module, as is done in the library.

The compilation unit for GHC is a single module. This means that modules/files are optimized independently. In consequence, you will not get realistic benchmarks if you put the type class code in the same file as the benchmark. A benchmark has to be written like any user code in order to be realistic, i.e. use the library.

Again, if you add INLINE and SPECIALISE pragmas, you should be able to get rid of the type class overhead again.

turion commented 8 months ago

I did not check my statements against Core, so one should maybe look into the optimized core first to make sure that this is what's happening.

idontgetoutmuch commented 8 months ago

If you add a type class to the same module, the type class dictionary lookup will be optimized away by GHC. For it to have a performance impact, you have to put it into a separate module, as is done in the library.

The compilation unit for GHC is a single module. This means that modules/files are optimized independently. In consequence, you will not get realistic benchmarks if you put the type class code in the same file as the benchmark. A benchmark has to be written like any user code in order to be realistic, i.e. use the library.

Again, if you add INLINE and SPECIALISE pragmas, you should be able to get rid of the type class overhead again.

I am sure you are right but the discrepancy seems to be caused by using ghc -isrc -imodels -XBlockArguments -XOverloadedStrings -fforce-recomp benchmark/Speed.hs and cabal build speed-bench.

idontgetoutmuch commented 8 months ago

WIth ghc -O1 -isrc -imodels -XBlockArguments -XOverloadedStrings -fforce-recomp benchmark/Speed.hs I get consistency and mwc beats random-fu. Now maybe is the time to specialise and inline?

idontgetoutmuch commented 8 months ago

I tried

instance StatefulGen g m => MonadDistribution (SamplerT g m) where
  random = uniform
  normal m s = normalSpec m s

uniform :: (StatefulGen g m, RF.Distribution RF.StdUniform a) => SamplerT g m a
uniform = SamplerT (ReaderT $ RF.runRVar $ RF.stdUniform)
{-# SPECIALISE uniform :: SamplerIO Double #-}

normalSpec :: (StatefulGen g m, RF.Distribution RF.Normal a) => a -> a -> SamplerT g m a
normalSpec m s = SamplerT (ReaderT $ RF.runRVar $ RF.normal m s)
{-# SPECIALISE normalSpec :: Double -> Double -> SamplerIO Double #-}

but it made no difference.

idontgetoutmuch commented 8 months ago

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import Data.Random qualified as RF
import Data.Random.Distribution.Uniform qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (IOGenM, mkStdGen, newIOGenM, StdGen)
import Criterion.Main
  ( Benchmark,
    bench,
    defaultConfig,
    defaultMainWith,
    nfIO,
  )
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (replicateM)
import Control.Monad.Reader (MonadIO, ReaderT (..))
import Statistics.Distribution.Normal (normalDistr)
import Statistics.Distribution (ContDistr (quantile))
import System.Random.Stateful (StatefulGen, uniformDouble01M)

normalFu :: IO Double
normalFu = newIOGenM (mkStdGen 1729) >>= (RF.runRVar $ RF.normal 0.0 1.0)

normalMwc :: IO Double
normalMwc = newIOGenM (mkStdGen 1729) >>= MWC.normal 0.0 1.0

-- | Draw from a continuous distribution using the inverse cumulative density
-- function.
draw :: (ContDistr d, MonadDistribution m) => d -> m Double
draw d = fmap (quantile d) random

-- | Monads that can draw random variables.
class (Monad m) => MonadDistribution m where
    -- | Draw from a uniform distribution.
  random ::
    -- | \(\sim \mathcal{U}(0, 1)\)
    m Double

  -- | Draw from a normal distribution.
  normal ::
    -- | mean μ
    Double ->
    -- | standard deviation σ
    Double ->
    -- | \(\sim \mathcal{N}(\mu, \sigma^2)\)
    m Double
  normal m s = draw (normalDistr m s)

-- | The sampling interpretation of a probabilistic program
-- Here m is typically IO or ST
newtype SamplerT g m a = SamplerT {runSamplerT :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)
newtype SamplerU g m a = SamplerU {runSamplerU :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)

-- | convenient type synonym to show specializations of SamplerT
-- to particular pairs of monad and RNG
type SamplerIO = SamplerT (IOGenM StdGen) IO
type SamplerIP = SamplerU (IOGenM StdGen) IO

instance StatefulGen g m => MonadDistribution (SamplerT g m) where
  random = uniform
  normal m s = normalSpec m s

uniform :: StatefulGen g m => SamplerT g m Double
uniform = SamplerT (ReaderT $ RF.runRVar $ RF.doubleStdUniform)
{-# SPECIALISE uniform :: SamplerIO Double #-}

normalSpec :: (StatefulGen g m, RF.Distribution RF.Normal a) => a -> a -> SamplerT g m a
normalSpec m s = SamplerT (ReaderT $ RF.runRVar $ RF.normal m s)
{-# SPECIALISE normalSpec :: Double -> Double -> SamplerIO Double #-}

instance StatefulGen g m => MonadDistribution (SamplerU g m) where
  random = SamplerU (ReaderT uniformDouble01M)
  normal m s = SamplerU (ReaderT (MWC.normal m s))

-- | Run the sampler with a fixed random seed
sampleIOfixed :: SamplerIO a -> IO a
sampleIOfixed x = newIOGenM (mkStdGen 1729) >>= sampleWith x

sampleIOfixee :: SamplerIP a -> IO a
sampleIOfixee x = newIOGenM (mkStdGen 1729) >>= sampleWiti x

-- | Sample with a random number generator of your choice e.g. the one
-- from `System.Random`.
sampleWith :: SamplerT g m a -> g -> m a
sampleWith (SamplerT m) = runReaderT m

sampleWiti :: SamplerU g m a -> g -> m a
sampleWiti (SamplerU m) = runReaderT m

normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"

rawDAT :: FilePath
rawDAT = "raw.dat"

normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Normal fu" $ nfIO $ do
                       xs <- replicateM 1000 normalFu
                       return $ sum xs
                   , bench "Normal mwc" $ nfIO $ do
                       xs <- replicateM 1000 normalMwc
                       return $ sum xs
                   , bench "Normal fu via MonadDistribution" $ nfIO $ do
                       sampleIOfixed (do xs <- replicateM 1000 $ normal 0.0 1.0
                                         return $ sum xs)
                   , bench "Normal mwc via MonadDistribution" $ nfIO $ do
                       sampleIOfixee (do xs <- replicateM 1000 $ normal 0.0 1.0
                                         return $ sum xs)
                   , bench "Uniform fu via MonadDistribution" $ nfIO $ do
                       sampleIOfixed (do xs <- replicateM 1000 $ random
                                         return $ sum xs)
                   , bench "Uniform mwc via MonadDistribution" $ nfIO $ do
                       sampleIOfixee (do xs <- replicateM 1000 $ random
                                         return $ sum xs)
                   ]

main :: IO ()
main = do
  let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
  defaultMainWith configNormal normalBenchmarks

gives

naive-standalone
benchmarking Normal fu
time                 123.8 μs   (123.4 μs .. 124.2 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 123.8 μs   (123.5 μs .. 124.2 μs)
std dev              1.272 μs   (1.082 μs .. 1.524 μs)

benchmarking Normal mwc
time                 22.09 μs   (21.81 μs .. 22.49 μs)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 21.98 μs   (21.82 μs .. 22.23 μs)
std dev              664.2 ns   (500.6 ns .. 816.0 ns)
variance introduced by outliers: 33% (moderately inflated)

benchmarking Normal fu via MonadDistribution
time                 121.6 μs   (119.5 μs .. 123.2 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 117.3 μs   (116.0 μs .. 118.8 μs)
std dev              4.790 μs   (4.188 μs .. 5.198 μs)
variance introduced by outliers: 41% (moderately inflated)

benchmarking Normal mwc via MonadDistribution
time                 23.91 μs   (23.54 μs .. 24.15 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 23.48 μs   (23.22 μs .. 23.74 μs)
std dev              869.4 ns   (832.5 ns .. 915.3 ns)
variance introduced by outliers: 43% (moderately inflated)

benchmarking Uniform fu via MonadDistribution
time                 95.03 μs   (94.54 μs .. 95.31 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 91.65 μs   (90.58 μs .. 92.75 μs)
std dev              3.602 μs   (3.424 μs .. 3.777 μs)
variance introduced by outliers: 41% (moderately inflated)

benchmarking Uniform mwc via MonadDistribution
time                 9.928 μs   (9.809 μs .. 10.08 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 10.02 μs   (9.922 μs .. 10.12 μs)
std dev              346.9 ns   (316.6 ns .. 364.0 ns)
variance introduced by outliers: 42% (moderately inflated)

tweag / monad-bayes

A benchmark of random-fu (replacing mwc-random) #323