haskell / mwc-random

A very fast Haskell library for generating high quality pseudo-random numbers.
http://hackage.haskell.org/package/mwc-random
BSD 2-Clause "Simplified" License
54 stars 25 forks source link

Improving Haskell's Statistical Story #93

Open idontgetoutmuch opened 1 month ago

idontgetoutmuch commented 1 month ago
Random Fu MWC Random
Bernoulli bernoulli
Beta beta
Binomial
Categorical categorical
ChiSquare chiSquare
Dirichlet dirichlet
Exponential exponential
StretchedExponential
truncatedExp
geometric
Gamma gamma
Multinomial
Normal normal
Pareto
Poisson
Rayleigh
Simplex
T
Triangular
Uniform
Weibull
Ziggurat
idontgetoutmuch commented 1 month ago

With random and the distributions in mwc-random and random-fu we can come up with something better and even comparable to what is available in R. The table above shows the samplers that are available. In R one also has the CDFs, PDFs and PMFs. I have tried to include this in random-fu. Conceivably, we could split off the distributions from mwc-random and the the CDFs etc from random-fu and statistics and create a new R-like package. I've also started comparing various implementations so for example

{-# LANGUAGE ImportQualifiedPost #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import Data.Random qualified as RF
import Data.Random.Distribution.Binomial qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (mkStdGen, newIOGenM)
import Criterion.Main
  ( Benchmark,
    bench,
    defaultConfig,
    defaultMainWith,
    nfIO,
  )
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (liftM, replicateM)

binomialFu :: IO Int
binomialFu = do g <- newIOGenM (mkStdGen 1729)
                x <- liftM sum $ replicateM 1000 $ RF.runRVar (RF.binomial 1400 (0.4 :: Double)) g
                return x

binomialMwc :: IO Int
binomialMwc = do g <- newIOGenM (mkStdGen 1729)
                 x <- liftM sum $ replicateM 1000 $ MWC.binomial 1400 0.4 g
                 return x

normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"

rawDAT :: FilePath
rawDAT = "raw.dat"

normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Binomial fu" $ nfIO binomialFu
                   , bench "Binomial mwc" $ nfIO binomialMwc
                   ]

main :: IO ()
main = do
  let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
  defaultMainWith configNormal normalBenchmarks

which gives

benchmarking Binomial fu
time                 5.706 ms   (5.680 ms .. 5.730 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 5.746 ms   (5.739 ms .. 5.762 ms)
std dev              28.68 μs   (13.47 μs .. 58.35 μs)

benchmarking Binomial mwc
time                 268.7 μs   (267.9 μs .. 269.9 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 267.7 μs   (267.3 μs .. 268.5 μs)
std dev              1.852 μs   (1.069 μs .. 3.189 μs)

So we know the new binomial sampler in mwc-random is better (CAVEAT: for one given set of parameters) by about x20 (and that is without "squeezing").

idontgetoutmuch commented 1 month ago

@Shimuuar I think this would be better as a discussion rather than an issue. I think you as admin for the repo on GitHub have to do something: https://docs.github.com/en/discussions/quickstart

Shimuuar commented 1 month ago

Yes new package could be useful. It would give chance to rethink API. statistics take on distribution I think isn't very successful

Discussions could be nice but I don't have admin rights, only commit rights