Open idontgetoutmuch opened 1 month ago
Random Fu | MWC Random |
---|---|
Bernoulli | bernoulli |
Beta | beta |
Binomial | |
Categorical | categorical |
ChiSquare | chiSquare |
Dirichlet | dirichlet |
Exponential | exponential |
StretchedExponential | |
truncatedExp | |
geometric | |
Gamma | gamma |
Multinomial | |
Normal | normal |
Pareto | |
Poisson | |
Rayleigh | |
Simplex | |
T | |
Triangular | |
Uniform | |
Weibull | |
Ziggurat |
With random and the distributions in mwc-random and random-fu we can come up with something better and even comparable to what is available in R. The table above shows the samplers that are available. In R
one also has the CDFs, PDFs and PMFs. I have tried to include this in random-fu
. Conceivably, we could split off the distributions from mwc-random
and the the CDFs etc from random-fu
and statistics
and create a new R
-like package. I've also started comparing various implementations so for example
{-# LANGUAGE ImportQualifiedPost #-}
{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}
import Data.Random qualified as RF
import Data.Random.Distribution.Binomial qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (mkStdGen, newIOGenM)
import Criterion.Main
( Benchmark,
bench,
defaultConfig,
defaultMainWith,
nfIO,
)
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (liftM, replicateM)
binomialFu :: IO Int
binomialFu = do g <- newIOGenM (mkStdGen 1729)
x <- liftM sum $ replicateM 1000 $ RF.runRVar (RF.binomial 1400 (0.4 :: Double)) g
return x
binomialMwc :: IO Int
binomialMwc = do g <- newIOGenM (mkStdGen 1729)
x <- liftM sum $ replicateM 1000 $ MWC.binomial 1400 0.4 g
return x
normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"
rawDAT :: FilePath
rawDAT = "raw.dat"
normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Binomial fu" $ nfIO binomialFu
, bench "Binomial mwc" $ nfIO binomialMwc
]
main :: IO ()
main = do
let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
defaultMainWith configNormal normalBenchmarks
which gives
benchmarking Binomial fu
time 5.706 ms (5.680 ms .. 5.730 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 5.746 ms (5.739 ms .. 5.762 ms)
std dev 28.68 μs (13.47 μs .. 58.35 μs)
benchmarking Binomial mwc
time 268.7 μs (267.9 μs .. 269.9 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 267.7 μs (267.3 μs .. 268.5 μs)
std dev 1.852 μs (1.069 μs .. 3.189 μs)
So we know the new binomial sampler in mwc-random
is better (CAVEAT: for one given set of parameters) by about x20 (and that is without "squeezing").
@Shimuuar I think this would be better as a discussion rather than an issue. I think you as admin for the repo on GitHub have to do something: https://docs.github.com/en/discussions/quickstart
Yes new package could be useful. It would give chance to rethink API. statistics
take on distribution I think isn't very successful
Discussions could be nice but I don't have admin rights, only commit rights