haskell-numerics / random-fu

A suite of Haskell libraries for representing, manipulating, and sampling random variables
42 stars 21 forks source link

Categorical Performs > x10 worse than Matlab (Octave) #25

Open idontgetoutmuch opened 9 years ago

idontgetoutmuch commented 9 years ago
{-# OPTIONS_GHC -Wall                      #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing   #-}
{-# OPTIONS_GHC -fno-warn-type-defaults    #-}

import Data.Random.Source.PureMT
import Data.Random
import Data.Random.Distribution.Categorical

import Control.Monad.State

testCategorical :: [Double] -> Int -> RVar [Int]
testCategorical ws n = replicateM (fromIntegral n) $
                       categorical (zip ws [0..(fromIntegral (length ws - 1))])

n :: Int
n = 1000000

main :: IO ()
main =
  print $
    sum $
    evalState (sample (testCategorical (replicate n (recip (fromIntegral n))) (fromIntegral n)))
              (pureMT 2)
~/Dropbox/Private/Multinomial $ ./Multinomial +RTS -s
500118181962
   2,145,547,624 bytes allocated in the heap
     625,841,120 bytes copied during GC
     103,975,464 bytes maximum residency (11 sample(s))
       3,574,920 bytes maximum slop
             259 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4082 colls,     0 par    0.44s    0.45s     0.0001s    0.0064s
  Gen  1        11 colls,     0 par    0.26s    0.37s     0.0334s    0.0879s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.07s  (  2.07s elapsed)
  GC      time    0.70s  (  0.81s elapsed)
  EXIT    time    0.00s  (  0.02s elapsed)
  Total   time    2.78s  (  2.90s elapsed)

  %GC     time      25.3%  (28.1% elapsed)

  Alloc rate    1,035,723,896 bytes per MUT second

  Productivity  74.7% of total user, 71.5% of total elapsed

The same in Octave

num_particles = 1000000;
likelihood = zeros(num_particles,1);
likelihood(:,1) = 1/num_particles;

tic ();
[ns,index] = histc(rand(num_particles,1),[0;cumsum(likelihood/sum(likelihood))]);

s = sum(index);
toc ()
octave> num_particles = 1000000;
octave> likelihood = zeros(num_particles,1);
octave> likelihood(:,1) = 1/num_particles;
octave> 
octave> tic ();
octave> [ns,index] = histc(rand(num_particles,1),[0;cumsum(likelihood/sum(likelihood))]);
octave> 
octave> s = sum(index);
octave> toc ()
Elapsed time is 0.247516 seconds.
idontgetoutmuch commented 9 years ago

So this might not be a problem just with Categorical.

{-# OPTIONS_GHC -Wall                      #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing   #-}
{-# OPTIONS_GHC -fno-warn-type-defaults    #-}

import Data.Random.Source.PureMT
import Data.Random
import Data.Random.Distribution.Categorical
import Data.Random.Distribution.T

import Control.Monad.State

testUniform :: MonadRandom m => Int -> m [Double]
testUniform n = replicateM (fromIntegral n) (sample stdUniform)

n :: Int
n = 10^7

us :: [Double]
us = evalState (testUniform n) (pureMT $ fromIntegral arbSeed)

mean :: Double
mean = sum us / fromIntegral n

main :: IO ()
main = do
  print mean
~/Dropbox/Private/Multinomial $ ./Multinomial +RTS -s
0.4999607889729769
   5,122,925,704 bytes allocated in the heap
       1,213,648 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10015 colls,     0 par    0.03s    0.03s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0002s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.15s  (  1.16s elapsed)
  GC      time    0.03s  (  0.03s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.18s  (  1.20s elapsed)

  %GC     time       2.3%  (2.8% elapsed)

  Alloc rate    4,452,445,320 bytes per MUT second

  Productivity  97.7% of total user, 96.1% of total elapsed

Here's the matlab equivalent.

num_particles = 10^7;

tic();
s = sum(rand(num_particles,1));
toc()
octave> num_particles = 10^7;
octave> tic();
octave> s = sum(rand(num_particles,1));
octave> toc()
Elapsed time is 0.149504 seconds.

According to http://octave.sourceforge.net/octave/function/rand.html, octave uses the Mersenne Twister with a period of 2^19937-1. I thought that was what random-fu also used. So my next step will be to look at the random number generator and see if that is causing the 10x slowdown.

idontgetoutmuch commented 9 years ago

I've spent a bit of time investigating mwc-random see: https://github.com/bos/mwc-random/issues/35 and I can generate uniform samples about x10 faster but with what appears to be a space leak.