lballabio / QuantLib

The QuantLib C++ library
http://quantlib.org
Other
5.26k stars 1.78k forks source link

Adding `ZigguratGaussianRng<RNG>` #1981

Closed ralfkonrad closed 4 months ago

ralfkonrad commented 4 months ago

The improved Ziggurat method to generate normal random samples is significantly faster than BoxMuller. Therefore, it is e.g. the default generator in rust-random for StandardNormal distributions.

As the underlying RNG needs to provide std::uint64_t nextInt64() const random numbers, currently it only works in combination with Xoshiro256StarStarUniformRng.

On my local machine I get the following benchmark values.

Windows CXX compiler MSVC 19.40.33808.0 approx. two times faster compared to BoxMuller with MersenneTwister:

Run on (16 X 2304 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
xoshiro256StarStarZigguratGaussianNext.next();       5.39 ns         5.47 ns    100000000
xoshiro256StarStarBoxMullerGaussian.next();          9.30 ns         9.21 ns     74666667
mersenneTwisterBoxMullerGaussian.next();             11.7 ns         11.2 ns     56000000

WSL Ubuntu 22.04 CXX compiler GNU 11.4.0 approx. four times faster compared to BoxMuller with MersenneTwister:

Run on (16 X 2304.01 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.27, 0.10, 0.27
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
xoshiro256StarStarZigguratGaussianNext.next();       2.82 ns         2.82 ns    245704472
xoshiro256StarStarBoxMullerGaussian.next();          9.27 ns         9.27 ns     74894426
mersenneTwisterBoxMullerGaussian.next();             11.3 ns         11.3 ns     61261372
lballabio commented 4 months ago

Thanks! Out of curiosity, how does it compare to our current default, InverseCumulativeRng<MersenneTwisterUniformRng, InverseCumulativeNormal>?

coveralls commented 4 months ago

Coverage Status

coverage: 72.551% (+0.008%) from 72.543% when pulling a4d5392c4e4ad5606cb0562e5462012d79a3c251 on ralfkonrad:feature/ZigguratGaussianRng into 02426c0f7ca68977ede5d98bb98380f5b1f772a5 on lballabio:master.

ralfkonrad commented 4 months ago

That's interesting, @lballabio.

On pure Windows, the current default is slightly faster (4.4ns to 5.4ns), on WSL Ubuntu slower (4.8ns to 2.8ns)

Windows CXX compiler MSVC 19.40.33808.0

-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
xoshiro256StarStarZigguratGaussianNext.next();       5.39 ns         5.47 ns    100000000
xoshiro256StarStarBoxMullerGaussian.next();          9.30 ns         9.21 ns     74666667
mersenneTwisterBoxMullerGaussian.next();             11.7 ns         11.2 ns     56000000
inverseCumulativeRng.next();                         4.41 ns         4.35 ns    154482759

WSL Ubuntu 22.04 CXX compiler GNU 11.4.0

-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
xoshiro256StarStarZigguratGaussianNext.next();       2.82 ns         2.82 ns    245704472
xoshiro256StarStarBoxMullerGaussian.next();          9.27 ns         9.27 ns     74894426
mersenneTwisterBoxMullerGaussian.next();             11.3 ns         11.3 ns     61261372
inverseCumulativeRng.next();                         4.80 ns         4.80 ns    145683037

The benchmarks can be found here: https://github.com/ralfkonrad/ql_performance_testing, so you might also compare them on your MAC(?).

lballabio commented 4 months ago

The default is slightly slower on my Mac as well:

-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
xoshiro256StarStarZigguratGaussianNext.next();       3.81 ns         3.81 ns    185110260
xoshiro256StarStarBoxMullerGaussian.next();          7.54 ns         7.53 ns     92264298
mersenneTwisterBoxMullerGaussian.next();             11.8 ns         11.8 ns     59168597
inverseCumulativeRng.next();                         4.30 ns         4.29 ns    163303394
ralfkonrad commented 4 months ago

Interesting, how differently this pure number crunching behaves on different architectures and compilers...