shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

Vectorized RNG #4430

Open saatvikshah opened 5 years ago

saatvikshah commented 5 years ago

As discussed with @karlnapf in #4424, there are several improvements/additions that could be made to the random number generation (RNG) API for SGMatrix/SGVector. Neither takes advantage of the CRandom::fill_array vectorized RNGs. SGMatrix does not support RNG at all, while SGVector simply loops over the entire length, calling CMath::random per index. In summary, the proposal is to add the following for both SGMatrix/SGVector:

  1. SG*::random(): Generates random numbers in the [0, 1] range for float types. Generates numbers in [std::numeric_limits::min,std::numeric_limits::max] for int types. Uses the fill_array vectorized RNG wherever possible and falls back to simple for loop otherwise. These will follow a uniform distribution.
  2. SG*::random_normal(mu, sigma): Uses SG*::random() followed by a possibly vectorized variant of the Box-Muller transform.
  3. To be discussed: Since SGVector::random(lb, ub) is already publicly exposed, I don't think it can be removed. An SG*::random(lb, ub) could additionally be supplied, which uses SG*::random() to generate random numbers more efficiently.
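
A minimal sketch of items 1 and 2, under stated assumptions: `fill_uniform` and `fill_normal` are hypothetical free functions, `std::vector<double>` stands in for SGVector, and `std::mt19937_64` stands in for the generator behind CRandom::fill_array. None of these names come from Shogun. It only illustrates the "bulk uniform fill, then a Box-Muller pass over the same buffer" shape of the proposal.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: one uniform double in [0, 1) built from the top 53 bits.
inline double next_uniform(std::mt19937_64& rng)
{
    return static_cast<double>(rng() >> 11) * (1.0 / 9007199254740992.0); // divide by 2^53
}

// Hypothetical stand-in for a bulk fill: writes uniform doubles in [0, 1).
void fill_uniform(std::vector<double>& out, std::mt19937_64& rng)
{
    for (double& x : out)
        x = next_uniform(rng);
}

// Uniform fill followed by an in-place Box-Muller pass over pairs of elements.
void fill_normal(std::vector<double>& out, double mu, double sigma, std::mt19937_64& rng)
{
    assert(sigma >= 0.0);
    const double two_pi = 6.283185307179586;
    fill_uniform(out, rng);

    std::size_t i = 0;
    for (; i + 1 < out.size(); i += 2)
    {
        // 1 - u lies in (0, 1], so the logarithm is always finite.
        const double r = std::sqrt(-2.0 * std::log(1.0 - out[i]));
        const double theta = two_pi * out[i + 1];
        out[i] = mu + sigma * r * std::cos(theta);
        out[i + 1] = mu + sigma * r * std::sin(theta);
    }
    if (i < out.size())
    {
        // Odd length: the last element needs one extra uniform draw for the angle.
        const double r = std::sqrt(-2.0 * std::log(1.0 - out[i]));
        out[i] = mu + sigma * r * std::cos(two_pi * next_uniform(rng));
    }
}
```

Because each Box-Muller step consumes two uniforms and produces two normals, the transform can run in place on the buffer that the bulk fill already wrote, which is what would make a vectorized variant plausible.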

Finally, I also think it might be important to benchmark a simple for-loop vs. vectorized variant for these cases.
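
One way such a benchmark could be set up is sketched below. `best_time_ms` and `fill_per_index` are hypothetical names, and `std::mt19937_64` is only a placeholder generator, so the numbers it produces say nothing about CMath::random or dSFMT themselves. The point is the harness: time each candidate several times and keep the best run.

```cpp
#include <cassert>
#include <chrono>
#include <random>
#include <vector>

// Runs f() `reps` times and returns the fastest wall-clock time in milliseconds.
template <typename F>
double best_time_ms(F&& f, int reps)
{
    assert(reps > 0);
    double best = -1.0;
    for (int r = 0; r < reps; ++r)
    {
        const auto start = std::chrono::steady_clock::now();
        f();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}

// Baseline candidate: one generator call per element, mirroring a per-index loop.
void fill_per_index(std::vector<double>& out, std::mt19937_64& rng)
{
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (double& x : out)
        x = dist(rng);
}
```

A call such as `best_time_ms([&] { fill_per_index(v, rng); }, 10)` gives the baseline; the vectorized fill would be passed to the same harness with the same buffer size so the two timings are comparable.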

karlnapf commented 5 years ago

Really nice write-up!

One comment on 3: we can definitely change the API here, towards a more standard random and random_normal, or even better, rand() and randn().

saatvikshah commented 5 years ago

I'd like to work on this as my next issue. One thing I realized while looking at Random.h is that [0, 1] is not available within dSFMT, which is used for vectorized double generation. So we would have to do [0, 1) instead.
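
For context on why the closed interval is missing: as far as I understand it, dSFMT natively produces doubles in [1, 2) by filling the 52 mantissa bits under a fixed exponent, and derives the other ranges from that. The sketch below illustrates the idea with a hypothetical helper, `bits_to_close_open`, which is not a dSFMT or Shogun function and assumes 64-bit IEEE 754 doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Maps the low 52 bits of `bits` to a double in [1, 2), then shifts it to [0, 1).
double bits_to_close_open(std::uint64_t bits)
{
    const std::uint64_t mantissa = bits & 0x000FFFFFFFFFFFFFULL;    // keep 52 bits
    const std::uint64_t pattern = 0x3FF0000000000000ULL | mantissa; // exponent of 1.0

    double d;
    static_assert(sizeof(d) == sizeof(pattern), "expects a 64-bit double");
    std::memcpy(&d, &pattern, sizeof(d)); // reinterpret the bits as a double

    assert(d >= 1.0 && d < 2.0);
    return d - 1.0; // largest possible result is 1 - 2^-52, so 1.0 is never hit
}
```

With all mantissa bits set, the value is 2 - 2^-52 before the shift, so the upper bound 1 can never be produced by this construction.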

karlnapf commented 5 years ago

That should be fine imo, as the boundaries are a set of measure zero, i.e. P(x=1)=0.