typelevel / spire

Powerful new number types and numeric abstractions for Scala.
http://typelevel.org/spire/
MIT License
1.76k stars 242 forks source link

Add more random distributions to spire.random #214

Open non opened 10 years ago

non commented 10 years ago

We have support for Uniform and Gaussian currently.

It would be nice to get support for at least: binomial, poisson, gamma, exponential. This might be a rabbit hole but being able to specify which distribution you want, rather than just getting a Random[A] instance, seems worth it.

dusank commented 10 years ago

Hi @non,

I am currently looking into this. I would like to provide faster Normal(Gaussian) and Exponential generation through Marsaglias more efficient Ziggurat method: http://www.jstatsoft.org/v05/i08/paper

I wonder where to best hook into as there seems to be both the Generator.nextGaussian method as well as the MarsagliaGaussian class implementing Marsaglias older algorithm independently/redundantly.

Also wondering where to best hook up Exponential generation.

Perhaps the current spire.random package needs some DRY cleanup first?

non commented 10 years ago

Hi @dusank.

So, I have a refactor of the whole package that improves the design a bit. For example it removes immutable.Generator, and moves things back from mutable into random. It also introduces some new types and methods. I'd like to get that in before we do too much other rearranging.

The longterm design is intended to be:

  1. Generator is a small, mutable core of methods for producing random numbers.
  2. Dist[A] is a way of producing arbitrary A values using a Generator (not necessarily any particular distribution).
  3. Uniform[A], Gaussian[A], Exponential[A], Binomial[A], and so on, would be ways of producing A values according to their particular distribution.
  4. Random[A] a monadic interface to producing A values which is also immutable

I think Generator#nextGaussian is a bit of an artifact, and should be removed. But if you notice, the other implementation is only 50% efficient (it drops one of the two values generated). One challenge with the design is that I'd like for everything except Generator to be immutable. But many of these methods generate multiple values, so there's a basic design question of where to put things.

Anyway, for now I think it would be best to work on adding new things, rather than changing existing interfaces/classes (to avoid creating more major conflicts).

Does this make sense? Do you have alternate proposals for how things should look?

dusank commented 10 years ago

Hi @non,

Is this the refactor you are reffering to? topic/random-take3

Marsaglias Ziggurat method can be implemented for much more efficient sampling from decreasing densities (i.e. Gaussian, Exponential) and AFAIK it does not generate two values like Marsaglias polar method that is currently being used. Thus it would, at least partially, solve the issues you have mentioned.

Can you estimate how long it will take for your refactor to reach main?

Currently I do not have any proposals yet, I'll need to wrap my head around the current implementation and hopefully your refactor too first.

non commented 10 years ago

Yes that's right. With no other changes going into spire.random I expect I could land the refactor soon, probably next week (or possibly earlier).

If you have free time and are interested, you could do the merge yourself (there will definitely be conflicts due to various refactoring and new features added) and submit a PR. But I have been expecting to have to do it myself so you shouldn't feel any pressure.

In the meantime, feel free to ask implementation questions or brainstorm on this bug.

non commented 10 years ago

So I may have been overly pessimistic: https://github.com/non/spire/pull/262

non commented 10 years ago

OK great. So with that out of the way, I'd say you should feel free to add new distributions and think about how we might restructure existing code.

dusank commented 10 years ago

Please check this out: https://github.com/non/spire/pull/266