haskell-numerics / random-fu

A suite of Haskell libraries for representing, manipulating, and sampling random variables

sample with probability? #16

Open iffsid opened 10 years ago

iffsid commented 10 years ago

What would be the best/least-disruptive way to add a sample' function that samples from a distribution and also returns the probability of the sample it drew? I.e., sample' would return a tuple of the sampled value and its associated probability.

My initial thought was to add a method to the Sampleable class, something like probOf :: d t -> t -> m Double, but I'm not sure that is the best way to get what I want.

I think the ability to inspect the probability of samples is a big plus for something like probabilistic programming or graphical models, where nodes become samplers and inference depends on both the sampled values and their associated probabilities (e.g., Metropolis-Hastings).
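To make the request concrete, here is a minimal sketch of what a sample'-style function could look like for a finite discrete distribution. It does not use random-fu: the names sampleWithMass and acceptance are hypothetical, the distribution is assumed to be a plain list of (mass, event) pairs that sum to 1, and the uniform draw u in [0, 1) is passed in explicitly so the function stays pure.

```haskell
-- Hypothetical sketch, not part of random-fu.
-- A finite distribution is assumed to be a list of (mass, event) pairs summing to 1.

-- | Given a uniform draw u in [0, 1), walk the cumulative masses and return
-- the selected event together with its own probability mass.
sampleWithMass :: Double -> [(Double, a)] -> Maybe (a, Double)
sampleWithMass _ [] = Nothing
sampleWithMass u ((p, x) : rest)
  | u < p || null rest = Just (x, p)   -- last entry also absorbs rounding error
  | otherwise          = sampleWithMass (u - p) rest

-- | Metropolis-Hastings acceptance probability for a symmetric proposal,
-- computed from the masses of the current and the proposed state.
acceptance :: Double -> Double -> Double
acceptance pCurrent pProposed = min 1 (pProposed / pCurrent)
```

In a real sampler the uniform draw would come from whatever random source the surrounding monad provides; the pair returned here is what an inference step such as Metropolis-Hastings would consume.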

mokus0 commented 10 years ago

Sorry to take so long to get back on this. Life has been busy for quite a long time now.

An important complication to consider is that "probability" has two substantially different meanings depending on the kind of distribution (and, potentially, both can apply to the same distribution). It can mean the "probability mass": the probability of a specific event, which for a continuous distribution is always zero. It can also refer to "probability density", which is more of an averaged property and which diverges for discrete distributions. The two can be unified if you change perspective a bit and look at probability measures (essentially, only ever talk about probabilities of certain sets of events), but then you have to come up with ways to describe sets of events in a way that makes sense for event spaces with arbitrary dimensionality (or no meaningful concept of dimension at all).

All that to say, something like that would probably need to be a separate class or family of classes in order to avoid placing undue constraints on things that can implement Sampleable (which is really more of a syntactic convenience anyway). The least-disruptive option would probably be to treat it similarly to the CDF class. The approach that is probably simplest and most familiar to most users would be to introduce PMF and PDF classes for probability mass and probability density, respectively.
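As a rough illustration of the separate-class idea, the sketch below defines standalone PDF and PMF classes next to two toy distribution types. Everything here is an assumption for illustration: the types Normal and Bernoulli are defined locally rather than taken from random-fu, the classes are single-parameter with fixed event types (Double and Bool) to avoid language extensions, and the method names are only a guess at what such classes might expose.

```haskell
-- Hypothetical sketch; these types and classes are local stand-ins, not random-fu's.
data Normal = Normal Double Double   -- mean, standard deviation
newtype Bernoulli = Bernoulli Double -- probability of True

-- | Probability density, for continuous distributions over Double.
class PDF d where
  pdf :: d -> Double -> Double
  -- | Log-density; the default is derived from 'pdf', but instances can
  -- override it with a numerically more stable form.
  logPdf :: d -> Double -> Double
  logPdf d = log . pdf d

-- | Probability mass, for discrete distributions (here over Bool only).
class PMF d where
  pmf :: d -> Bool -> Double

instance PDF Normal where
  pdf (Normal m s) x = exp (negate (z * z) / 2) / (s * sqrt (2 * pi))
    where z = (x - m) / s
  logPdf (Normal m s) x = negate (z * z) / 2 - log s - 0.5 * log (2 * pi)
    where z = (x - m) / s

instance PMF Bernoulli where
  pmf (Bernoulli p) True  = p
  pmf (Bernoulli p) False = 1 - p
```

Because neither class mentions sampling, a type can implement one, both, or neither without any change to how it is sampled.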

If what you're after is "probability masses", you may also find the Categorical distribution already does what you want - it implements Monad, allowing building up Categoricals from simpler ones (just like RVar) and you can then use toList (and the list monad if desired) to get the individual event probabilities. It may also be useful to add a "DiscreteDistribution" or similar class that allows converting a parametric distribution to a Categorical.
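The following self-contained sketch shows the general idea of a monadic discrete distribution whose event probabilities can be read back out as a list. It is a simplified stand-in and not random-fu's Categorical: the type Dist, its instances, and the helper massOf are hypothetical, and the (probability, event) ordering of the pairs is an assumption.

```haskell
-- Hypothetical stand-in for a Categorical-like type; not random-fu's implementation.
newtype Dist a = Dist { toList :: [(Double, a)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [ (p, f x) | (p, x) <- xs ]

instance Applicative Dist where
  pure x = Dist [(1, x)]
  Dist fs <*> Dist xs = Dist [ (p * q, f x) | (p, f) <- fs, (q, x) <- xs ]

instance Monad Dist where
  -- Multiply the mass of each outcome by the mass of what follows from it.
  Dist xs >>= k = Dist [ (p * q, y) | (p, x) <- xs, (q, y) <- toList (k x) ]

-- | A fair six-sided die.
die :: Dist Int
die = Dist [ (1 / 6, n) | n <- [1 .. 6] ]

-- | The sum of two independent dice, built from the simpler distribution.
twoDice :: Dist Int
twoDice = do
  a <- die
  b <- die
  return (a + b)

-- | Total probability mass of one event, summing over duplicate entries.
massOf :: Eq a => a -> Dist a -> Double
massOf x d = sum [ p | (p, y) <- toList d, y == x ]
```

Here toList twoDice contains 36 entries with repeated sums, so massOf adds up the entries that share an event to give the mass of that event.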

idontgetoutmuch commented 10 years ago

I have already created a pull request with a PDF class with pdf and logPdf methods: https://github.com/mokus0/random-fu/pull/24. Here's an example: https://github.com/mokus0/random-fu/pull/24. Let me know how you get on.