jliszka / probability-monad

Apache License 2.0
263 stars 38 forks source link

what if your data is given in the form of conditional probabilities? #21

Closed eric-bzb closed 1 year ago

eric-bzb commented 2 years ago

I have really enjoyed playing with your probability-monad in scala. Thank you for coding it up! One issue I keep running into is a situation where the data that I am starting from is given in the form of conditional probabilities.

Is a conditional probability distribution Pr(B | A) essentially just represented as a function Distribution[A] => Distribution[B]?

jliszka commented 2 years ago

Hi, thanks! So glad to hear it!

I think maybe it would be a joint probability distribution, Distribution[(A, B)] or maybe even something like A => Distribution[B]. Maybe an example would help though?

eric-bzb commented 2 years ago

Sorry for the delay in reply. I think I may not be explaining what I am after properly. I will try again.

Let's say we have two events, event A and event B but we do not know if the two events are independent. However, what we do know is the following data: Pr(A), Pr(B), Pr(A | B) (which are each just Double values between 0 and 1), and what we want to find is the joint probability Pr(A & B).

By a simple application of the definition of conditional probability, we can solve this by: Pr(A & B) = P(A | B)*Pr(B)

Yet, for some reason I am struggling mapping the above algorithm over to your probability-monad. Perhaps it is that the primitive data structure in your library is Distribution[T] and an "event" such as event A is really something like:

val probabilityOfEventA: Double = discreteUniform(List(1,2,3,4,5)).pr(_ < 5)

probabilityOfEventA above has type Double but perhaps the correct thing to do in order to implement the above (trivial) algorithm generically is to recast it into a Distribution[Boolean]

val eventA: Distribution[Boolean] = biasedCoin(probabilityOfEventA)

Sorry if this is not making much sense.

jliszka commented 2 years ago

OK, I think I see what you mean. Fundamentally, Pr(A | B) is a function distAgivenB: B => Distribution[A], in other words, something that given a particular outcome from B, knows the distribution of As that result from that choice from B. Then to get a joint distribution, all you would need is a distB: Distribution[B], and then do:

distB.flatMap(b => distAgivenB(b).map(a => (a, b))

or equivalently (and much more readable IMHO)

for {
  b <- distB
  a <- distAgivenB(b)
} yield (a, b)

This maps to your example insofar as Pr(A) is not needed.

More concretely:

val isRaining: Distribution[Boolean] = tf(0.2)
def isTraffic(isRaining: Boolean): Distribution[Boolean] = if (isRaining) tf(0.8) else tf(0.3)
val joint = for {
  r <- isRaining
  t <- isTraffic(r)
} yield (r, t)

scala> joint.hist
(false,false) 56.05% ████████████████████████████████████████████████████████
 (false,true) 23.73% ███████████████████████▋
 (true,false)  4.00% ████
  (true,true) 16.22% ████████████████▏

The probability monad is kind of an inside-out version of the way you're probably used to dealing with probability. You're only really dealing with individual samples (that is, the variable a is a particular choice from the set of possible events A). For me it's a more concrete way of thinking about it, but it definitely takes getting used to.

eric-bzb commented 2 years ago

Thanks! Yes, I think we are talking about exactly the same thing now. I even had written some code that is very similar to your own:

    /**
      * Calculate the joint distribution, taking conditional probability
      * as primitive (useful if it is not yet known whether the events are independent)
      *
      * @param distB
      * @param aGivenB
      * @return
      */
    def joint[B](distB: Distribution[B])(aGivenB: B => Distribution[A]): Distribution[(A,B)] = for {
      b <- distB
      a <- aGivenB(b)
    } yield (a,b)

I guess I was simply not believing my own code! :-) Again, thanks for providing the helpful hints and example.

jliszka commented 2 years ago

Nice! Glad to help, any time.

mcanlas commented 1 year ago

This issue can be closed, yes?