oldoc63 / learningDS

Learning DS with Codecademy and Books

Cumulative Distribution Function #425

Open oldoc63 opened 2 years ago

oldoc63 commented 2 years ago

The cumulative distribution function for a discrete random variable can be derived from the probability mass function. However, instead of the probability of observing a specific value, the cumulative distribution function gives the probability of observing a specific value or less.

The probabilities for all possible values in a given probability distribution add up to 1. The value of a cumulative distribution function at a given value is equal to the sum of the probabilities of that value and of every value lower than it, so the function reaches 1 at the largest possible value.
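As a minimal sketch of this running sum, the snippet below builds a CDF by accumulating PMF values. It assumes the 10 fair coin flips example used later in these notes; the variable names are illustrative only.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.5                        # assumed example: 10 fair coin flips
heads = np.arange(n + 1)              # possible outcomes: 0, 1, ..., 10 heads

pmf_values = binom.pmf(heads, n, p)   # probability of each exact outcome
cdf_from_pmf = np.cumsum(pmf_values)  # running total of the PMF

print(pmf_values.sum())               # all probabilities together add up to 1
print(cdf_from_pmf[-1])               # the CDF at the largest value is also 1
```

The accumulated values should agree with what binom.cdf() returns for the same outcomes, up to floating-point rounding.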

Cumulative distribution functions never decrease, so for two different numbers that the random variable could take on, the value of the function will always be greater for the larger number. Mathematically, this is represented as:

$$ x_1 < x_2 \implies \text{CDF}(x_1) < \text{CDF}(x_2) $$
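One way to check this property numerically, again assuming 10 fair coin flips as the example distribution:

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.5                    # assumed example: 10 fair coin flips
heads = np.arange(n + 1)          # every value the variable can take
cdf_values = binom.cdf(heads, n, p)

# Each extra head adds a positive probability, so the difference
# between consecutive CDF values should always be above zero.
is_increasing = bool(np.all(np.diff(cdf_values) > 0))
print(is_increasing)
```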

We showed how the probability mass function can be used to calculate the probability of observing fewer than 3 heads out of 10 coin flips by adding up the probabilities of observing 0, 1 and 2 heads. The cumulative distribution function produces the same answer by evaluating the function at CDF(X=2). In this case, using the CDF is simpler than the PMF because it requires one calculation rather than three.
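The comparison can be sketched as follows, assuming fair coins (p = 0.5); the variable names are illustrative.

```python
from scipy.stats import binom

n, p = 10, 0.5  # 10 coin flips, assumed fair

# PMF route: three separate calculations, one per outcome
prob_pmf = binom.pmf(0, n, p) + binom.pmf(1, n, p) + binom.pmf(2, n, p)

# CDF route: a single calculation for "2 or fewer heads"
prob_cdf = binom.cdf(2, n, p)

print(prob_pmf)
print(prob_cdf)
```

Both values should come out at roughly 0.055.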

oldoc63 commented 2 years ago

Image

oldoc63 commented 2 years ago

The animation shows the relationship between the probability mass function and the cumulative distribution function. The top plot is the PMF, while the bottom plot is the corresponding CDF. When looking at the graph of a CDF, each y-axis value is the sum of the PMF probabilities for all values less than or equal to the corresponding x-axis value.

oldoc63 commented 2 years ago

We can use a cumulative distribution function to calculate the probability of a specific range by taking the difference between two values from the cumulative distribution function. For example, to find the probability of observing between 3 and 6 heads, we can take the probability of observing 6 or fewer heads and subtract the probability of observing 2 or fewer heads. What remains is the probability of observing between 3 and 6 heads.

Image

oldoc63 commented 2 years ago

It is important to note that to include the lower bound in the range, the CDF being subtracted should be evaluated at one less than the lower bound. In this example, we wanted to know the probability from 3 to 6, which includes 3. Mathematically, this looks like the following equation:

$$ P(3 \le X \le 6) = P(X \le 6) - P(X < 3) $$

$$ P(3 \le X \le 6) = P(X \le 6) - P(X \le 2) $$
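A short sketch of why the subtracted term is evaluated at 2 rather than 3, assuming 10 fair coin flips. The variable names are illustrative.

```python
from scipy.stats import binom

n, p = 10, 0.5  # assumed example: 10 fair coin flips

# Subtracting CDF(2) keeps the outcome "3 heads" in the range
prob_3_to_6 = binom.cdf(6, n, p) - binom.cdf(2, n, p)

# Subtracting CDF(3) removes "3 heads" too, giving P(4 <= X <= 6) instead
prob_4_to_6 = binom.cdf(6, n, p) - binom.cdf(3, n, p)

# The gap between the two results is exactly the probability of 3 heads
print(prob_3_to_6 - prob_4_to_6)
print(binom.pmf(3, n, p))
```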

oldoc63 commented 2 years ago

Using the Cumulative Distribution Function in Python

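We can use the binom.cdf() method from the scipy.stats library to calculate the cumulative distribution function. This method takes 3 values, in this order: the value of interest (the method returns the probability of observing this value or fewer), the number of trials, and the probability of success on each trial.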

Calculating the probability of observing 6 or fewer heads from 10 fair coin flips (0 to 6 heads) mathematically looks like the following:

$$ P(\text{6 or fewer heads}) = P(\text{0 to 6 heads}) $$
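In Python, this could be written as the single call below; the variable name is an illustrative choice.

```python
from scipy.stats import binom

# Arguments: value of interest (6), number of flips (10), P(heads) for a fair coin (0.5)
prob_6_or_fewer = binom.cdf(6, 10, 0.5)
print(prob_6_or_fewer)  # roughly 0.83
```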

oldoc63 commented 2 years ago

Calculating the probability of observing between 4 and 8 heads from 10 fair coin flips can be thought of as subtracting the value of the cumulative distribution function at 3 from its value at 8:

$$ P(\text{4 to 8 heads}) = P(\text{0 to 8 heads}) - P(\text{0 to 3 heads}) $$
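The same subtraction can be wrapped in a small function. Note that prob_between is a hypothetical helper written for these notes, not part of scipy.

```python
from scipy.stats import binom

def prob_between(low, high, n, p):
    """P(low <= X <= high) for a binomial variable (hypothetical helper)."""
    # Evaluate the lower CDF at low - 1 so that `low` itself stays in the range
    return binom.cdf(high, n, p) - binom.cdf(low - 1, n, p)

prob_4_to_8 = prob_between(4, 8, 10, 0.5)
print(prob_4_to_8)  # roughly 0.82
```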

oldoc63 commented 2 years ago

To calculate the probability of observing more than 6 heads from 10 fair coin flips we subtract the value of the cumulative distribution function at 6 from 1. Mathematically, this looks like the following:

$$ P(\text{more than 6 heads}) = 1 - P(\text{6 or fewer heads}) $$

Note that "more than 6 heads" does not include 6. In Python, we would calculate this probability as 1 - binom.cdf(6, 10, 0.5).
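A runnable sketch of the complement calculation. The cross-check with binom.sf() (scipy's survival function, defined as 1 - CDF) is an addition for illustration and is not part of the original lesson.

```python
from scipy.stats import binom

# P(more than 6 heads) = 1 - P(6 or fewer heads)
prob_more_than_6 = 1 - binom.cdf(6, 10, 0.5)
print(prob_more_than_6)  # roughly 0.17

# Optional cross-check: the survival function computes 1 - CDF directly
prob_via_sf = binom.sf(6, 10, 0.5)
print(prob_via_sf)
```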

oldoc63 commented 2 years ago

Uncomment and assign the variable prob_1 to the probability of observing 3 or fewer heads from 10 fair coin flips using the cumulative distribution function. Then print prob_1. Use the binom.cdf() method from the scipy.stats library.

The calculation using the CDF is simpler: it takes a single call, whereas the equivalent code using binom.pmf() has to add up four separate calls (for 0, 1, 2 and 3 heads).
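One possible solution, offered as a sketch rather than the official answer. The name prob_1_pmf is an assumption added here for the comparison.

```python
from scipy.stats import binom

# CDF version: one call for "3 or fewer heads"
prob_1 = binom.cdf(3, 10, 0.5)
print(prob_1)

# PMF version: four calls added together, for comparison
prob_1_pmf = (binom.pmf(0, 10, 0.5) + binom.pmf(1, 10, 0.5)
              + binom.pmf(2, 10, 0.5) + binom.pmf(3, 10, 0.5))
print(prob_1_pmf)
```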

oldoc63 commented 2 years ago

Uncomment prob_2 and assign the variable to be the probability of observing more than five heads from 10 fair coin flips. Then print prob_2. Use the binom.cdf() method from the scipy.stats library.
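A possible solution sketch: "more than five heads" excludes 5, so the CDF is evaluated at 5 and subtracted from 1.

```python
from scipy.stats import binom

# P(more than 5 heads) = 1 - P(5 or fewer heads)
prob_2 = 1 - binom.cdf(5, 10, 0.5)
print(prob_2)  # roughly 0.38
```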

oldoc63 commented 2 years ago

Assign the object prob_3 the probability of observing between 2 and 5 heads from 10 fair coin flips. Then print prob_3. Run the code for the probability mass function and compare.
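A possible solution sketch, assuming "between 2 and 5" includes both endpoints. The PMF comparison below is written out here because the original comparison code is not included in these notes; prob_3_pmf is an illustrative name.

```python
from scipy.stats import binom

# CDF version: P(5 or fewer heads) minus P(1 or fewer heads)
prob_3 = binom.cdf(5, 10, 0.5) - binom.cdf(1, 10, 0.5)
print(prob_3)

# PMF version: add up the probabilities of exactly 2, 3, 4 and 5 heads
prob_3_pmf = sum(binom.pmf(k, 10, 0.5) for k in range(2, 6))
print(prob_3_pmf)
```

The two printed values should match up to floating-point rounding, at roughly 0.61.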