The marketing department at Live-it-LIVE reports that, during this time of year, about 10% of visitors to Live-it-LIVE.com make a purchase. The monthly report shows every visitor to the site and whether or not they made a purchase. The checkout page had a small bug this month, so the business department wants to know whether the purchase rate dipped below expectation. They've asked us to investigate this question.
In order to run a hypothesis test to address this, we'll first need to know two things from the data: the sample size (the total number of visitors to the site this month) and the number of those visitors who made a purchase.
Assuming each row of our dataset represents a unique site visitor, we can calculate the number of people who visited the website by finding the number of rows in the data frame. We can then find the number who made a purchase by using a conditional statement to add up the total number of rows where a purchase was made.
For example, suppose that the dataset candy contains a column named chocolate, with 'yes' recorded for every candy that has chocolate in it and 'no' otherwise. The following code calculates the sample size (the number of candies) and the number of those candies that contain chocolate:
import numpy as np

## sample size (number of rows):
samp_size = len(candy)

## number with chocolate:
total_with_chocolate = np.sum(candy.chocolate == 'yes')
The purchase column of monthly_report indicates whether or not a visitor made a purchase: a value of 'y' means they made a purchase; 'n' means they did not.
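Applied to our data, a minimal sketch along the same lines (assuming monthly_report is a pandas DataFrame with one row per visitor; the variable names here are illustrative):

## sample size (number of visitors):
sample_size = len(monthly_report)

## number of visitors who made a purchase:
num_purchased = np.sum(monthly_report.purchase == 'y')

print(sample_size, num_purchased)
## output: 500 41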
In the last exercise, we calculated that there were 500 site visitors to Live-it-LIVE.com this month and 41 of them made a purchase. In comparison, if each of the 500 visitors had a 10% chance of making a purchase, we would expect around 50 of those visitors to buy something. Is 41 different enough from 50 that we should question whether this month's site visitors really had a 10% chance of making a purchase?
To conceptualize why our expectation (50) and observation (41) might not be equal, even if there was no dip in the purchase probability, let's turn to a common probability example: flipping a fair coin. We can simulate a coin flip in Python using the numpy.random.choice() function:
flip = np.random.choice(['heads', 'tails'], size=1, p=[0.5, 0.5])
print(flip)
## output is either ['heads'] or ['tails']
If we run this code (or flip a real coin) a few times, we'll find that, just like we can't know ahead of time whether any single visitor to Live-it-LIVE.com will make a purchase, we can't predict the outcome of any individual coin flip.
If we flip a fair coin 10 times in a row, we expect about 5 of those flips to come up heads (50%). We can simulate this in Python by changing the size parameter of numpy.random.choice():
flip = np.random.choice(['heads', 'tails'], size=10, p=[0.5, 0.5])
print(flip)
## output is something like: ['heads' 'heads' 'heads' 'tails' 'tails' 'heads' 'heads' 'tails' 'heads' 'heads']
If you try this yourself, it's perfectly reasonable that you'll get only four heads, or maybe six or seven. Because this is a random process, we can't guarantee that exactly half of our coin flips will come up heads. Similarly, even if each Live-it-LIVE visitor has a 10% chance of making a purchase, that doesn't mean we expect exactly 10% to do so in any given sample.
Use np.random.choice() to simulate a single visitor to Live-it-LIVE.com, save the outcome as one_visitor, and print it. If the visitor made a purchase, the value of one_visitor should be ['y']; if they did not make a purchase, it should be ['n'].
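A sketch of that single-visitor simulation, mirroring the coin-flip call with a 10% purchase probability:

one_visitor = np.random.choice(['y', 'n'], size=1, p=[0.1, 0.9])
print(one_visitor)
## output is either ['y'] or ['n']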
The first step in running a hypothesis test is to form a null hypothesis. For the question of whether the purchase rate at Live-it-LIVE.com was different from 10% this month, the null hypothesis describes a world in which the true probability of a visitor making a purchase was exactly 10%, but by random chance, we observed that only 41 visitors (8.2%) made a purchase.

Let's return to the coin flip example from the previous exercise. We can simulate 10 coin flips and print out the number of those flips that came up heads using the following code:
flips = np.random.choice(['heads', 'tails'], size=10, p=[0.5, 0.5])
num_heads = np.sum(flips == 'heads')
print(num_heads)
## output: 4
If we run this code a few times, we'll likely see different results each time. This will give us a sense for the range in the number of heads that could occur by random chance, even if the coin is fair. We're more likely to see numbers like four, five, or six, but maybe we'll see something more extreme every once in a while: ten heads in a row, or even zero.
We simulated a random sample of 500 visitors, where each visitor had a 10% chance of making a purchase. When we pressed 'Run' a few times, we saw that the number of purchases varied from sample to sample, but was around 50.
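That simulation might look like the following sketch (the variable names are illustrative):

simulated_monthly_visitors = np.random.choice(['y', 'n'], size=500, p=[0.1, 0.9])
num_purchased = np.sum(simulated_monthly_visitors == 'y')
print(num_purchased)
## output is a number around 50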
Similarly, we simulated a single random sample of 10 coin flips, where each flip had a 50% chance of coming up heads. We saw that the number of simulated heads was not necessarily 5, but somewhere around 5.
By running the same simulated experiment many times, we can get a sense for how much a particular outcome (like the number of purchases, or heads) varies by random chance. Consider the following code:
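A minimal sketch of such a loop, collecting the number of heads from each of 10000 experiments in a list named outcomes:

import numpy as np

outcomes = []
for i in range(10000):
    flips = np.random.choice(['heads', 'tails'], size=10, p=[0.5, 0.5])
    num_heads = np.sum(flips == 'heads')
    outcomes.append(num_heads)

print(min(outcomes), max(outcomes))
## output: 0 10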
Note that 10000 is an arbitrarily chosen large number: it's big enough that it will yield almost all possible outcomes of our experiment, and small enough that the simulation still runs quickly. From inspecting the output, we can see that the number of 'heads' varied between 0 and 10.
Thus, if we flip a fair coin 10 times, we could observe anywhere between 0 and 10 heads by random chance.
In the previous exercise, we simulated 10000 different samples of 500 visitors, where each visitor had a 10% chance of making a purchase, and calculated the number of purchases per sample. Upon further inspection, we saw that those numbers ranged from around 25 to 75. This is useful information, but we can learn even more by inspecting the full distribution.
For example, recall our 10000 coin flip experiments: for each experiment, we flipped a fair coin 10 times and recorded the number of heads in a list named outcomes. We can plot a histogram of outcomes using matplotlib.pyplot.hist(). We can also add a vertical line at any x-value using matplotlib.pyplot.axvline:
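For example, with the red line drawn at 2 heads:

import matplotlib.pyplot as plt

plt.hist(outcomes)
plt.axvline(2, color='r')
plt.show()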
This histogram shows us that, over 10000 experiments, we observed as few as 0 and as many as 10 heads out of ten flips. However, we were most likely to observe around 4-6 heads. It would be unlikely to observe only 2 heads (where the vertical red line is).
The list null_outcomes contains the numbers of purchases simulated under the null hypothesis. Add code to plot a histogram of null_outcomes and inspect the plot. What range of values occurs most frequently?
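A sketch of that plot, with a line marking the observed 41 purchases:

import matplotlib.pyplot as plt

plt.hist(null_outcomes)
plt.axvline(41, color='r')
plt.show()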
41 purchases is somewhat likely to occur based on this null distribution. It's not in the middle of the distribution, where there's the most density, but it's also not way out in the tails (which would mean it is very unlikely).

So far, we've inspected the null distribution and calculated the minimum and maximum values. While the number of purchases in each simulated sample ranged roughly from 25 to 75 by random chance, upon further inspection of the distribution, we saw that those extreme values happened very rarely.
By reporting an interval covering 95% of the values instead of the full range, we can say something like:
We are 95% confident that, if each visitor has a 10% chance of making a purchase, a random sample of 500 visitors will make between 37 and 63 purchases.
We can use the np.percentile() function to calculate this 95% interval as follows:
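A sketch (the variable name null_95CI is illustrative):

import numpy as np

null_95CI = np.percentile(null_outcomes, [2.5, 97.5])
print(null_95CI)
## output is something like: [37. 63.]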
We calculated the 2.5th and 97.5th percentiles so that exactly 5% of the data falls outside those percentiles (2.5% above the 97.5th percentile, and 2.5% below the 2.5th percentile). This leaves us with a range covering 95% of the data.
If our observed statistic falls outside this interval, then we can conclude it is unlikely that the null hypothesis is true. In this example, because 41 falls within the 95% interval (37 to 63), it is still reasonably likely that we observed a lower purchase rate by random chance, even though the null hypothesis was true.
Calculate an interval covering the middle 90% of the values in null_outcomes, save it as null_90CI, and print it out. Is the observed value of 41 purchases inside or outside this interval?
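For the 90% version, the analogous call uses the 5th and 95th percentiles, so that 10% of the values fall outside:

null_90CI = np.percentile(null_outcomes, [5, 95])
print(null_90CI)
## output is something like: [39. 61.]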
P-value calculations and interpretations depend on the alternative hypothesis of a test, a description of the difference from expectation that we are interested in.

For example, let's return to the 10-coin-flip example. Suppose that we flipped a coin 10 times and observed only two heads. We might run a hypothesis test with the following null and alternative hypotheses: the null hypothesis is that the probability of heads is 0.5; the alternative is that the probability of heads is less than 0.5.
This hypothesis test asks the question: if the probability of heads is 0.5, what's the probability of observing 2 or fewer heads among a single sample of 10 coin flips?
Earlier, we used a for-loop to repeatedly (10000 times) flip a fair coin 10 times, and store the number of heads (for each set of 10 flips) in a list named outcomes. The probability of observing 2 or fewer heads among 10 coin flips is approximately equal to the proportion of those 10000 experiments where we observed 0, 1, or 2 heads:
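A sketch of that calculation, assuming the outcomes list from the earlier loop:

import numpy as np

outcomes = np.array(outcomes)
p_value = np.sum(outcomes <= 2) / len(outcomes)
print(p_value)
## output is something like: 0.059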
This calculation is equivalent to calculating the proportion of this histogram that is colored in red:
https://content.codecademy.com/courses/Hypothesis_Testing/one_sided_coin_flip.svg
We estimated that the probability of observing 2 or fewer heads is about 0.059 (5.9%). This probability (0.059) is referred to as a one-sided p-value.
In other words, calculate the proportion of values in null_outcomes that are less than or equal to 41 (the observed number of purchases that we calculated earlier). Save this number as a variable named p_value and print it out.
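A sketch of that calculation:

null_outcomes = np.array(null_outcomes)
p_value = np.sum(null_outcomes <= 41) / len(null_outcomes)
print(p_value)
## output is something like: 0.1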
Try pressing 'Run' a few times; you should see slightly different values of p_value each time.
In the previous exercise, we calculated a one-sided p-value. In this exercise, we’ll estimate a p-value for a 2-sided test, which is the default setting for many functions in Python (and other languages, like R!).
In our 10-coin-flip experiment, remember that we observed 2 heads, which is less than the expected value of 5 (50% of 10) if the null hypothesis is true. The two-sided test focuses on the number of heads being at least three away from expectation in either direction, rather than just less than expected. The hypothesis test now asks the following question:
Suppose that the true probability of heads is 50%. What is the probability of observing either two or fewer heads or eight or more heads? (Note that two and eight are both three away from five). The calculation now estimates the proportion of the null histogram that is colored in red:
https://content.codecademy.com/courses/Hypothesis_Testing/two_sided_coin_flip.svg
This proportion can be calculated in Python as follows. Note that the | symbol is similar to 'or', but works for comparing multiple values at once:
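A sketch, again assuming the outcomes list from the coin-flip loop:

outcomes = np.array(outcomes)
p_value = np.sum((outcomes <= 2) | (outcomes >= 8)) / len(outcomes)
print(p_value)
## output is something like: 0.12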
We end up with a p-value that is roughly twice as large as the one-sided p-value.
So far, we've conducted a simulated binomial hypothesis test for Live-it-LIVE.com. In this exercise, we'll use our code from the previous exercises to write our own binomial test function. Our function will use simulation, so it will estimate (albeit fairly accurately) the same p-values we would get using much more complex mathematical equations.
A function has been outlined for you in script.py, which contains the code that we used for Live-it-LIVE inside a function named simulation_binomial_test(). Your goal in the next few exercises will be to edit this function so that it takes in any values for the following:
- observed_successes (the observed sample statistic, e.g., 41 purchases)
- n (the sample size, e.g., 500 visitors)
- p (the null probability of success, e.g., 0.10)

The function should return a p-value for a one-sided test where the alternative hypothesis is that the true probability of success is less than the null. Edit the simulation_binomial_test() function to remove all of the hard-coded values (e.g., 500, 0.1, 0.9, and 41) so that the proper parameters are used in each calculation. Then check your work using SciPy's binom_test() function. Do you get similar answers? You've written your own function to simulate a binomial test.
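A sketch of how the finished function might look (the exact scaffold in script.py may differ):

import numpy as np

def simulation_binomial_test(observed_successes, n, p):
    ## simulate 10000 samples of size n under the null probability p
    null_outcomes = []
    for i in range(10000):
        simulated_sample = np.random.choice(['y', 'n'], size=n, p=[p, 1 - p])
        num_successes = np.sum(simulated_sample == 'y')
        null_outcomes.append(num_successes)

    ## one-sided p-value: the proportion of simulated outcomes
    ## less than or equal to the observed value
    null_outcomes = np.array(null_outcomes)
    p_value = np.sum(null_outcomes <= observed_successes) / len(null_outcomes)
    return p_value

## should be close to the one-sided p-value we estimated earlier (~0.1)
print(simulation_binomial_test(41, 500, 0.1))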
More formally, the binomial distribution describes the expected number of "successes" in an experiment with some number of "trials". In the example you just walked through, the experiment consisted of 500 people visiting Live-it-LIVE.com. For each of those trials (visitors), we expected a 10% chance of a purchase (success), but observed only 41 successes (less than 10%).
SciPy has a function called binom_test(), which performs a binomial test for you. The default alternative hypothesis for the binom_test() function is two-sided, but this can be changed using the alternative parameter (e.g., alternative='less' will run a one-sided lower-tail test).
binom_test() requires three inputs: the number of observed successes, the number of total trials, and the expected probability of success. For example, with 10 flips of a fair coin (trials), the expected probability of heads is 0.5. Let's imagine we get two heads (observed successes) in 10 flips. Is the coin weighted? The function call for this binomial test would look like:
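A sketch of that call (note that newer SciPy releases have replaced binom_test() with binomtest()):

from scipy.stats import binom_test

p_value = binom_test(2, n=10, p=0.5)
print(p_value)
## output: 0.109375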
This tells us that if the true probability of heads is 0.5, the probability of observing 2 or fewer heads or 8 or more heads is 0.109 (10.9%).
Use binom_test() to run a two-sided test for the Live-it-LIVE data (41 purchases observed among 500 visitors, with an expected purchase probability of 0.1). Save the p-value as p_value_2sided and print it out. Is this p-value similar to what you calculated via simulation (approximately 0.2)?

Next, run the same test with alternative='less'. Save the p-value as p_value_1sided and print it out. Is this p-value similar to what you calculated via simulation (approximately 0.1)?
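A sketch of both calls:

from scipy.stats import binom_test

## two-sided test (the default):
p_value_2sided = binom_test(41, n=500, p=0.1)
print(p_value_2sided)
## output is approximately 0.2

## one-sided lower-tail test:
p_value_1sided = binom_test(41, n=500, p=0.1, alternative='less')
print(p_value_1sided)
## output is approximately 0.1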
Introduction
We will walk through a simulation of a binomial hypothesis test in Python. Binomial tests are useful for comparing the frequency of some outcome in a sample to the expected probability of that outcome. For example, if we expect 90% of ticketed passengers to show up for their flight but only 80 of 100 ticketed passengers actually show up, we could use a binomial test to understand whether 80 is significantly different from 90.
Binomial tests are similar to one-sample t-tests in that they test a sample statistic against some population-level expectation. The difference is that binomial tests work with binary categorical data, comparing a sample frequency or proportion to an expected probability, while one-sample t-tests work with quantitative data, comparing a sample mean to an expected population mean.
In Python, as in many other programming languages used for statistical computing, there are a number of libraries and functions that allow a data scientist to run a hypothesis test in a single line of code. However, a data scientist will be much more likely to spot and fix potential errors, and to interpret results correctly, if they have a conceptual understanding of how these functions work.