Open oldoc63 opened 1 year ago
prices representing the purchase prices of customers to BuyPie.com in the past day. First, print out prices to the console and examine the numbers. How much variation is there in the purchase prices? Can you estimate the mean by looking at these numbers?
In the last exercise, we inspected a sample of 50 purchase prices at BuyPie and saw that the average was 980 rupees. Suppose that we want to run a one-sample t-test with the following null and alternative hypotheses:
SciPy has a function called ttest_1samp(), which performs a one-sample t-test for you. ttest_1samp() requires two inputs: a sample distribution (e.g., the list of the 50 observed purchase prices) and a mean to test against (e.g., 1000):
tstat, pval = ttest_1samp(sample_distribution, expected_mean)
The function uses your sample distribution to determine the sample size and estimate the amount of variation in the population, both of which are used to estimate the null distribution. It returns two outputs: the t-statistic and the p-value.
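To make the inputs and outputs concrete, here is a minimal sketch. The prices array is synthetic (drawn from a normal distribution with an arbitrary seed, mean, and spread) and only stands in for the 50 observed purchase prices, so its t-statistic and p-value will not match the lesson's numbers. The sketch also recomputes both outputs by hand to show where the sample size and the sample standard deviation enter.

```python
import numpy as np
from scipy.stats import ttest_1samp, t

# Synthetic stand-in for the 50 observed purchase prices (assumed values).
rng = np.random.default_rng(0)
prices = rng.normal(loc=980, scale=200, size=50)

expected_mean = 1000  # the mean stated in the null hypothesis

# SciPy's one-sample t-test (two-sided by default).
tstat, pval = ttest_1samp(prices, expected_mean)

# The same quantities computed by hand.
n = len(prices)                               # sample size
sample_sd = np.std(prices, ddof=1)            # estimate of the population variation
standard_error = sample_sd / np.sqrt(n)
manual_tstat = (np.mean(prices) - expected_mean) / standard_error
manual_pval = 2 * t.sf(abs(manual_tstat), df=n - 1)  # two-sided p-value

print(f"t-statistic: {tstat:.3f} (by hand: {manual_tstat:.3f})")
print(f"p-value:     {pval:.3f} (by hand: {manual_pval:.3f})")
```

The two pairs of numbers should agree, which is a quick way to convince yourself that the function needs nothing beyond the sample and the hypothesized mean.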
Null: the average price is 1000 rupees.
Alternative: the average price is not 1000 rupees.
P-values are probabilities, so they should be between 0 and 1. This p-value is the probability of observing an average purchase price less than 980 or more than 1020 among a sample of 50 purchases, assuming the null hypothesis is true. If you run the test correctly, you should see a p-value of 0.49 or 49%.
Given that the mean purchase price in this sample was 980, which is not very far from 1000, we would expect this p-value to be relatively large. The only reason it could be small (e.g., < 0.05) is if purchase prices had very little variation (e.g., they were all within a few rupees of 980). We can see from the printed data that this is not the case. Therefore, a p-value around 0.49 makes sense!
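The effect of variation on the p-value can be sketched with two made-up samples that share the same mean of 980 but differ in spread. Both samples are hypothetical (evenly spaced values chosen for illustration), not the BuyPie data.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Two hypothetical samples of 50 prices, both centered on 980.
wide_prices = 980 + np.linspace(-300, 300, 50)    # prices vary by hundreds of rupees
narrow_prices = 980 + np.linspace(-3, 3, 50)      # prices within a few rupees of 980

_, p_wide = ttest_1samp(wide_prices, 1000)
_, p_narrow = ttest_1samp(narrow_prices, 1000)

print(f"wide spread:   mean = {wide_prices.mean():.1f}, p-value = {p_wide:.3f}")
print(f"narrow spread: mean = {narrow_prices.mean():.1f}, p-value = {p_narrow:.3g}")
```

With the wide spread, a 20-rupee gap from 1000 is well within what chance could produce, so the p-value is large. With the narrow spread, the same gap is many standard errors away from 1000, so the p-value is very small.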
When running any hypothesis test, it is important to know and verify the assumptions of the test. The assumptions of a one-sample t-test are as follows:
The sample was randomly selected from the population
For example, if you only collect data for site visitors who agree to share their personal information, this subset of visitors was not randomly selected and may differ from the larger population.
The individual observations were independent
For example, if one visitor to BuyPie loves the apple pie they bought so much that they convince their friend to buy one too, those observations were not independent.
The data is normally distributed without outliers, or the sample size is large enough
There are no set rules on what a 'large enough' sample size is, but a common threshold is around 40. For sample sizes smaller than 40, and really for all samples in general, it's a good idea to plot a histogram of your data and check for outliers, multi-modal distributions (with multiple humps), or skewed distributions. If you see any of those things in a small sample, a t-test is probably not appropriate.
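A quick way to run these checks is sketched below. It prints a rough text histogram, the sample skewness, and any points flagged by the 1.5 x IQR rule. The data is synthetic, and the 1.5 x IQR cutoff is a common convention assumed here rather than something the lesson prescribes. A plotted histogram would serve the same purpose.

```python
import numpy as np
from scipy.stats import skew

# Synthetic stand-in for one day's purchase prices (assumed values).
rng = np.random.default_rng(1)
prices = rng.normal(loc=980, scale=200, size=50)

# Rough text histogram: look for multiple humps or a long tail.
counts, edges = np.histogram(prices, bins=8)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:7.0f} to {right:7.0f} | {'#' * int(count)}")

# Skewness near 0 suggests a roughly symmetric distribution.
print("skewness:", round(float(skew(prices)), 2))

# Flag possible outliers with the 1.5 x IQR rule (an assumed convention).
q1, q3 = np.percentile(prices, [25, 75])
iqr = q3 - q1
outliers = prices[(prices < q1 - 1.5 * iqr) | (prices > q3 + 1.5 * iqr)]
print("possible outliers:", np.round(outliers, 1))
```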
In general, if you run an experiment that violates (or possibly violates) one of these assumptions, you can still run the test and report the results, but you should also report which assumptions were not met and acknowledge that the test results could be flawed.
You know how to implement a one-sample t-test in Python and verify the assumptions of the test.
As a final exercise, some data has been loaded for you with purchase prices for consecutive days at BuyPie. You can access the first day using daily_prices[0], the second day using daily_prices[1], etc. To practice running a one-sample t-test and inspecting the resulting p-value, try the following:
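As a starting point, here is a minimal sketch of running the test on each day in turn. It assumes daily_prices is a list with one sequence of prices per day. The list built below is synthetic stand-in data with an arbitrary number of days, not the data loaded for the exercise.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Synthetic stand-in for daily_prices: 5 days of 50 purchases each (assumed shape).
rng = np.random.default_rng(2)
daily_prices = [rng.normal(loc=980, scale=200, size=50) for _ in range(5)]

pvals = []
for day, prices in enumerate(daily_prices):
    # Test each day's sample against the hypothesized mean of 1000 rupees.
    tstat, pval = ttest_1samp(prices, 1000)
    pvals.append(pval)
    print(f"day {day}: mean = {np.mean(prices):.1f}, p-value = {pval:.3f}")
```

Comparing the printed p-values across days shows how much the result of the same test can vary from one sample to the next.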
Introduction
In this lesson, we'll walk through the implementation of a one-sample t-test in Python. One-sample t-tests are used for comparing a sample average to a hypothetical population average. For example, a one-sample t-test might be used to address questions such as:
As an example, let's imagine the fictional business BuyPie, which sends ingredients for pies to your household so you can make them from scratch. Suppose that a product manager wants online BuyPie orders to cost around 1000 Rupees on average. In the past day, 50 people made an online purchase and the average payment per order was less than 1000 Rupees. Are people really spending less than 1000 Rupees on average? Or is this the result of chance and a small sample size?