oldoc63 / learningDS

Learning DS with Codecademy and Books

One-Sample T-Test in SciPy #456

Open oldoc63 opened 1 year ago

oldoc63 commented 1 year ago

Introduction

In this lesson, we'll walk through the implementation of a one-sample t-test in Python. One-sample t-tests are used to compare a sample average to a hypothetical population average. For example, a one-sample t-test might be used to address questions such as:

As an example, let's imagine the fictional business BuyPie, which sends ingredients for pies to your household so you can make them from scratch. Suppose that a product manager wants online BuyPie orders to cost around 1000 Rupees on average. In the past day, 50 people made an online purchase and the average payment per order was less than 1000 Rupees. Are people really spending less than 1000 Rupees on average? Or is this the result of chance and a small sample size?

oldoc63 commented 1 year ago
  1. We have provided a small data set called prices representing the purchase prices of customers to BuyPie.com in the past day. First, print out prices to the console and examine the numbers. How much variation is there in the purchase prices? Can you estimate the mean by looking at these numbers?
oldoc63 commented 1 year ago
  1. Calculate the mean of prices using np.mean(). Store it in a variable called prices_mean and print it out.
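
The two steps above can be sketched roughly as follows. The lesson's actual `prices` data is not reproduced in this thread, so the array below is a hypothetical stand-in generated with NumPy; the seed, spread, and centre of 980 are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical stand-in for the lesson's `prices` data set:
# 50 simulated purchase prices centred near 980 Rupees.
rng = np.random.default_rng(seed=0)
prices = rng.normal(loc=980, scale=200, size=50).round()

# Step 1: print the raw values and eyeball the variation.
print(prices)

# Step 2: compute and print the sample mean.
prices_mean = np.mean(prices)
print(prices_mean)
```

With the real lesson data, only the last three lines would be needed, since `prices` is already provided.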
oldoc63 commented 1 year ago

Implementing a One-Sample T-Test

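In the last exercise, we inspected a sample of 50 purchase prices at BuyPie and saw that the average was 980 Rupees. Suppose that we want to run a one-sample t-test with the following null and alternative hypotheses:

Null: The average cost of a BuyPie order is 1000 Rupees.
Alternative: The average cost of a BuyPie order is not 1000 Rupees.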

SciPy has a function called ttest_1samp(), which performs a one-sample t-test for you. ttest_1samp() requires two inputs: a sample distribution (e.g., the list of the 50 observed purchase prices) and a mean to test against (e.g., 1000):

tstat, pval = ttest_1samp(sample_distribution, expected_mean)

The function uses your sample distribution to determine the sample size and to estimate the amount of variation in the population, both of which are used to estimate the null distribution. It returns two outputs: the t-statistic and the p-value.
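
To make the description above concrete, here is a minimal sketch that calls `ttest_1samp()` from `scipy.stats` and then recomputes the same two outputs by hand from the sample size and sample standard deviation. The data is simulated (an assumption, since the lesson's sample is not shown here), and the names `manual_tstat` and `manual_pval` are illustrative only.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Hypothetical sample: 50 simulated purchase prices (not the lesson's data).
rng = np.random.default_rng(seed=1)
sample_distribution = rng.normal(loc=980, scale=200, size=50)
expected_mean = 1000

# The library call described in the text.
tstat, pval = ttest_1samp(sample_distribution, expected_mean)

# The same calculation by hand.
n = len(sample_distribution)                     # sample size
sample_mean = np.mean(sample_distribution)
sample_sd = np.std(sample_distribution, ddof=1)  # estimate of population variation
standard_error = sample_sd / np.sqrt(n)
manual_tstat = (sample_mean - expected_mean) / standard_error

# Two-sided p-value from a t-distribution with n - 1 degrees of freedom.
manual_pval = 2 * t.sf(abs(manual_tstat), df=n - 1)

print(tstat, pval)
print(manual_tstat, manual_pval)
```

The two printed lines should agree up to floating-point error, which shows that the sample size and the sample standard deviation are the only ingredients besides the two means.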

oldoc63 commented 1 year ago
  1. Use ttest_1samp() to run the hypothesis test. Null: the average price is 1000 Rupees. Alternative: the average price is not 1000 Rupees.
oldoc63 commented 1 year ago
  1. Print out pval to the console.
oldoc63 commented 1 year ago

P-values are probabilities, so they should be between 0 and 1. This p-value is the probability of observing an average purchase price less than 980 or more than 1020 among a sample of 50 purchases, assuming the null hypothesis is true (that is, the population average really is 1000 Rupees). If you run the test correctly, you should see a p-value of 0.49, or 49%.

Given that the mean purchase price in this sample was 980, which is not very far from 1000, we would expect this p-value to be relatively large. The only way it could be small (e.g., < 0.05) is if purchase prices had very little variation (e.g., they were all within a few Rupees of 980). We can see from the printed data that this is not the case. Therefore, a p-value around 0.49 makes sense!
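
The effect of variation on the p-value can be checked with a small experiment. The sketch below builds two hypothetical samples of 50 prices that share a mean of 980 but differ in spread; the standard deviations of 200 and 5 Rupees are assumptions chosen for illustration, not values taken from the lesson data.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(seed=2)

# Standardize random noise so it has mean 0 and sample standard deviation 1.
noise = rng.normal(size=50)
noise = (noise - noise.mean()) / noise.std(ddof=1)

# Two hypothetical samples with the same mean (980) but different spread.
high_variation = 980 + 200 * noise  # prices vary by hundreds of Rupees
low_variation = 980 + 5 * noise     # prices all within a few Rupees of 980

_, pval_high = ttest_1samp(high_variation, 1000)
_, pval_low = ttest_1samp(low_variation, 1000)

print(pval_high)  # large: a 20-Rupee gap is small relative to the noise
print(pval_low)   # tiny: the same gap is large relative to the noise
```

The sample mean is identical in both cases, so the difference in p-values comes entirely from the standard error, which shrinks as the variation in the data shrinks.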

oldoc63 commented 1 year ago

Assumptions of a One Sample T-Test

When running any hypothesis test, it is important to know and verify the assumptions of the test. The assumptions of a one-sample t-test are as follows:

In general, if you run an experiment that violates (or possibly violates) one of these assumptions, you can still run the test and report the results, but you should also report which assumptions were not met and acknowledge that the test results could be flawed.

oldoc63 commented 1 year ago
  1. Using plt.hist(), plot a histogram of prices and check whether the values are (approximately) normally distributed.
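
A rough sketch of this check is shown below. The `prices` array is again a simulated stand-in (an assumption), the bin count and output filename `prices_hist.png` are arbitrary choices, and the non-interactive `Agg` backend is selected only so the script runs without a display; in a notebook you would typically call `plt.show()` instead of saving to a file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the lesson's `prices` data.
rng = np.random.default_rng(seed=3)
prices = rng.normal(loc=980, scale=200, size=50)

# plt.hist returns the bin counts, the bin edges, and the drawn patches.
counts, bin_edges, _ = plt.hist(prices, bins=10)
plt.xlabel("Purchase price (Rupees)")
plt.ylabel("Number of orders")
plt.title("Distribution of purchase prices")
plt.savefig("prices_hist.png")
```

A roughly symmetric, single-peaked histogram without extreme outliers is usually taken as good enough for this visual check; with only 50 values, some raggedness is expected.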
oldoc63 commented 1 year ago

Review

You now know how to implement a one-sample t-test in Python and verify the assumptions of the test.

oldoc63 commented 1 year ago

As a final exercise, some data has been loaded for you with purchase prices for consecutive days at BuyPie. You can access the first day using daily_prices[0], the second day using daily_prices[1], etc. To practice running a one-sample t-test and inspecting the resulting p-value, try the following:
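
One plausible way to practice with this data is to run the same test on each day and compare the p-values. The sketch below assumes `daily_prices` behaves like a sequence of daily samples; since the real data is not included in this thread, it is simulated here as 10 days of 50 purchases each, and both numbers are assumptions.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical stand-in for `daily_prices`: 10 days x 50 purchases per day.
rng = np.random.default_rng(seed=4)
daily_prices = rng.normal(loc=980, scale=200, size=(10, 50))

# Test each day's prices against a hypothesized mean of 1000 Rupees.
pvals = []
for day, prices in enumerate(daily_prices):
    _, pval = ttest_1samp(prices, 1000)
    pvals.append(pval)
    print(f"Day {day}: p-value = {pval:.3f}")
```

Because each day is a different random sample, the p-values will differ from day to day even when the underlying average does not change, which is worth keeping in mind before reading too much into any single day's result.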