Central Limit Theorem - Githubissues

oldoc63 commented 1 year ago

So far, we've defined the term sampling distribution and shown how to simulate an approximate sampling distribution for a few different statistics (mean, maximum, variance, etc.). The Central Limit Theorem (CLT) allows us to describe the sampling distribution of the mean specifically.

The CLT states that the sampling distribution of the mean is approximately normal as long as the population is not too skewed or the sample size is large enough. A sample size of n > 30 is a common rule of thumb for most population shapes, although a strongly skewed or heavy-tailed population may need a larger sample. If the population itself is normally distributed, the sample size can be smaller than that.

Let's take another look at the salmon weights to see how the CLT applies here. The first plot below shows the population distribution. The salmon weights are skewed right, meaning the tail of the distribution is longer on the right than on the left.

https://static-assets.codecademy.com/skillpaths/master-stats-ii/sampling-distributions/pop_distribution.svg

oldoc63 commented 1 year ago

Next, we simulate a sampling distribution of the mean (using a sample size of 100) and superimpose a normal distribution on top of it. Note how the estimated sampling distribution follows the normal curve almost perfectly.

https://static-assets.codecademy.com/skillpaths/master-stats-ii/sampling-distributions/normal_samp_distribution.svg
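The simulation behind a plot like this can be sketched roughly as follows. This is a minimal standard-library sketch, not the lesson's own code: the gamma-distributed `population` is a hypothetical stand-in for the salmon weights, and the `skewness` helper and the number of repeated samples (2,000) are assumptions made for illustration.

```python
import random
import statistics


def skewness(values):
    """Skewness as the mean of cubed z-scores (near 0 for a symmetric shape)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population standing in for the salmon weights.
population = [rng.gammavariate(2.0, 30.0) for _ in range(20_000)]

samp_size = 100
# Draw 2,000 samples (with replacement) and keep the mean of each one.
sample_means = [
    statistics.fmean(rng.choices(population, k=samp_size))
    for _ in range(2_000)
]

pop_skew = skewness(population)
means_skew = skewness(sample_means)
print(f"population skewness:   {pop_skew:.2f}")
print(f"sample-means skewness: {means_skew:.2f}")
```

The population skewness should come out clearly positive, while the skewness of the sample means should sit much closer to zero, which is what the normal curve in the plot reflects.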

oldoc63 commented 1 year ago

Note that the CLT only applies to the sampling distribution of the mean and not other statistics like maximum, minimum and variance.
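One way to see this is to compute both the mean and the maximum of each sample and compare the shapes of the two resulting sampling distributions. The sketch below assumes a hypothetical exponential population (the lesson's data is not shown in this thread) and uses skewness as a rough proxy for "how far from normal".

```python
import random
import statistics


def skewness(values):
    """Skewness as the mean of cubed z-scores (near 0 for a symmetric shape)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(1)

# Hypothetical skewed population with a mean of roughly 60.
population = [rng.expovariate(1 / 60) for _ in range(20_000)]

samp_size = 100
means, maxima = [], []
for _ in range(2_000):
    sample = rng.choices(population, k=samp_size)
    means.append(statistics.fmean(sample))  # statistic covered by the CLT
    maxima.append(max(sample))              # statistic not covered by the CLT

mean_skew = skewness(means)
max_skew = skewness(maxima)
print(f"skewness of sample means:  {mean_skew:.2f}")
print(f"skewness of sample maxima: {max_skew:.2f}")
```

Under these assumptions the sample maxima should stay noticeably right-skewed even at a sample size of 100, while the sample means do not.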

oldoc63 commented 1 year ago

In order to see the Central Limit Theorem in action, let's look at another population of fish that is not normally distributed.

oldoc63 commented 1 year ago

Now that we have seen the skewed population distribution, let's simulate a sampling distribution of the mean. According to the CLT, we will see a normal distribution once the sample size is large enough. To start, we have set the sample size to 6.

With such a small sample size, the sampling distribution looks slightly skewed. This is because the population was not normally distributed and we have a small sample size.

oldoc63 commented 1 year ago

Now change the sample size to 60 and run the code. With the larger sample size, the sampling distribution should look closer to normal.
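The effect of changing the sample size from 6 to 60 can be checked numerically as well as visually. The sketch below is illustrative only: the exponential `population` is a hypothetical skewed fish population, and `sample_mean_skewness` is a helper name invented here, not something from the exercise.

```python
import random
import statistics


def skewness(values):
    """Skewness as the mean of cubed z-scores (near 0 for a symmetric shape)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def sample_mean_skewness(population, samp_size, rng, n_samples=4_000):
    """Simulate a sampling distribution of the mean and return its skewness."""
    sample_means = [
        statistics.fmean(rng.choices(population, k=samp_size))
        for _ in range(n_samples)
    ]
    return skewness(sample_means)


rng = random.Random(2)

# Hypothetical skewed (non-normal) population.
population = [rng.expovariate(1 / 60) for _ in range(20_000)]

skew_small = sample_mean_skewness(population, 6, rng)
skew_large = sample_mean_skewness(population, 60, rng)
print(f"skewness with samp_size = 6:  {skew_small:.2f}")
print(f"skewness with samp_size = 60: {skew_large:.2f}")
```

The skewness at a sample size of 60 should be clearly smaller than at 6, matching the more symmetric histogram.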

oldoc63 commented 1 year ago

The CLT not only establishes that the sampling distribution will be normally distributed, but also allows us to describe that normal distribution quantitatively. Normal distributions are described by their mean mu (μ) and standard deviation sigma (σ).

We take samples of size n from a population (that has a true population mean μ and standard deviation σ) and calculate the sample mean x̄.
Given that n is sufficiently large (n > 30), the sampling distribution of the mean will be approximately normal with:
- mean approximately equal to the population mean μ
- standard deviation equal to the population standard deviation σ divided by the square root of the sample size. We can write this out as:

$$ \text{Sampling Distribution St. Dev} = \frac{\sigma}{\sqrt{n}} $$
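As a quick arithmetic check of this formula, the sketch below evaluates σ/√n for a few sample sizes. The function name `sampling_dist_std` and the value σ = 10 are chosen here purely for illustration.

```python
import math


def sampling_dist_std(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


sigma = 10.0  # illustrative population standard deviation
for n in (25, 100, 400):
    # Quadrupling the sample size halves the spread of the sample means.
    print(f"n = {n:>3}: {sampling_dist_std(sigma, n):.2f}")
```

Because n sits under a square root, cutting the spread of the sample means in half requires four times as many observations per sample.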

As an example of this, let's look again at our salmon population. In the last exercise, we saw that the sampling distribution of the mean was normally distributed. In the plot below, we can see that the mean of the simulated sampling distribution is approximately equal to the population mean.

https://static-assets.codecademy.com/skillpaths/master-stats-ii/sampling-distributions/pop_mean.svg

https://static-assets.codecademy.com/skillpaths/master-stats-ii/sampling-distributions/mean_sampling_dist.svg

oldoc63 commented 1 year ago

We've set up a simulation of a population that has a mean of 10 and a standard deviation of 10. We've set a sample size of 50. According to the CLT, we should have a sampling distribution of the mean that is normally distributed and has a mean that is close to the population mean.
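A setup along these lines might look like the sketch below. The exercise's actual code is not included in this thread, so everything here is an assumption apart from the stated mean (10), standard deviation (10), and sample size (50): the population is taken to be normal, its size is set to 10,000, and 2,000 samples are drawn.

```python
import math
import random
import statistics

rng = random.Random(3)

# Assumed normal population with mean 10 and standard deviation 10.
population = [rng.gauss(10, 10) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

samp_size = 50
# Sampling distribution of the mean: 2,000 samples drawn with replacement.
sample_means = [
    statistics.fmean(rng.choices(population, k=samp_size))
    for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = pop_sd / math.sqrt(samp_size)  # sigma / sqrt(n) from the CLT

print(f"population mean:      {pop_mean:.2f}")
print(f"mean of sample means: {mean_of_means:.2f}")
print(f"sd of sample means:   {sd_of_means:.2f} (CLT predicts {expected_sd:.2f})")
```

The mean of the sample means should land close to the population mean, and their standard deviation close to σ/√n, which is about 1.41 for σ = 10 and n = 50.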

oldoc63 commented 1 year ago

Set variable samp_size equal to 6 and run the code.

Because the original population is normally distributed, the CLT applies even with a smaller sample size.
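To illustrate the small-sample case, the sketch below draws samples of size 6 directly from a normal distribution with mean 10 and standard deviation 10 and checks that the sample means are still roughly symmetric. Only `samp_size` comes from the exercise; the other names and the number of samples are assumptions.

```python
import math
import random
import statistics

rng = random.Random(4)

samp_size = 6
# Each sample is drawn straight from a normal distribution (mean 10, sd 10).
sample_means = [
    statistics.fmean([rng.gauss(10, 10) for _ in range(samp_size)])
    for _ in range(3_000)
]

mean = statistics.fmean(sample_means)
sd = statistics.pstdev(sample_means)
# Skewness as the mean of cubed z-scores; near 0 for a symmetric distribution.
skew = sum(((m - mean) / sd) ** 3 for m in sample_means) / len(sample_means)

print(f"skewness of sample means: {skew:.2f}")
print(f"sd of sample means: {sd:.2f} (sigma / sqrt(n) = {10 / math.sqrt(samp_size):.2f})")
```

Even with only 6 observations per sample, the skewness should stay near zero and the spread near σ/√n, since means of normally distributed values are themselves normally distributed.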

oldoc63 / learningDS

Central Limit Theorem #435