Open oldoc63 opened 1 year ago
Next, we've simulated a sampling distribution of the mean (using a sample size of 100) and superimposed a normal distribution on top of it. Note how the estimated sampling distribution follows the normal curve almost perfectly.
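One way to check "follows the normal curve" without a plot is to compare how many simulated sample means fall within 1 and 2 standard deviations of their center against what a normal distribution predicts. The sketch below is only an illustration: the exponential population with mean 60 is an assumed stand-in for a right-skewed population, not the actual salmon data, and the names `num_samples` and `share_within` are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

samp_size = 100
num_samples = 5_000  # assumed number of simulated samples

# Assumed right-skewed population: exponential with mean 60.
# Each entry is the mean of one simulated sample of size samp_size.
sample_means = [
    statistics.fmean(random.expovariate(1 / 60) for _ in range(samp_size))
    for _ in range(num_samples)
]

center = statistics.fmean(sample_means)
spread = statistics.pstdev(sample_means)


def share_within(k):
    """Fraction of simulated sample means within k standard deviations of the center."""
    return sum(abs(m - center) <= k * spread for m in sample_means) / num_samples


normal = statistics.NormalDist()  # standard normal, used as the reference curve
for k in (1, 2):
    expected = normal.cdf(k) - normal.cdf(-k)
    print(f"within {k} SD: simulated {share_within(k):.3f}, normal {expected:.3f}")
```

If the simulated shares land close to the normal reference values, the sampling distribution is behaving roughly like the superimposed normal curve.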
Note that the CLT only applies to the sampling distribution of the mean and not other statistics like maximum, minimum and variance.
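To see why the statistic matters, the sketch below simulates the sampling distribution of the mean and of the maximum from the same samples and compares their skewness. It assumes a hypothetical exponential population with mean 60 (not the real salmon data), and `skewness` is a small helper written for this example rather than a library function.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def skewness(values):
    """Population skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


samp_size = 100
means, maxima = [], []
for _ in range(5_000):
    # Assumed right-skewed population: exponential with mean 60
    sample = [random.expovariate(1 / 60) for _ in range(samp_size)]
    means.append(statistics.fmean(sample))
    maxima.append(max(sample))

skew_of_means = skewness(means)
skew_of_maxima = skewness(maxima)

print(f"skewness of sample means:  {skew_of_means:.2f}")
print(f"skewness of sample maxima: {skew_of_maxima:.2f}")
```

For this assumed population, the sample means should come out close to symmetric, while the sample maxima stay clearly right-skewed even at a sample size of 100.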
Now that we have seen the skewed population distribution, let's simulate a sampling distribution of the mean. According to the CLT, we will see an approximately normal distribution once the sample size is large enough. To start, we have set the sample size to 6.
With such a small sample size, the sampling distribution looks slightly skewed. This is because the population was not normally distributed and we have a small sample size.
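The effect of sample size can be sketched numerically by measuring the skewness of the simulated sampling distribution at a sample size of 6 and again at 100. This is a minimal illustration under assumptions: the population is a hypothetical exponential distribution with mean 60 rather than the salmon weights, and `simulate_sample_means` and `skewness` are helper names invented for this example.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible


def skewness(values):
    """Population skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def simulate_sample_means(samp_size, num_samples=5_000):
    """Draw num_samples samples of size samp_size and return each sample's mean."""
    return [
        # Assumed right-skewed population: exponential with mean 60
        statistics.fmean(random.expovariate(1 / 60) for _ in range(samp_size))
        for _ in range(num_samples)
    ]


skew_small = skewness(simulate_sample_means(6))
skew_large = skewness(simulate_sample_means(100))

print(f"skewness with samp_size = 6:   {skew_small:.2f}")
print(f"skewness with samp_size = 100: {skew_large:.2f}")
```

A skewness near 0 indicates a symmetric, normal-looking shape, so the value should shrink toward 0 as the sample size grows.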
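The CLT not only establishes that the sampling distribution will be normally distributed, but also allows us to describe that normal distribution quantitatively. Normal distributions are described by their mean mu (μ) and standard deviation sigma (σ). For the sampling distribution of the mean, the mean equals the population mean, and the standard deviation (often called the standard error) equals the population standard deviation σ divided by the square root of the sample size n: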
$$ \text{Sampling Distribution St. Dev} = \frac{\sigma}{\sqrt{n}} $$
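The formula can be checked against a simulation. The sketch below builds a hypothetical right-skewed population (exponential draws with mean 60, standing in for the salmon weights), simulates sample means, and compares their standard deviation with σ/√n. The population parameters, the sample size of 50, and the variable names are all assumptions made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed population (a stand-in for the salmon weights)
population = [random.expovariate(1 / 60) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

samp_size = 50

# Draw many samples (with replacement) and record each sample's mean
sample_means = [
    statistics.fmean(random.choices(population, k=samp_size))
    for _ in range(5_000)
]

predicted_sd = pop_sd / samp_size ** 0.5         # sigma / sqrt(n) from the formula
simulated_sd = statistics.pstdev(sample_means)   # spread of the simulated means
simulated_mean = statistics.fmean(sample_means)  # center of the simulated means

print(f"population mean {pop_mean:.2f} vs mean of sample means {simulated_mean:.2f}")
print(f"predicted SD {predicted_sd:.2f} vs simulated SD {simulated_sd:.2f}")
```

Both pairs of numbers should agree closely; the small remaining gap comes from using a finite number of simulated samples.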
As an example of this, let's look again at our salmon population. In the last exercise, we saw that the sampling distribution of the mean was normally distributed. In the plot below, we can see that the mean of the simulated sampling distribution is approximately equal to the population mean.
https://static-assets.codecademy.com/skillpaths/master-stats-ii/sampling-distributions/pop_mean.svg
Set variable samp_size equal to 6 and run the code.
Because the original population is normally distributed, the CLT applies even with a smaller sample size.
So far, we've defined the term sampling distribution and shown how we can simulate an approximated sampling distribution for a few different statistics (mean, maximum, variance, etc.). The Central Limit Theorem (CLT) allows us to specifically describe the sampling distribution of the mean. The CLT states that the sampling distribution of the mean is approximately normally distributed as long as the population is not too skewed or the sample size is large enough. Using a sample size of n > 30 is usually a good rule of thumb, regardless of what the distribution of the population is like. If the distribution of the population is normal, the sample size can be smaller than that.
Let's take another look at the salmon weights to see how the CLT applies here. The first plot below shows the population distribution. The salmon weight distribution is skewed right, meaning the tail of the distribution is longer on the right than on the left.
https://static-assets.codecademy.com/skillpaths/master-stats-ii/sampling-distributions/pop_distribution.svg