oldoc63 / learningDS

Learning DS with Codecademy and Books

Biased Estimators #449

Open oldoc63 opened 1 year ago

oldoc63 commented 1 year ago

According to the Central Limit Theorem, the mean of the sampling distribution of the mean is equal to the population mean. The analogous property holds for some sample statistics, but not all. Remember, you can have a sampling distribution for any sample statistic, including the mean, the maximum, and the variance.

Because the mean of the sampling distribution of the mean is equal to the mean of the population, we call the sample mean an unbiased estimator of the population mean. In general, a statistic is called an unbiased estimator of a population parameter if the mean of the sampling distribution of the statistic is equal to the value of that parameter in the population.

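The maximum is one example of a biased estimator, meaning that the sampling distribution of the maximum is not centered at the population maximum. A sample can never contain a value larger than the population maximum, so the sample maximum can only match or underestimate it.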
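The contrast between the mean and the maximum can be checked with a small simulation. This is only a sketch under assumptions that are not in the lesson: the population is a made-up set of normally distributed values, samples are drawn with replacement, and the names `population`, `sample_size`, and `n_samples` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical population (assumption): 10,000 normally distributed values.
population = rng.normal(loc=50, scale=10, size=10_000)

sample_size = 30
n_samples = 5_000

# Each row is one sample drawn with replacement from the population.
samples = rng.choice(population, size=(n_samples, sample_size))

# One statistic per sample gives the sampling distribution of that statistic.
sample_means = samples.mean(axis=1)
sample_maxes = samples.max(axis=1)

print("population mean:     ", population.mean())
print("mean of sample means:", sample_means.mean())
print("population max:      ", population.max())
print("mean of sample maxes:", sample_maxes.mean())
```

The mean of the sample means should land very close to the population mean, while the mean of the sample maximums should fall clearly below the population maximum.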

oldoc63 commented 1 year ago
  1. In the workspace, you can see the sampling distribution of the maximum. The mean of the distribution is not equal to the maximum of the population, showing that the sample maximum is a biased estimator.

    Let's look at another example. Edit the function app_statistic() so that it returns the variance using the NumPy function np.var(). Change the string as well to update the title of your plots.
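The workspace code is not reproduced in this thread, so the following is only a guess at what the edited function might look like. The single-argument signature and the `statistic_name` variable for the plot title are assumptions, not the exercise's actual code.

```python
import numpy as np

# Hypothetical signature (assumption): the function receives one sample
# as an array-like and returns a single number.
def app_statistic(x):
    # np.var() uses ddof=0 by default, i.e. it divides by the sample size.
    return np.var(x)

# Hypothetical variable (assumption) holding the string used in plot titles.
statistic_name = "Variance"

print(statistic_name, "of [1, 2, 3, 4]:", app_statistic([1, 2, 3, 4]))
```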

oldoc63 commented 1 year ago

Since the mean of the sampling distribution of the variance is not equal to the variance of the population, the variance computed by np.var() with its default ddof=0 is a biased estimator. However, you can see that it is close! If we set ddof=1 in the np.var() function, we calculate the sample variance, which is very similar to the "population variance" except that the formula has sample_size - 1 in the denominator instead of just sample_size.
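The effect of ddof can be seen by simulating both versions side by side. As before, this is a sketch under assumptions: the population is made up, samples are drawn with replacement, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

# Hypothetical population (assumption): 10,000 normally distributed values.
population = rng.normal(loc=50, scale=10, size=10_000)
pop_var = np.var(population)  # population variance: divide by N

sample_size = 10
n_samples = 20_000

# Each row is one sample drawn with replacement from the population.
samples = rng.choice(population, size=(n_samples, sample_size))

# ddof=0 divides by sample_size; ddof=1 divides by sample_size - 1.
biased = np.var(samples, axis=1)
unbiased = np.var(samples, axis=1, ddof=1)

print("population variance:      ", pop_var)
print("mean of ddof=0 variances: ", biased.mean())
print("mean of ddof=1 variances: ", unbiased.mean())
```

With a small sample size the gap is easy to see: the ddof=0 average should sit near (sample_size - 1) / sample_size times the population variance, while the ddof=1 average should sit near the population variance itself.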