oldoc63 / learningDS

Learning DS with Codecademy and Books

Sampling from a Population #433

Open oldoc63 opened 1 year ago

oldoc63 commented 1 year ago

In statistics, we often want to learn about a large population. Since collecting data for an entire population is usually impossible, researchers may use a smaller sample of data to try to answer their questions.

To do this, a researcher might calculate a statistic such as the mean or median for a sample of data. Then they can use that statistic as an estimate for the population value they really care about.

For example, suppose that a researcher wants to know the average weight of all Atlantic Salmon. It would be impossible to catch every single fish. Instead, the researcher might collect a sample of 50 fish off the coast of Nova Scotia and determine that the average weight of those fish is x. If the same researcher collected 50 new fish and took the new average weight, that average would likely be slightly different from the first sample average.

We will go over how we can extrapolate from sample data in order to describe our uncertainty about the statistics of the full population.

oldoc63 commented 1 year ago

Random Sampling in Python

Now that we've generated some random samples from a population using an applet, let's code this ourselves in Python. The numpy.random package has several functions that we could use to simulate random sampling. In this exercise, we'll use the function np.random.choice(), which draws a random sample of a given size from a given array.

We'll pretend that we actually have a list of all the weights of Atlantic Salmon that currently exist.

In the example code we have done the following:
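
The steps themselves are not listed in this comment. As a rough sketch of the idea (not the original exercise code), the snippet below builds a synthetic population and draws two samples of 50 with np.random.choice(). The population values, the mean of 10 and standard deviation of 2, the seed, and all variable names are assumptions made for illustration only; they are not real salmon data.

```python
import numpy as np

# Fix the seed so the sketch is reproducible (the seed value is arbitrary).
np.random.seed(42)

# Hypothetical population: 10,000 made-up "weights" drawn from a normal
# distribution. These numbers are illustrative, not real salmon data.
population = np.random.normal(loc=10.0, scale=2.0, size=10_000)

# Draw two independent samples of 50 fish each, without replacement
# within a sample (the same fish is not caught twice in one sample).
sample_a = np.random.choice(population, size=50, replace=False)
sample_b = np.random.choice(population, size=50, replace=False)

# Each sample mean is an estimate of the population mean, and the two
# estimates will usually differ slightly from each other.
print("Population mean:", population.mean())
print("Sample A mean:  ", sample_a.mean())
print("Sample B mean:  ", sample_b.mean())
```

Running this a few times with different seeds shows the point made earlier in the thread: each new sample of 50 gives a slightly different average, and each lands somewhere near the population mean.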

oldoc63 commented 1 year ago

As we saw in the last exercise, sample means from smaller samples vary more from one random sample to the next. With a small sample, a single extreme value can significantly shift the sample mean, so the mean changes more each time you draw a new sample.
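
One way to check this claim is to draw many samples at two different sizes and compare how spread out the resulting sample means are. The sketch below does that on a synthetic population. The helper name spread_of_sample_means, the sample sizes 10 and 100, the number of repetitions, and the population parameters are all assumptions chosen for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for reproducibility only

# Hypothetical population of 10,000 made-up weights (not real data).
population = np.random.normal(loc=10.0, scale=2.0, size=10_000)

def spread_of_sample_means(population, sample_size, n_samples=1000):
    """Draw n_samples random samples of sample_size and return the
    standard deviation of their means (hypothetical helper)."""
    means = [
        np.random.choice(population, size=sample_size, replace=False).mean()
        for _ in range(n_samples)
    ]
    return np.std(means)

small_spread = spread_of_sample_means(population, sample_size=10)
large_spread = spread_of_sample_means(population, sample_size=100)

# The means of the small samples should be noticeably more spread out.
print("Std. dev. of sample means, n=10: ", small_spread)
print("Std. dev. of sample means, n=100:", large_spread)
```

Under these assumptions, the spread for n=10 should come out clearly larger than for n=100, which matches the statement that smaller samples give less stable sample means.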