1.1 Populations, Samples, and Processes
To generate a random sample of data, use the function sample(). Ex: To draw a sample of 6 numbers from the integers 1 to 20:
sample(x = 1:20, size = 6)
Note: by default, sample() draws without replacement (it won't pick one number twice). To sample with replacement, so that the same number can be picked more than once, add replace = TRUE, as in the second call here. The output is random, so your numbers will differ:
sample(x = 1:20, size = 6)
## [1] 15 6 7 11 2 17
sample(x = 1:20, size = 8, replace = TRUE)
## [1] 15 8 14 1 2 18 15 1
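Because sample() is random, the numbers shown above change on every run. The following is a minimal sketch of making a draw repeatable, assuming you fix the random seed with set.seed(); the seed value 42 and the variable names are arbitrary choices for illustration, not part of the original example.

```r
# Fix the seed so the same "random" draws come back on every run.
# The value 42 is an arbitrary choice for illustration.
set.seed(42)

no_repl   <- sample(x = 1:20, size = 6)                  # without replacement
with_repl <- sample(x = 1:20, size = 8, replace = TRUE)  # repeats allowed

# Without replacement, every drawn value is distinct (0 means no duplicates).
anyDuplicated(no_repl)

# Asking for more values than the population holds only works with replacement.
big <- sample(x = 1:20, size = 30, replace = TRUE)
length(big)
```

Calling sample(x = 1:20, size = 30) without replace = TRUE stops with an error, because only 20 distinct values are available.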
1.2 Pictorial and Tabular Methods in Descriptive Statistics
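Stem-and-leaf display:
Ex: The following are the prices (in $) in a jewelry store:
379, 425, 450, 450, 499, 529, 535, 535, 545, 599, 665, 675, 699, 699, 725, 725, 745, 799
Draw a stem-and-leaf plot of this data. First store the values in a new variable (we'll call it "prices"); the syntax <- c() combines the listed values into a vector and assigns it to that name. Then call stem() on it: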
prices <- c(379, 425, 450, 450, 499, 529, 535, 535, 545, 599, 665, 675, 699, 699, 725, 725, 745, 799)
stem(prices)
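To see what a stem-and-leaf display summarizes, you can count the values on each stem by hand. This is a rough sketch that assumes the stems are the hundreds digits; stem() may choose a different split, and the names stems and stem_counts are only illustrative.

```r
prices <- c(379, 425, 450, 450, 499, 529, 535, 535, 545, 599,
            665, 675, 699, 699, 725, 725, 745, 799)

# Integer division by 100 gives the hundreds digit of each price,
# used here as a hypothetical stem.
stems <- prices %/% 100

# Count how many prices share each stem.
stem_counts <- table(stems)
stem_counts
```

The counts per stem (3, 4, 5, 6, 7) are 1, 4, 5, 4, 4, which is the number of leaves you would write on each row of a display built on hundreds.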
To create a frequency histogram of this data, use the following:
prices <- c(379, 425, 450, 450, 499, 529, 535, 535, 545, 599, 665, 675, 699, 699, 725, 725, 745, 799)
hist(prices)
Note: histograms are drawn from unbinned data. R does the binning in the process of drawing the histogram, which means the program chooses the bin boundaries for you.
To set specific bin boundaries and a fill color for your histogram, use the breaks and col arguments:
hist(prices, breaks = c(300, 400, 500, 600, 700, 800), col = "lightblue")
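If you want to check the binning without looking at the picture, hist() can return its results instead of drawing. This is a small sketch, assuming the plot = FALSE argument of base R's hist(); the names brks, h_default and h_custom are illustrative.

```r
prices <- c(379, 425, 450, 450, 499, 529, 535, 535, 545, 599,
            665, 675, 699, 699, 725, 725, 745, 799)
brks <- c(300, 400, 500, 600, 700, 800)

# plot = FALSE returns the histogram object without drawing anything.
h_default <- hist(prices, plot = FALSE)
h_default$breaks   # bin boundaries R picked on its own

h_custom <- hist(prices, breaks = brks, plot = FALSE)
h_custom$counts    # number of prices in each 100-dollar bin
```

With the custom breaks, the counts are 1, 4, 5, 4, 4, which are the bar heights of the frequency histogram.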
Density histograms use almost identical syntax, with one extra argument (freq = FALSE). The las = 1 argument prints the axis labels horizontally:
prices <- c(379, 425, 450, 450, 499, 529, 535, 535, 545, 599, 665, 675, 699, 699, 725, 725, 745, 799)
hist(prices, freq = FALSE, breaks = c(300, 400, 500, 600, 700, 800), col = "lightblue", las = 1)
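In a density histogram, the height of each bar is the bin count divided by (sample size × bin width), so the bar areas add up to 1. The sketch below checks this numerically; it assumes the plot = FALSE argument of hist(), and the names bin_width and manual_density are only illustrative.

```r
prices <- c(379, 425, 450, 450, 499, 529, 535, 535, 545, 599,
            665, 675, 699, 699, 725, 725, 745, 799)
brks <- c(300, 400, 500, 600, 700, 800)

# Compute the histogram without drawing it.
h <- hist(prices, breaks = brks, plot = FALSE)

bin_width <- diff(h$breaks)   # 100 for every bin here

# Density by hand: count / (n * bin width).
manual_density <- h$counts / (length(prices) * bin_width)

all.equal(manual_density, h$density)   # matches what hist() computed
sum(h$density * bin_width)             # total area of the bars
```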
1.3 Measures of location
Consider the same dataset (prices). The mean, median, quartiles, and trimmed mean are computed with the same syntax as on the website.
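The website code is not reproduced in these notes, so the following is only a sketch of one way to compute these quantities with base R functions. The 10% trimming proportion and the variable names are assumptions chosen for illustration.

```r
prices <- c(379, 425, 450, 450, 499, 529, 535, 535, 545, 599,
            665, 675, 699, 699, 725, 725, 745, 799)

price_mean   <- mean(prices)      # arithmetic mean
price_median <- median(prices)    # average of the 9th and 10th sorted values
price_quart  <- quantile(prices)  # 0%, 25%, 50%, 75%, 100% (R's default method)

# Trimmed mean: trim = 0.1 drops the lowest and highest 10% of the values
# (one value from each end when n = 18) before averaging.
price_trim <- mean(prices, trim = 0.1)

price_mean
price_median
price_quart
price_trim
```

Note that quantile() interpolates between data points, so its 25% and 75% values can differ slightly from the lower and upper fourths returned by fivenum().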
1.4 Measures of variability
To generate a horizontal boxplot, use the following (Note: the col argument adds a color, as done below):
boxplot(prices, horizontal = TRUE, col = "pink")
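The numbers behind a boxplot can also be computed directly. This sketch uses fivenum() for the five-number summary, derives the fourth spread from it, and asks boxplot.stats() which points would be flagged as outliers. The calls to var() and sd() are base R functions added here for illustration and do not appear in the original notes.

```r
prices <- c(379, 425, 450, 450, 499, 529, 535, 535, 545, 599,
            665, 675, 699, 699, 725, 725, 745, 799)

# Minimum, lower fourth, median, upper fourth, maximum.
five <- fivenum(prices)

# Fourth spread: upper fourth minus lower fourth (the length of the box).
fourth_spread <- five[4] - five[2]

# Values more than 1.5 fourth spreads beyond the box are flagged as outliers.
outliers <- boxplot.stats(prices)$out

# Sample variance and sample standard deviation.
s2 <- var(prices)
s  <- sd(prices)

five
fourth_spread
outliers
```

For the prices data the box runs from 499 to 699, so the fourth spread is 200 and no value is far enough from the box to count as an outlier.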
Now, to create a comparative boxplot of two datasets, we first combine them into a data frame (a table with one column per dataset; the two datasets must have the same number of values). Then we draw a boxplot from this combined data frame, which shows one box per column.
PTSD <- c(10, 20, 25, 28, 31, 35, 37, 38, 38, 39, 39, 42, 46)
Healthy <- c(23, 39, 40, 41, 43, 47, 51, 58, 63, 66, 67, 69, 72)
df <- data.frame(Healthy, PTSD)
boxplot(df, horizontal = TRUE)
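To compare the two groups numerically as well as visually, you can apply fivenum() to each column of the data frame. This is a minimal sketch using sapply(); the name summ is illustrative.

```r
PTSD    <- c(10, 20, 25, 28, 31, 35, 37, 38, 38, 39, 39, 42, 46)
Healthy <- c(23, 39, 40, 41, 43, 47, 51, 58, 63, 66, 67, 69, 72)
df <- data.frame(Healthy, PTSD)

# Apply fivenum() to every column; the result is a 5 x 2 matrix whose rows
# are minimum, lower fourth, median, upper fourth, maximum.
summ <- sapply(df, fivenum)
rownames(summ) <- c("min", "lower fourth", "median", "upper fourth", "max")
summ
```

Each column of summ holds the values that position one box in the comparative boxplot, apart from any points drawn separately as outliers.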
Syntax explanation
head(): shows the first rows of a dataset (6 by default); useful when the dataset is too large to display all rows and columns at once.
fivenum(): returns the minimum value, lower fourth, median, upper fourth, and maximum value.
boxplot(): visualizes the five-number summary plus outliers. (It’s clear that the ToothGrowth data is not skewed.)
stem(): draws a stem-and-leaf display, which shows how many data points fall in each bin. (Here we can see that most values are between 20 and 29.)
hist(): draws a histogram; values are grouped into bins.
cumsum(): takes a vector and returns its cumulative sums.
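As a short illustration of head() and cumsum(), the sketch below previews R's built-in ToothGrowth data and turns the bin counts of the prices histogram into cumulative counts. The choice of n = 3 and the variable names are assumptions made for this example.

```r
# head() with n = 3 shows only the first three rows of the data frame.
first_rows <- head(ToothGrowth, n = 3)
first_rows

prices <- c(379, 425, 450, 450, 499, 529, 535, 535, 545, 599,
            665, 675, 699, 699, 725, 725, 745, 799)

# Bin counts for 100-dollar bins, computed without drawing the histogram.
counts <- hist(prices, breaks = c(300, 400, 500, 600, 700, 800),
               plot = FALSE)$counts

# Cumulative counts: how many prices are at or below each upper bin boundary.
cum_counts <- cumsum(counts)
cum_counts   # 1 5 10 14 18
```

The last cumulative count equals the sample size, which is a quick check that every value landed in some bin.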