Open woneuy01 opened 4 years ago
Describing Categories
Counting appearances
Creating a table
amtable <- table(cars$am)
amtable
  auto manual
    13     19
amtable / sum(amtable)
   auto  manual
0.40625 0.59375
sapply(mtcars, function(x) length(unique(x))) # columns with few unique values can be converted to factors
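The same proportions can be obtained with prop.table(). The sketch below is only an illustration: the notes do not show how `cars` was built, so it assumes the data come from the built-in `mtcars` and works on `mtcars$am` directly, with its raw 0/1 codes rather than the auto/manual labels. The cutoff of 3 unique values is an arbitrary choice for the example.

```r
# Assumption: the notes' `cars` is derived from the built-in mtcars dataset;
# this sketch uses mtcars$am directly and keeps the raw 0/1 codes.
am_counts <- table(mtcars$am)

# prop.table() divides each count by the total, like amtable / sum(amtable)
am_props <- prop.table(am_counts)

# Number of distinct values per column; small numbers hint at categorical data
n_unique <- sapply(mtcars, function(x) length(unique(x)))

# Hypothetical cutoff: treat columns with at most 3 distinct values as factor candidates
factor_candidates <- names(n_unique)[n_unique <= 3]

print(am_props)
print(factor_candidates)
```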
Describing Distributions
Plotting a histogram
hist(cars$mpg, col = "grey")
If you want bars representing the intervals 5 to 15, 15 to 25, and 25 to 35, pass those boundaries as breaks:
hist(cars$mpg, breaks = c(5, 15, 25, 35))
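To see what the custom breaks do without drawing anything, hist() can return its binning when called with plot = FALSE. A minimal sketch, again assuming that `cars$mpg` in the notes matches `mtcars$mpg`:

```r
# Assumption: cars$mpg in the notes corresponds to mtcars$mpg.
h <- hist(mtcars$mpg, breaks = c(5, 15, 25, 35), plot = FALSE)

h$breaks  # the interval boundaries that were requested
h$counts  # number of observations in each interval (right-closed by default)

# Every observation must land in exactly one of the three intervals
total_binned <- sum(h$counts)
print(total_binned)
```

Note that hist() raises an error if the breaks do not span the full range of the data, so the outer boundaries (5 and 35 here) have to cover the minimum and maximum.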
By breaking up your data into intervals, you still lose some information. The most complete way of describing your data is to estimate the probability density function (PDF), or density, of your variable.
mpgdens <- density(cars$mpg)
plot(mpgdens)
Plotting densities in a histogram
hist(cars$mpg, col = "grey", freq = FALSE)
lines(mpgdens)
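The freq = FALSE argument matters here: it puts the histogram on the density scale, so the bars and the density curve share the same y-axis. As a rough check that the estimate behaves like a PDF, the area under the curve can be approximated with the trapezoid rule. This is a sketch on `mtcars$mpg` (assumed to match `cars$mpg`), not part of the original notes.

```r
# Assumption: cars$mpg in the notes corresponds to mtcars$mpg.
mpgdens <- density(mtcars$mpg)

# density() returns a grid of x values and the estimated density y at each one
n_points <- length(mpgdens$x)

# Trapezoid rule: average of neighbouring heights times the grid spacing
heights <- (head(mpgdens$y, -1) + tail(mpgdens$y, -1)) / 2
area <- sum(diff(mpgdens$x) * heights)

print(n_points)
print(area)  # should be close to 1 for a density estimate
```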
Describing Multiple Variables
Summarizing a complete dataset
summary(cars)
      mpg             cyl              am        gear
 Min.   :10.40   Min.   :4.000
 1st Qu.:15.43   1st Qu.:4.000   manual:19   4:12
 Median ...
Plotting quantiles for subgroups
One way to quickly compare groups is to construct a box-and-whisker plot from the data. The formula below reads as "plot boxes for the variable mpg for the groups defined by the variable cyl".
boxplot(mpg ~ cyl, data = cars)
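The numbers behind the boxes can be inspected by calling boxplot() with plot = FALSE, which returns the statistics instead of drawing. The sketch below uses `mtcars` (assumed to be the source of `cars`) and compares the middle row of the returned matrix with group medians computed by tapply().

```r
# Assumption: the notes' `cars` shares the mpg and cyl columns of mtcars.
bp <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# bp$stats has one column per group and five rows:
# lower whisker, lower hinge, median, upper hinge, upper whisker
bp$names
bp$stats

# The third row should match the per-group medians
group_medians <- tapply(mtcars$mpg, mtcars$cyl, median)
print(group_medians)
```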
Correlations
plot(iris[-5]) # scatterplot matrix of the four numeric columns
pairs(iris[-5]) # creates the same plot matrix explicitly
with(iris, cor(Petal.Width, Petal.Length))
[1] 0.9628654
iris.cor <- cor(iris[-5])
str(iris.cor)
iris.cor["Petal.Width", "Petal.Length"]
[1] 0.9628654
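A short sketch of what the correlation matrix looks like as an object: it is a square numeric matrix with the column names as row and column names, it is symmetric, and its diagonal is 1. The variable name `petal_cor` is introduced here only for illustration.

```r
# iris[-5] drops the Species factor, leaving the four numeric columns
iris.cor <- cor(iris[-5])

# Look up one pair by name; the order of the two names does not matter
petal_cor <- iris.cor["Petal.Width", "Petal.Length"]

# A correlation matrix is symmetric with ones on the diagonal
is_symmetric <- isSymmetric(iris.cor)
diag_values <- unname(diag(iris.cor))

print(round(petal_cor, 7))
print(is_symmetric)
```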
Sorting a table
sort(table(x))
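Since `x` is not defined in the notes, here is a small sketch with `mtcars$cyl` standing in for it. By default sort() orders the counts from smallest to largest; decreasing = TRUE puts the most frequent category first.

```r
# Assumption: mtcars$cyl stands in for the undefined x from the notes.
cyl_counts <- table(mtcars$cyl)

ascending <- sort(cyl_counts)                      # least frequent first
descending <- sort(cyl_counts, decreasing = TRUE)  # most frequent first

print(ascending)
print(descending)
```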
Describing data: continuous variables
str(cars)
mean(cars$mpg)
median(cars$cyl)
variation <- sd(cars$mpg)
range(cars$mpg)
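The summary functions above fit together as follows: sd() is the square root of var(), range() returns the minimum and maximum as a length-two vector, and quantile() gives the cut points that median() and IQR() are based on. A minimal sketch on `mtcars$mpg`, assumed to match `cars$mpg`:

```r
# Assumption: cars$mpg in the notes corresponds to mtcars$mpg.
mpg <- mtcars$mpg

# Measures of centre
center <- c(mean = mean(mpg), median = median(mpg))

# Measures of spread: sd is the square root of the variance
spread <- c(sd = sd(mpg), var = var(mpg), IQR = IQR(mpg))

# range() returns c(min, max); quantile() returns the five-number cut points
mpg_range <- range(mpg)
mpg_quartiles <- quantile(mpg)

print(center)
print(spread)
print(mpg_range)
```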