woneuy01 / R-visualization

ggplot #5

Open woneuy01 opened 4 years ago

woneuy01 commented 4 years ago

One limitation is that ggplot is designed to work exclusively with data tables. In these data tables, rows have to be observations and columns have to be variables.
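A minimal sketch of what that requirement means in practice. The data frame `wide` and its column names are made up for illustration, and `tidyr::pivot_longer` is only one of several ways to reshape a table:

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide table: one column per year, so "year" is not yet a variable
wide <- data.frame(
  state = c("A", "B"),
  y2010 = c(10, 20),
  y2011 = c(12, 18)
)

# Reshape so that each row is one observation (state, year, total)
long <- pivot_longer(wide, cols = c(y2010, y2011),
                     names_to = "year", values_to = "total")

# Now every aesthetic can be mapped to a column
p <- ggplot(long, aes(x = year, y = total, color = state)) +
  geom_point()
```

After reshaping, `year` and `total` are ordinary columns that aes can refer to by name.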

woneuy01 commented 4 years ago

ggplot2-cheatsheet.pdf

woneuy01 commented 4 years ago

You can associate a dataset x with a ggplot object with any of these 3 commands:

ggplot(data = x)
ggplot(x)
x %>% ggplot()
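A quick check that the three forms attach the same data. This sketch uses the built-in mtcars table as a stand-in for x and loads magrittr only to get the %>% pipe:

```r
library(ggplot2)
library(magrittr)  # provides %>%

x <- mtcars  # stand-in dataset, not the one used in these notes

p1 <- ggplot(data = x)
p2 <- ggplot(x)
p3 <- x %>% ggplot()

# Each object carries the same data component
same_data <- identical(p1$data, p2$data) && identical(p2$data, p3$data)
print(same_data)
```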

woneuy01 commented 4 years ago

In ggplot2, graphs are created by adding layers to the ggplot object:

DATA %>% ggplot() + LAYER_1 + LAYER_2 + ... + LAYER_N

The geometry layer defines the plot type and takes the format geom_X, where X is the plot type.

Aesthetic mappings describe how properties of the data connect with features of the graph (axis position, color, size, etc.). Define aesthetic mappings with the aes function. aes looks up variable names in the data attached to the ggplot object (for example, total rather than murders$total).

geom_point creates a scatterplot and requires x and y aesthetic mappings. geom_text and geom_label add text to a scatterplot and require x, y, and label aesthetic mappings.
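The layering idea can be seen by counting layers as they are added. This is a sketch on the built-in mtcars data (an assumption made so it runs with only ggplot2 installed); the `model` column is added here for illustration:

```r
library(ggplot2)

# Stand-in data: mtcars with the row names copied into a column for labels
cars <- transform(mtcars, model = rownames(mtcars))

p0 <- ggplot(cars)                                          # data only, no layers
p1 <- p0 + geom_point(aes(x = wt, y = mpg))                 # one geometry layer
p2 <- p1 + geom_text(aes(x = wt, y = mpg, label = model))   # two geometry layers

n_layers <- c(length(p0$layers), length(p1$layers), length(p2$layers))
print(n_layers)
```

Note that aes refers to wt, mpg, and model directly, not to cars$wt and so on.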

woneuy01 commented 4 years ago

Code: Adding layers to a plot

library(tidyverse)
library(dslabs)
data(murders)

murders %>% ggplot() +
  geom_point(aes(x = population/10^6, y = total))

# add points layer to predefined ggplot object
p <- ggplot(data = murders)
p + geom_point(aes(population/10^6, total))

# add text layer to scatterplot
p + geom_point(aes(population/10^6, total)) +
  geom_text(aes(population/10^6, total, label = abb))

woneuy01 commented 4 years ago

# change the size of the points
p + geom_point(aes(population/10^6, total), size = 3) +
  geom_text(aes(population/10^6, total, label = abb))

# move text labels slightly to the right
p + geom_point(aes(population/10^6, total), size = 3) +
  geom_text(aes(population/10^6, total, label = abb), nudge_x = 1)

woneuy01 commented 4 years ago

# simplify code by adding a global aesthetic
p <- murders %>% ggplot(aes(population/10^6, total, label = abb))
p + geom_point(size = 3) +
  geom_text(nudge_x = 1.5)
# the aes is defined globally in p, so geom_point and geom_text do not need their own aes

# local aesthetics override global aesthetics (the layer redefines x, y, and label even though a global aesthetic exists)
p + geom_point(size = 3) +
  geom_text(aes(x = 10, y = 800, label = "Hello there!"))
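To see the override at work, one can inspect the data each layer actually receives with `layer_data`. This sketch swaps in the built-in mtcars data (an assumption, so that only ggplot2 is needed) and a made-up `model` column:

```r
library(ggplot2)

cars <- transform(mtcars, model = rownames(mtcars))

# Global aesthetics: x, y, and label
p <- ggplot(cars, aes(wt, mpg, label = model))

# The text layer overrides all three locally
q <- p + geom_point(size = 3) +
  geom_text(aes(x = 4, y = 30, label = "Hello there!"))

points_data <- layer_data(q, 1)  # still uses the global x and y
text_data   <- layer_data(q, 2)  # uses the local x, y, and label
print(unique(text_data$label))
```

The point layer keeps the global mappings, while the text layer sees only the locally defined position and label.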

woneuy01 commented 4 years ago

Code: Log-scale the x- and y-axis

# define p
library(tidyverse)
library(dslabs)
data(murders)
p <- murders %>% ggplot(aes(population/10^6, total, label = abb))

# log base 10 scale the x-axis and y-axis
p + geom_point(size = 3) +
  geom_text(nudge_x = 0.05) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10")

# efficient log scaling of the axes
p + geom_point(size = 3) +
  geom_text(nudge_x = 0.075) +
  scale_x_log10() +
  scale_y_log10()
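A small check that the two spellings transform the data identically, sketched on the built-in mtcars data rather than murders. The `suppressWarnings` wrapper is there because recent ggplot2 releases may warn that the `trans` argument has been renamed to `transform`:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Long form (may emit a deprecation warning on newer ggplot2 versions)
a <- suppressWarnings(p + scale_x_continuous(trans = "log10"))
# Short form
b <- p + scale_x_log10()

# layer_data returns the values after the scale transformation
xa <- as.numeric(layer_data(a)$x)
xb <- as.numeric(layer_data(b)$x)
print(isTRUE(all.equal(xa, xb)))
```

Because the layer data is already in log10 units, offsets such as nudge_x are also expressed in log10 units once the axis is log-scaled, which is consistent with the much smaller nudge values used above.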

woneuy01 commented 4 years ago

Code: Add labels and title

p + geom_point(size = 3) +
  geom_text(nudge_x = 0.075) +
  scale_x_log10() +
  scale_y_log10() +
  xlab("Population in millions (log scale)") +
  ylab("Total number of murders (log scale)") +
  ggtitle("US Gun Murders in 2010")

# redefine p to be everything except the points layer
p <- murders %>%
  ggplot(aes(population/10^6, total, label = abb)) +
  geom_text(nudge_x = 0.075) +
  scale_x_log10() +
  scale_y_log10() +
  xlab("Population in millions (log scale)") +
  ylab("Total number of murders (log scale)") +
  ggtitle("US Gun Murders in 2010")

woneuy01 commented 4 years ago

# make all points blue
p + geom_point(size = 3, color = "blue")

# color points by region (col is shorthand for color)
p + geom_point(aes(col = region), size = 3)
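The difference between setting a color and mapping a color can be checked on the built layer data. This sketch uses mtcars and `factor(cyl)` as a stand-in for the region variable:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg))

# Constant color: set outside aes, identical for every point
fixed <- p + geom_point(color = "blue", size = 3)

# Mapped color: set inside aes, one color per level of the variable
mapped <- p + geom_point(aes(color = factor(cyl)), size = 3)

fixed_colours  <- unique(layer_data(fixed)$colour)
mapped_colours <- unique(layer_data(mapped)$colour)
print(fixed_colours)
print(length(mapped_colours))
```

Only the mapped version produces a legend, because only there does the color carry information about the data.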

woneuy01 commented 4 years ago

Code: Add a line with average murder rate

# define average murder rate
r <- murders %>%
  summarize(rate = sum(total) / sum(population) * 10^6) %>%
  pull(rate)

# basic line with average murder rate for the country
p + geom_point(aes(col = region), size = 3) +
  geom_abline(intercept = log10(r))    # slope is default of 1

# change line to dashed and dark grey, line under points
p + geom_abline(intercept = log10(r), lty = 2, color = "darkgrey") +
  geom_point(aes(col = region), size = 3)
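Why the intercept is log10(r) with slope 1: a state at the national average has total = r * population (in millions), so on log10 axes log10(total) = log10(r) + log10(population). A sketch with made-up numbers (the rate and populations below are hypothetical, not taken from murders):

```r
# Hypothetical rate per million and hypothetical populations in millions
r <- 30
pop_millions <- c(0.5, 1, 5, 20)

# Totals for states sitting exactly at the average rate
expected_total <- r * pop_millions

# On the log-log scale, subtracting slope * x from y leaves the intercept
intercepts <- log10(expected_total) - 1 * log10(pop_millions)
print(intercepts)
print(log10(r))
```

Every value equals log10(r), so all such states fall on the line drawn by geom_abline(intercept = log10(r)).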

Code: Change legend title

p <- p + scale_color_discrete(name = "Region")    # capitalize legend title

woneuy01 commented 4 years ago

The ggthemes package adds additional themes.

# theme used for graphs in the textbook and course
library(dslabs)
ds_theme_set()

# themes from ggthemes
library(ggthemes)
p + theme_economist()    # style of the Economist magazine
p + theme_fivethirtyeight()    # style of the FiveThirtyEight website

woneuy01 commented 4 years ago

Code: Putting it all together to assemble the plot

# load libraries
library(tidyverse)
library(ggrepel)
library(ggthemes)
library(dslabs)
data(murders)

# define the intercept
r <- murders %>%
  summarize(rate = sum(total) / sum(population) * 10^6) %>%
  .$rate

# make the plot, combining all elements
# geom_text_repel() (from ggrepel) moves labels apart when they are too close or crowded
murders %>%
  ggplot(aes(population/10^6, total, label = abb)) +
  geom_abline(intercept = log10(r), lty = 2, color = "darkgrey") +
  geom_point(aes(col = region), size = 3) +
  geom_text_repel() +
  scale_x_log10() +
  scale_y_log10() +
  xlab("Population in millions (log scale)") +
  ylab("Total number of murders (log scale)") +
  ggtitle("US Gun Murders in 2010") +
  scale_color_discrete(name = "Region") +
  theme_economist()

woneuy01 commented 4 years ago

geom_histogram creates a histogram. Use the binwidth argument to change the width of bins, the fill argument to change the bar fill color, and the col argument to change the bar outline color.

Code: Histograms in ggplot2

# load heights data
library(tidyverse)
library(dslabs)
data(heights)

# define p
p <- heights %>%
  filter(sex == "Male") %>%
  ggplot(aes(x = height))

# basic histograms
p + geom_histogram()
p + geom_histogram(binwidth = 1)

# histogram with blue fill, black outline, labels and title
# (col = "black" draws the outline of each bar in black)
p + geom_histogram(binwidth = 1, fill = "blue", col = "black") +
  xlab("Male heights in inches") +
  ggtitle("Histogram")
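The effect of binwidth can be inspected without looking at the picture. This sketch uses the built-in faithful data (an assumption, so it runs without dslabs) and reads the bin counts back with `layer_data`:

```r
library(ggplot2)

p <- ggplot(faithful, aes(x = waiting))

# Same data, two bin widths
h1 <- layer_data(p + geom_histogram(binwidth = 1, fill = "blue", col = "black"))
h5 <- layer_data(p + geom_histogram(binwidth = 5, fill = "blue", col = "black"))

# Wider bins mean fewer bars, but every observation is still counted once
print(c(bars_width_1 = nrow(h1), bars_width_5 = nrow(h5)))
print(c(sum(h1$count), sum(h5$count), nrow(faithful)))
```

Choosing the bin width trades detail for smoothness; the total count is unchanged.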

woneuy01 commented 4 years ago

geom_density creates smooth density plots. Change the fill color of the plot with the fill argument.

p + geom_density()
p + geom_density(fill = "blue")

woneuy01 commented 4 years ago

geom_qq creates a quantile-quantile plot. This geometry requires the sample aesthetic. By default, the data are compared to a standard normal distribution with a mean of 0 and standard deviation of 1. This can be changed with the dparams argument, or the sample data can be scaled.

# basic QQ-plot
p <- heights %>% filter(sex == "Male") %>%
  ggplot(aes(sample = height))
p + geom_qq()

# QQ-plot against a normal distribution with same mean/sd as data
params <- heights %>%
  filter(sex == "Male") %>%
  summarize(mean = mean(height), sd = sd(height))
p + geom_qq(dparams = params) +
  geom_abline()

# QQ-plot of scaled data against the standard normal distribution
heights %>%
  ggplot(aes(sample = scale(height))) +
  geom_qq() +
  geom_abline()
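The last two approaches are two views of the same comparison, which is why geom_abline() with its default intercept 0 and slope 1 serves as the identity line in both. A base-R sketch with simulated values (the numbers are hypothetical, not the heights data):

```r
set.seed(1)
x <- rnorm(200, mean = 69, sd = 3)  # simulated stand-in for heights

# Option 1: theoretical quantiles from a normal with the sample's mean and sd
probs <- ppoints(length(x))
q_matched <- qnorm(probs, mean = mean(x), sd = sd(x))

# Option 2: standard normal quantiles, with the sample scaled instead
q_standard <- qnorm(probs)
z <- as.numeric(scale(x))

# scale() subtracts the mean and divides by the sd
same_scaling <- isTRUE(all.equal(z, (x - mean(x)) / sd(x)))

# The matched quantiles are the standard ones shifted and stretched the same way
same_quantiles <- isTRUE(all.equal((q_matched - mean(x)) / sd(x), q_standard))
print(c(same_scaling, same_quantiles))
```

Both plots therefore show the same shape; only the axis units differ.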

woneuy01 commented 4 years ago

Histogram comparison

Code: Grids of plots with the gridExtra package

# define plots p1, p2, p3
p <- heights %>% filter(sex == "Male") %>% ggplot(aes(x = height))
p1 <- p + geom_histogram(binwidth = 1, fill = "blue", col = "black")
p2 <- p + geom_histogram(binwidth = 2, fill = "blue", col = "black")
p3 <- p + geom_histogram(binwidth = 3, fill = "blue", col = "black")

# arrange plots next to each other in 1 row, 3 columns
library(gridExtra)
grid.arrange(p1, p2, p3, ncol = 3)

woneuy01 commented 4 years ago

To create a scatter plot, we add a layer with the function geom_point. The aesthetic mappings require us to define the x-axis and y-axis variables respectively. So the code looks like this:

murders %>% ggplot(aes(x = , y = )) + geom_point()

except we have to fill in the blanks to define the two variables x and y.

woneuy01 commented 4 years ago

library(dplyr)
library(ggplot2)
library(dslabs)
data(murders)

# edit the next line to add the label
murders %>% ggplot(aes(population, total, label = abb)) +
  geom_label()

woneuy01 commented 4 years ago

Rewrite the code above to make the labels blue by adding an argument to geom_label.

murders %>% ggplot(aes(population, total, label = abb)) +
  geom_label(color = "blue")

woneuy01 commented 4 years ago

Rewrite the code above to make the label color correspond to the state's region. Because this is a mapping, you will have to do this through the aes function. Use the existing aes function inside of the ggplot function.

murders %>% ggplot(aes(population, total, label = abb, color = region)) +
  geom_label()

woneuy01 commented 4 years ago

p <- murders %>% ggplot(aes(population, total, label = abb, color = region)) +
  geom_label()

# add a layer to add a title to the next line
p + scale_x_log10() +
  scale_y_log10() +
  ggtitle("Gun murder data")

woneuy01 commented 4 years ago

Create separate smooth density plots for males and females by defining group by sex. Use the existing aes function inside of the ggplot function.

# add the group argument then a layer with +
heights %>%
  ggplot(aes(height, group = sex)) +
  geom_density()

woneuy01 commented 4 years ago

Change the density plots from the previous exercise to add color.

# edit the next line to use color instead of group then add a density layer
heights %>%
  ggplot(aes(height, color = sex)) +
  geom_density()

woneuy01 commented 4 years ago

heights %>% ggplot(aes(height, fill = sex)) + geom_density()

woneuy01 commented 4 years ago

heights %>% ggplot(aes(height, fill = sex)) + geom_density(alpha = 0.2 )
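The group, color, and fill mappings all split the data into one density curve per level, and alpha makes overlapping fills see-through. A sketch on the built-in ToothGrowth data, where `supp` (two levels) stands in for sex:

```r
library(ggplot2)

p <- ggplot(ToothGrowth, aes(len, fill = supp)) +
  geom_density(alpha = 0.2)

d <- layer_data(p)

n_groups <- length(unique(d$group))  # one density curve per level of supp
n_fills  <- length(unique(d$fill))   # one fill color per level
alphas   <- unique(d$alpha)          # constant transparency set outside aes
print(c(n_groups, n_fills, alphas))
```

Mapping fill inside aes implies the grouping, so a separate group = argument is not needed here.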