library("readr")
library("lubridate")
library("zoo")
library("dplyr")
library("ggplot2")
library("stringr")
library("countrycode")
library("maps")
rm(list=ls())
Read the data
moma = read_csv("https://raw.githubusercontent.com/MuseumofModernArt/collection/master/Artworks.csv")
Question 1
We use as.yearmon from the "zoo" package to represent "DateAcquired"
as monthly data and convert it to class "Date" using as.Date.
Then we create a variable "stock" that counts the number of paintings using the
count function from the dplyr package.
moma$month = as.Date(as.yearmon(moma$DateAcquired))
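As a small illustration of this conversion, the sketch below applies the same two calls to three made-up acquisition dates (the dates and the name "acquired" are assumptions, not values from the MoMA file). Every date is mapped to the first day of its month.

```r
library(zoo)

# Hypothetical acquisition dates, used only for illustration
acquired <- as.Date(c("1929-11-19", "1929-11-30", "1930-01-05"))

# as.yearmon drops the day; as.Date puts it back as the first of the month
month <- as.Date(as.yearmon(acquired))
print(month)
```

Because all dates within a month collapse to the same value, the result can be used directly as a grouping variable.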
Now we define a new dataframe called moma.paintings. We use the dplyr package
to keep only observations with Classification "Painting" and a non-missing
acquisition month. By first using the group_by function to group the data by
month, we can summarise the number of observations in each month. Finally, we
use the mutate function to create a variable named "Total" that accumulates
the number of paintings.
moma.paintings = moma %>%
  filter(!is.na(month), Classification == "Painting") %>%
  group_by(month) %>%
  summarise(PaintingsAcquired = n()) %>%
  mutate(Total = cumsum(PaintingsAcquired))
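To make the filter, group, summarise and accumulate steps concrete, here is a minimal sketch on a toy dataframe. The dataframe "toy" and its five rows are hypothetical; only the column names follow the MoMA data.

```r
library(dplyr)

# Hypothetical data: three dated paintings, one undated painting, one print
toy <- data.frame(
  month = as.Date(c("1929-11-01", "1929-11-01", "1930-01-01", NA, "1930-01-01")),
  Classification = c("Painting", "Painting", "Painting", "Painting", "Print")
)

toy_stock <- toy %>%
  filter(!is.na(month), Classification == "Painting") %>%  # drop undated rows and non-paintings
  group_by(month) %>%
  summarise(PaintingsAcquired = n()) %>%                   # paintings acquired per month
  mutate(Total = cumsum(PaintingsAcquired))                # running stock of paintings

print(toy_stock)
```

The running total only makes sense because summarise returns the months in ascending order.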
Question 2
We plot the data using a geom_line since it is appropriate for
illustrating the evolution of the stock of paintings.
p_Q1 = ggplot(data = moma.paintings, aes(x = month, y = Total))
p_Q1 + geom_line(color = "Red") +
  xlab("Date") +
  ylab("Number of Paintings") +
  ggtitle("MoMa Stock of Paintings Since 1929") +
  theme_minimal()
Question 3
We define a new dataframe moma.curator which contains all the paintings
that have a date acquired observation and group them by month and curator
approval. We count the number of paintings acquired in each of these groups every month.
moma.curator <- moma %>%
  filter(!is.na(DateAcquired), Classification == "Painting") %>%
  group_by(month, CuratorApproved) %>%
  summarise(PaintingsAcquired = n())
We then group the data by curator approval and accumulate the number of
paintings within each category (approved or not).
moma.curator1 <- moma.curator %>% group_by(CuratorApproved) %>% mutate(Total = cumsum(PaintingsAcquired ))
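The difference from the ungrouped case is that cumsum now restarts within each group. A minimal sketch with made-up monthly counts (the dataframe "toy_curator" and its values are hypothetical):

```r
library(dplyr)

# Hypothetical monthly counts, already sorted by month
toy_curator <- data.frame(
  month = as.Date(c("1930-01-01", "1930-01-01", "1930-02-01", "1930-03-01")),
  CuratorApproved = c("Y", "N", "Y", "N"),
  PaintingsAcquired = c(2, 1, 3, 4)
)

toy_totals <- toy_curator %>%
  group_by(CuratorApproved) %>%
  mutate(Total = cumsum(PaintingsAcquired)) %>%  # running total per approval category
  ungroup()

print(toy_totals)
```

Rows keep their original order, so the data should be sorted by month before the cumulative sum is taken.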
We now plot the data.
p_Q3 = ggplot(data = moma.curator1, aes(x = as.numeric(month), y = Total, color = CuratorApproved))
p_Q3 = p_Q3 + geom_line()
p_Q3 = p_Q3 + labs(title = "Number of paintings in MOMA since 1929", x = "Date", y = "Number of Paintings")
p_Q3
Question 4
We create a new dataframe named "moma_department" by filtering the original
data such that we only have paintings that are registered to a department.
We again use the group_by function to first summarise the number of paintings
acquired for each month by department. Then we group the data by department
and accumulate the observations using cumsum.
moma_department <- moma %>%
  filter(!is.na(Department), !is.na(month), Classification == "Painting") %>%
  group_by(month, Department) %>%
  summarise(n = n()) %>%
  group_by(Department) %>%
  mutate(stock = cumsum(n))
Question 5
The plot shows that Department "Painting and Sculpture" has almost all paintings.
p_Q5 <- ggplot(data = moma_department, aes(x = month, y = stock, color = Department))
p_Q5 <- p_Q5 + geom_line() + scale_y_continuous("Stock of paintings") + scale_x_date("")
p_Q5 <- p_Q5 + theme_minimal() + ggtitle("MoMA's paintings since 1929 : By department")
p_Q5
Question 6
We create a dataframe named moma_painters that counts the number of paintings in
the MoMA stock by each artist. First we filter out the NA observations and
make sure we are only dealing with paintings. Then we group the observations
by "Artist" and summarise the number of works by each artist, sorted in
descending order.
We print the ten artists with the most works by using the "head" function.
moma_painters <- moma %>%
  filter(!is.na(Artist), Classification == "Painting") %>%
  group_by(Artist) %>%
  summarise(n = n()) %>%
  arrange(desc(n))
head(moma_painters, n=10)
Question 7
First we create a dataset named moma_birthplace in which we remove
the observations with no "Artist" or "ArtistBio" and remove all non-paintings.
We keep only the variables "Artist" and "ArtistBio", together with the number
of paintings "n" for each combination.
moma_birthplace = moma %>%
  filter(Artist != "", ArtistBio != "", Classification == "Painting") %>%
  count(Artist, ArtistBio)
We define a new variable in which we remove the parentheses using gsub.
The parentheses must be escaped, since they are special characters in a regular expression.
moma_birthplace$Bio = gsub("\\(|\\)", "", moma_birthplace$ArtistBio)
Then we extract only the part of the new variable "Bio" that is the first
word beginning with a capital letter followed by lowercase letters,
since this is how the nationalities are stated in "ArtistBio".
Then we filter out the observations that are not nationalities.
moma_birthplace$Bio = str_extract(moma_birthplace$Bio, "[A-Z][a-z]+")
moma_birthplace = moma_birthplace %>% filter(Bio != "Nationality" & Bio != "Various")
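The cleaning and extraction steps can be checked on a few example strings. The sketch below is only an illustration: the vector "bio" is made up, and the pattern "[A-Z][a-z]+" is our reading of "a capital letter followed by lowercase letters".

```r
library(stringr)

# Hypothetical ArtistBio strings in the "(Nationality, years)" style
bio <- c("(American, born 1928)", "(South African, born 1955)", "(Nationality unknown)")

cleaned <- gsub("[()]", "", bio)                    # strip both parentheses
first_word <- str_extract(cleaned, "[A-Z][a-z]+")   # first capitalised word
print(first_word)

# Drop placeholder values that are not nationalities
nationalities <- first_word[!first_word %in% c("Nationality", "Various")]
print(nationalities)
```

Two-word nationalities are cut after the first word, which is why entries such as "South" and "Great" have to be mapped to full country names afterwards.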
We create a dataset that summarises all the nationalities to get an overview.
countries <- moma_birthplace %>% group_by(Bio) %>% summarise(n =n())
We then create a character vector with all 53 nationalities in our data.
Nationality <- c( "American" , "Argentine", "Australian", "Austrian", "Belgian", "Bolivian", "Brazilian", "British", "Canadian", "Chilean", "Colombian", "Congolese", "Croatian", "Cuban",
"Czech", "Danish", "Dutch", "French", "German", "Ghanaian", "Great",
"Guatemalan", "Guyanese", "Haitian", "Hungarian", "Icelandic", "Indian", "Iranian",
"Irish", "Israeli", "Italian", "Japanese", "Korean", "Mexican", "Moroccan",
"Nicaraguan", "Norwegian", "Peruvian", "Polish", "Romanian", "Russian", "South",
"Spanish", "Sudanese", "Swedish", "Swiss", "Tanzanian", "Turkish", "Ukrainian", "Uruguayan", "Venezuelan", "Yugoslav", "Zimbabwean")
We also create a character vector consisting of the matching country names.
Nation = c( "US" , "Argentina", "Australia", "Austria", "Belgium", "Bolivia", "Brazil", "Britain", "Canada", "Chile", "Colombia", "Congo", "Croatia", "Cuba",
"Czech Republic", "Denmark", "Netherlands", "France", "Germany", "Ghana", "Great Britain",
"Guatemala", "Guyana", "Haiti", "Hungary", "Iceland", "India", "Iran",
"Ireland", "Israel", "Italy", "Japan", "Korea", "Mexico", "Morocco",
"Nicaragua", "Norway", "Peru", "Poland", "Romania", "Russia", "South Africa",
"Spain", "Sudan", "Sweden", "Switzerland", "Tanzania", "Turkey", "Ukraine", "Uruguay", "Venezuela", "Yugoslavia", "Zimbabwe")
We transform the "Bio" variable in moma_birthplace into the right country name
using a loop.
for (i in seq_along(Nationality)) {
  moma_birthplace$Bio[moma_birthplace$Bio == Nationality[i]] = Nation[i]
}
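The same recoding can also be written without a loop by looking up each value's position with match. This is an alternative sketch, not the method used above; the short vectors here are hypothetical stand-ins for the full 53-element vectors.

```r
# Shortened, hypothetical versions of the lookup vectors
nationality <- c("American", "Dutch", "South")
nation <- c("US", "Netherlands", "South Africa")

# Values to recode; "Martian" has no entry in the lookup
bio <- c("Dutch", "American", "South", "Martian")

idx <- match(bio, nationality)                    # position in the lookup, NA if absent
country <- ifelse(is.na(idx), bio, nation[idx])   # keep unmatched values unchanged
print(country)
```

Like the loop, this leaves values without a match untouched, and it relies on the two vectors being in the same order.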
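We then add an iso2c variable to moma_birthplace using the "countrycode" package.
moma_birthplace$iso2c = countrycode(moma_birthplace$Bio, origin = "country.name", destination = "iso2c")
We sum the number of paintings for each country, weighting by the existing count "n"
so that the result counts paintings rather than artists.
moma_birthplace = moma_birthplace %>% count(iso2c, wt = n)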
We then prepare to plot the data on a world map by creating a dataframe "map"
which contains world coordinates and Country names, which we convert into
iso2c to merge with our data.
map = map_data("world")
map$iso2c = countrycode(map$region, origin = "country.name", destination = "iso2c")
We join the data by iso2c using the left_join function.
moma_map = left_join(moma_birthplace, map, by = "iso2c")
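The order of the arguments in left_join decides which countries survive the join. The sketch below uses two tiny made-up tables ("counts" and "shapes" are hypothetical names) to show the difference.

```r
library(dplyr)

# Hypothetical painting counts and map coordinates
counts <- data.frame(iso2c = c("DK", "FR"), n = c(3, 5))
shapes <- data.frame(
  iso2c = c("DK", "DK", "SE", "SE"),
  long = c(10, 11, 15, 16),
  lat = c(55, 56, 60, 61)
)

# Counts on the left: countries without paintings (SE) are dropped
counts_first <- left_join(counts, shapes, by = "iso2c")

# Map on the left: every country is kept, with n = NA where there are no paintings
shapes_first <- left_join(shapes, counts, by = "iso2c")

print(counts_first)
print(shapes_first)
```

With the counts on the left, as in our join, only countries that appear in the painting data are drawn. Putting the map on the left would instead keep all countries and leave their fill missing.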
We plot the data coloured by the log of the sum of paintings from each country.
p = ggplot(moma_map, aes(x = long, y = lat, group = group, fill = log(n)))
p + geom_polygon() +
  scale_fill_continuous(low = "thistle", high = "blue", guide = "colorbar", na.value = "grey") +
  expand_limits(x = moma_map$long, y = moma_map$lat) +
  labs(title = "MOMA's Stock of Paintings: Nationality of Author - World Map (log)")
Question 8
We first make the width variable in our dataset. The opening parenthesis is
escaped, since an unescaped "(" is not a valid regular expression on its own.
moma$width <- str_extract(moma$Dimensions, " \\([0-9].+ *x")
moma$width <- gsub("x", "", moma$width)
moma$width <- as.numeric(gsub("\\(", "", moma$width))
Then we extract the height of each painting.
moma$height <- str_extract(moma$Dimensions, " \\([0-9].+ cm")
moma$height <- gsub("\\([0-9].+ x", "", moma$height)
moma$height <- as.numeric(gsub("cm", "", moma$height))
Finally we calculate the surface area of each painting by using the mutate
function to take the product of height and width.
moma <- mutate(moma, area= height*width)
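As a cross-check of the extraction, the two numbers can also be captured in one step with str_match. This is a sketch under the assumption that the metric size is written as "(A x B cm)"; the example strings in "dims" are made up, and entries that do not fit the pattern become NA.

```r
library(stringr)

# Hypothetical Dimensions strings
dims <- c(
  "29 1/8 x 36 1/4\" (74 x 92.1 cm)",  # two dimensions in cm
  "Dimensions variable",                # no numbers at all
  "(10 x 20 x 30 cm)"                   # three dimensions, not matched
)

# Column 1 is the full match, columns 2 and 3 are the two captured numbers
m <- str_match(dims, "\\(([0-9.]+) x ([0-9.]+) cm\\)")
first <- as.numeric(m[, 2])
second <- as.numeric(m[, 3])
area <- first * second
print(area)
```

Since the area is a product, it does not matter which of the two captured numbers is called height and which width.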