stephbuon / democracy-lab

Code, manuals, and concepts for Democracy Lab research and affiliate projects.
MIT License

Create Line Chart of "Ignorant Woman" Over Time #169

Closed stephbuon closed 1 year ago

stephbuon commented 1 year ago

Steps:

  1. Steph will help you set up your code environment on M2.

This will include installing necessary packages:

install.packages("data.table")
install.packages("tidyverse")
install.packages("reticulate")

And:

install.packages("devtools")
require(devtools)
install_github("stephbuon/posextractr")
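Before moving on, it can help to confirm that the installs worked. The sketch below only checks the packages named in the install steps above; the variable names (`needed`, `installed`, `missing_pkgs`) are illustrative and not part of the original instructions.

```r
# Packages named in the install steps above
needed <- c("data.table", "tidyverse", "reticulate", "devtools", "posextractr")

# requireNamespace() returns FALSE rather than erroring when a package is absent
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed
missing_pkgs <- needed[!installed]
if (length(missing_pkgs) == 0) {
  message("All packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```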

  2. After setup, use the following code to count how many times the adjective-noun pair "ignorant woman" appears, decade by decade.

library(data.table)
library(tidyverse)
library(posextractr)
library(reticulate)

posextract_initialize()

hansard <- fread("/scratch/group/pract-txt-mine/hansard_justnine_12192019.csv") # read in the data

hansard <- hansard %>% # to make the data set smaller and easier to process, keep just the fields we need
  select(sentence_id, text, speechdate)

hansard <- hansard %>% # only keep sentences with the word woman or women
  filter(str_detect(text, regex("woman|women", ignore_case = TRUE)))

hansard <- hansard %>% # create a field for year so we can find decade
  mutate(year = year(speechdate))

hansard <- hansard %>% # create a field for decade so we can visualize the data by decade
  mutate(decade = year - year %% 10)

hansard <- hansard %>% # get rid of the fields we no longer need
  select(-speechdate, -year)

adjective_noun_pairs <- extract_adj_noun_pairs(hansard$text) # extract adjective-noun pairs

adjective_noun_pairs <- adjective_noun_pairs %>%
  mutate(adj_noun_pair = paste(adjective, noun))

adjective_noun_pairs$adj_noun_pair <- str_to_lower(adjective_noun_pairs$adj_noun_pair) # for standardization, transform all pairs to lower case

adjective_noun_pairs$adj_noun_pair <- str_replace(adjective_noun_pairs$adj_noun_pair, "women", "woman") # replace women with woman

adjective_noun_pairs <- adjective_noun_pairs %>% # keep just the pair we need 
  filter(str_detect(adj_noun_pair, "ignorant woman"))

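# count the pair by decade
# (this step assumes the extracted pairs still carry a decade column; if they do not, join it back from hansard first)
adjective_noun_pairs <- adjective_noun_pairs %>%
  count(decade, adj_noun_pair)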
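
To get a feel for the decade arithmetic and the women-to-woman replacement without loading Hansard or posextractr, the same steps can be tried on a tiny made-up table. This is only a sketch: `toy_pairs` and its values are invented for illustration and are not real Hansard counts.

```r
library(dplyr)
library(stringr)

# Toy stand-in for extracted pairs; the values are invented for illustration only
toy_pairs <- tibble(
  adj_noun_pair = c("Ignorant Woman", "ignorant women", "poor woman", "ignorant women"),
  year = c(1832, 1838, 1838, 1871)
)

toy_counts <- toy_pairs %>%
  mutate(
    decade = year - year %% 10,                                   # 1838 becomes 1830
    adj_noun_pair = str_to_lower(adj_noun_pair),                  # standardize case
    adj_noun_pair = str_replace(adj_noun_pair, "women", "woman")  # fold the plural into the singular
  ) %>%
  filter(adj_noun_pair == "ignorant woman") %>%                   # keep just the pair we need
  count(decade, adj_noun_pair)                                    # one row per decade

print(toy_counts)
```

The result has one row per decade, which is the shape a line chart needs.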

  3. Now visualize the counts with a line chart. Search for example code with a query such as "R plot line chart ggplot" or "R plot line chart plotly".

People often think ggplot is easier.
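
As a starting point, a ggplot line chart could look roughly like the sketch below. The `decade_counts` values are placeholders invented for illustration, and the output file name is an assumption; swap in the real table produced by the count step.

```r
library(ggplot2)

# Hypothetical counts, invented for illustration; substitute the real result of the count step
decade_counts <- data.frame(
  decade = c(1800, 1810, 1820, 1830),
  n = c(2, 5, 3, 8)
)

line_chart <- ggplot(decade_counts, aes(x = decade, y = n)) +
  geom_line() +   # connect the decades with a line
  geom_point() +  # mark each decade's count
  labs(
    title = "\"Ignorant woman\" (lemma) by decade",
    x = "Decade",
    y = "Count"
  )

# Save the chart as an image; the file name is an assumption
ggsave("ignorant_woman_by_decade.png", plot = line_chart, width = 7, height = 4)
```

`ggsave()` writes the image to the current working directory, which gives you a file to attach to the email.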

  4. Email the image to Dr. Guldi.

"Dear Dr. Guldi...

I included the word "lemma" because we lemmatized women to woman."

stephbuon commented 1 year ago

@HaileyHazen : please email the image to Dr. Guldi by the end of next week (by Sept. 30).

stephbuon commented 1 year ago

@HaileyHazen -- I needed to finish this task early, so it is no longer your responsibility.

But please feel free to play around with this code so you can get a feel for R.