jogrue / popdictR

A package designed to measure populism in text with a German dictionary.
License: Creative Commons Zero v1.0 Universal

popdictR

This package contains a German-language populism dictionary and functions to apply the dictionary to text. It includes the dictionary as published in the paper "Populist ideas on social media" in New Media & Society.

Install

This package requires two of my other packages, multidictR and regexhelpeR, which should be installed before this package.

You can install everything from within R using devtools:

library(devtools)

# Install the dependency regexhelpeR from GitHub
devtools::install_github("jogrue/regexhelpeR")

# Install the multidictR package from GitHub
devtools::install_github("jogrue/multidictR")

# Install the popdictR package from GitHub
devtools::install_github("jogrue/popdictR")

Example

# My dictionary
popdictR::gruendl_terms

# All terms (also available as .ods/.csv under /data-raw)
popdictR::gruendl_dictionary_complete

# Similar to what was done in the paper ----------------------------------------

# Load packages
library(popdictR)
library(lubridate)
library(quanteda)
library(tidyverse)

# Prepare data (expects columns id, message, and date)
fbcorp <- readRDS("data/facebook.rds") %>%
  filter(date >= ymd("2014-01-01") & date < ymd("2020-02-29")) %>%
  rename(doc_id = id, text = message)
fbcorp <- corpus(fbcorp)

# Run the populism dictionary on the corpus
fbresult1 <- run_popdict(fbcorp)

# Run other dictionaries on the corpus
fbresult2 <- run_other_popdicts(fbcorp, include_totals = FALSE)

# The results give, for each document, the number of sentences with at least
# one match for a dictionary pattern (i.e., sentences that are presumably
# populist). The total number of sentences per document is returned in
# "n_sentences".

# Combine results
fbresult1 <- convert(fbresult1, to = "data.frame")
fbresult2 <- convert(fbresult2, to = "data.frame")
fbresult <- bind_cols(
  fbresult1,
  select(fbresult2, dict_rooduijn_pauwels_2011, dict_pauwels_2017)
)

# This summary groups results by country and party and then gives us the
# percentage of populist sentences per party.
summary <- fbresult %>%
  group_by(actor_country, party) %>%
  summarize(
    sentences   = sum(n_sentences),
    gruendl     = sum(dict_gruendl_2020) / sentences * 100,
    pauwels     = sum(dict_pauwels_2017) / sentences * 100,
    rooduijn    = sum(dict_rooduijn_pauwels_2011) / sentences * 100,
    popu_list   = first(popu_list)
  ) %>%
  ungroup()
summary
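To make the sentence counting described in the example more concrete, here is a simplified base-R illustration. It is not how popdictR implements the matching (the package relies on quanteda, multidictR, and regexhelpeR), and the toy texts, party labels, and patterns below are hypothetical, not taken from gruendl_terms. Each document is split into sentences, a sentence counts once if it matches at least one pattern, and the party share is then computed as in the summary above: matched sentences summed over all of a party's documents, divided by their total number of sentences, times 100.

```r
# Hypothetical toy documents (not data from the paper)
docs <- data.frame(
  doc_id = c("d1", "d2", "d3"),
  party  = c("A", "A", "B"),
  text   = c(
    "Die Elite ignoriert das Volk. Wir bleiben dran.",
    "Heute ist Parteitag. Das Wetter ist gut.",
    "Das Establishment ignoriert die Menschen! Das Volk entscheidet."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical patterns for illustration only; NOT the gruendl_terms dictionary
patterns <- c("\\belite\\b", "\\bestablishment\\b", "\\bvolk\\b")
combined <- paste(patterns, collapse = "|")

# Naive sentence split: break after ., ! or ? followed by whitespace
sentences <- strsplit(docs$text, "(?<=[.!?])\\s+", perl = TRUE)

# Per document: total sentences and sentences with at least one match
docs$n_sentences <- lengths(sentences)
docs$dict_toy <- vapply(
  sentences,
  function(s) sum(grepl(combined, s, ignore.case = TRUE, perl = TRUE)),
  integer(1)
)

# Pooled share per party: sum of matched sentences / sum of sentences * 100
matched <- tapply(docs$dict_toy, docs$party, sum)
total   <- tapply(docs$n_sentences, docs$party, sum)
share   <- matched / total * 100
share  # A = 25, B = 100
```

Because the share is a ratio of sums rather than an average of per-document percentages, documents with more sentences carry more weight.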

Status

Update [2024-10-23]: The package currently does not work with newer versions of quanteda. It was developed with quanteda v2, so I suggest downgrading quanteda to version 2.1.2 and testing it again. I am happy to accept a pull request if someone is able to fix it.
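A minimal sketch of the suggested downgrade, assuming the remotes package (devtools depends on it, so it is usually installed already). remotes::install_version() installs an archived CRAN release; because the archived quanteda is built from source, this needs working build tools. Restart R afterwards and check the version before loading popdictR.

```r
# Install 'remotes' if needed; it provides install_version()
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Install the archived quanteda 2.1.2 release from CRAN (built from source)
remotes::install_version("quanteda", version = "2.1.2",
                         repos = "https://cloud.r-project.org")

# Confirm the installed version (restart R before loading popdictR)
utils::packageVersion("quanteda")
```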

The package includes the dictionary as published and worked well for my particular use case (see the example above). All functions are documented. However, the package has not been tested extensively, so I welcome any feedback or reports of issues you encounter.

Unfortunately, I am not working in academia anymore and do not really have time to support or continue developing this package. These are some of the things that could be further improved:

Cite

Gründl, J. (2020). Populist ideas on social media: A dictionary-based measurement of populist communication. New Media & Society. Advance online publication. https://doi.org/10.1177/1461444820976970

Gründl, J. (2020). popdictR (R package). https://github.com/jogrue/popdictR