kbenoit / sophistication

R package associated with Benoit, Munger and Spirling (2017) paper(s)

Dynamic Model not accepting year argument #7

Closed kmunger closed 6 years ago

kmunger commented 6 years ago

The results are the same regardless of how the baseline_year argument is specified: passing a per-document vector of years produces exactly the same output as a fixed baseline year.

Example:

## plot results for SOTU data -- figures 2 and 3
library(quanteda.corpora)
library(sophistication)
library(dplyr)
library(quanteda)
library(stringr)
library(data.table)
library(ggplot2)

setwd("C:/Users/kevin/Documents/GitHub/sophistication-papers/")

## load BT model
load("analysis_article/AJPS_replication/data/fitted_BT_model.Rdata")

## calculate the year of each speech
year <- lubridate::year(docvars(data_corpus_sotu, "Date"))

## generate continuous scores for the SOTU model -- choose dynamic or static

### static: one fixed baseline year for all documents
results_static <- predict_readability(BT_best, newdata = data_corpus_sotu,
                                      bootstrap_n = 10, verbose = TRUE,
                                      baseline_year = 2000)

static <- results_static$prob

### dynamic: a per-document vector of baseline years
results_dynamic <- predict_readability(BT_best, newdata = data_corpus_sotu,
                                       bootstrap_n = 10, verbose = TRUE,
                                       baseline_year = year)

dynamic <- results_dynamic$prob

## element-wise differences are all zero
static - dynamic
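
For reference, a quick numeric check (a minimal sketch, assuming the script above has already run) makes the comparison explicit rather than relying on eyeballing the difference vector:

## hypothetical check: are the two probability vectors numerically identical?
all.equal(static, dynamic)        # TRUE if equal up to numeric tolerance
max(abs(static - dynamic))        # largest absolute discrepancy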
kbenoit commented 6 years ago

Not the same, just very close. It turns out that the minimum Google frequency is not a very strong determinant of the probabilities, and that the minimums are not very different.

load("analysis_article/AJPS_replication/data/fitted_BT_model.Rdata")

data(data_corpus_sotu, package = "quanteda.corpora")
# subset to make the tagging faster in this example
data_corpus_sotu <- corpus_subset(data_corpus_sotu, year < 1800)

predict_readability(BT_best, newdata = data_corpus_sotu, 
                    verbose = TRUE, baseline_year = 1800)
# Starting predict_readability (sophistication v0.65)...
#    ...using BT_best as fitted BT model; data_corpus_sotu as newdata
#    ...tagging parts of speech
#    ...computing word lengths in characters
#    ...computing baselines from Google frequencies
#    ...aggregating to sentence level
#    ...computing predicted values
#    ...finished; elapsed time: 4.51 seconds.
#                     lambda      prob     scaled
# Washington-1790  -3.775170 0.1681448   5.345790
# Washington-1790b -4.368282 0.1004762 -29.767683
# Washington-1791  -4.079174 0.1297877 -12.651905
# Washington-1792  -3.800474 0.1646351   3.847735
# Washington-1793  -3.718028 0.1762895   8.728679
# Washington-1794  -3.885695 0.1532469  -1.197559
# Washington-1795  -3.987394 0.1405104  -7.218330
# Washington-1796  -3.819777 0.1619975   2.704939
# Adams-1797       -3.794218 0.1654973   4.218099
# Adams-1798       -4.126730 0.1245105 -15.467289
# Adams-1799       -4.196062 0.1171474 -19.571915

predict_readability(BT_best, newdata = data_corpus_sotu, 
                    verbose = TRUE, baseline_year = 2000)
# Starting predict_readability (sophistication v0.65)...
#    ...using BT_best as fitted BT model; data_corpus_sotu as newdata
#    ...tagging parts of speech
#    ...computing word lengths in characters
#    ...computing baselines from Google frequencies
#    ...aggregating to sentence level
#    ...computing predicted values
#    ...finished; elapsed time: 4.92 seconds.
#                     lambda      prob     scaled
# Washington-1790  -3.775167 0.1681452   5.345948
# Washington-1790b -4.368279 0.1004764 -29.767524
# Washington-1791  -4.079171 0.1297880 -12.651747
# Washington-1792  -3.800471 0.1646355   3.847894
# Washington-1793  -3.718026 0.1762899   8.728837
# Washington-1794  -3.885693 0.1532473  -1.197400
# Washington-1795  -3.987391 0.1405107  -7.218171
# Washington-1796  -3.819774 0.1619979   2.705098
# Adams-1797       -3.794215 0.1654977   4.218258
# Adams-1798       -4.126727 0.1245108 -15.467131
# Adams-1799       -4.196059 0.1171477 -19.571756
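
To see just how small the pre-fix differences are, the two outputs can be compared directly (a minimal sketch; res1800 and res2000 are hypothetical names for the two results above):

res1800 <- predict_readability(BT_best, newdata = data_corpus_sotu,
                               baseline_year = 1800)
res2000 <- predict_readability(BT_best, newdata = data_corpus_sotu,
                               baseline_year = 2000)
## tiny but nonzero, consistent with the printed output above
max(abs(res1800$prob - res2000$prob))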
kbenoit commented 6 years ago

Update: In 465247d I fixed a bug, and the new results look like this:

library("quanteda")
load("analysis_article/AJPS_replication/data/fitted_BT_model.Rdata")

data(data_corpus_sotu, package = "quanteda.corpora")
# subset to make the tagging faster in this example
data_corpus_sotu <- corpus_subset(data_corpus_sotu, Date < "1800-01-01")

predict_readability(BT_best, newdata = data_corpus_sotu, 
                    verbose = TRUE, baseline_year = 1800)
# Starting predict_readability (sophistication v0.65)...
#    ...using BT_best as fitted BT model; data_corpus_sotu as newdata
#    ...tagging parts of speech
#    ...computing word lengths in characters
#    ...computing baselines from Google frequencies
#    ...aggregating to sentence level
#    ...computing predicted values
#    ...finished; elapsed time: 5.12 seconds.
#                     lambda      prob     scaled
# Washington-1790  -3.775170 0.1681448   5.345790
# Washington-1790b -4.368282 0.1004762 -29.767683
# Washington-1791  -4.079174 0.1297877 -12.651905
# Washington-1792  -3.800474 0.1646351   3.847735
# Washington-1793  -3.718028 0.1762895   8.728679
# Washington-1794  -3.885695 0.1532469  -1.197559
# Washington-1795  -3.987394 0.1405104  -7.218330
# Washington-1796  -3.819777 0.1619975   2.704939
# Adams-1797       -3.794218 0.1654973   4.218099
# Adams-1798       -4.126730 0.1245105 -15.467289
# Adams-1799       -4.196062 0.1171474 -19.571915

predict_readability(BT_best, newdata = data_corpus_sotu, 
                    verbose = TRUE, baseline_year = 2000)
# Starting predict_readability (sophistication v0.65)...
#    ...using BT_best as fitted BT model; data_corpus_sotu as newdata
#    ...tagging parts of speech
#    ...computing word lengths in characters
#    ...computing baselines from Google frequencies
#    ...aggregating to sentence level
#    ...computing predicted values
#    ...finished; elapsed time: 5.63 seconds.
#                     lambda      prob     scaled
# Washington-1790  -3.775157 0.1681466   5.346556
# Washington-1790b -4.367517 0.1005453 -29.722386
# Washington-1791  -4.079152 0.1297902 -12.650570
# Washington-1792  -3.800471 0.1646355   3.847894
# Washington-1793  -3.718006 0.1762928   8.730014
# Washington-1794  -3.885683 0.1532486  -1.196825
# Washington-1795  -3.987376 0.1405126  -7.217279
# Washington-1796  -3.819767 0.1619988   2.705528
# Adams-1797       -3.794069 0.1655179   4.226937
# Adams-1798       -4.126648 0.1245194 -15.462451
# Adams-1799       -4.195844 0.1171700 -19.558973
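
With the fix in place, a per-document vector of years (as in the original report) should now yield baselines that actually vary by document. A minimal sketch, assuming baseline_year accepts a vector the way the original script passed one:

## hypothetical check of the dynamic (per-document) baseline after 465247d
year <- lubridate::year(docvars(data_corpus_sotu, "Date"))
res_dynamic <- predict_readability(BT_best, newdata = data_corpus_sotu,
                                   baseline_year = year)
res_static  <- predict_readability(BT_best, newdata = data_corpus_sotu,
                                   baseline_year = 2000)
## differences should now be nonzero wherever the baselines differ
range(res_dynamic$prob - res_static$prob)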