odelmarcelle / sentopics

GNU General Public License v3.0

Replicate Picault & Renault (2017) #3

Open almiggggg opened 1 year ago

almiggggg commented 1 year ago

Reproduce Picault & Renault (2017)

I am trying to replicate the findings of Picault & Renault (2017) using the compute_PicaultRenault_scores function. However, I am having difficulty replicating Figure 6 on page 24, because the values for the MC are too low. As far as I understand, this should work as follows:

library(sentopics)
library(quanteda)  # docvars() and the %>% pipe

# Score all press conferences since 2000
docs <- ECB_press_conferences %>%
  quanteda::corpus_subset(.date >= "2000-01-01") %>%
  quanteda::corpus_reshape("documents")
docvars(docs, c("MP", "EC")) <- compute_PicaultRenault_scores(docs, min_ngram = 2, return_dfm = FALSE)
data <- as.data.frame(PicaultRenault_data)
data$date <- head(docs$.date, -3)  # align by position, dropping the last 3 conference dates

# Restrict both tables to the 2006-2014 sample
docs <- docs %>% quanteda::corpus_subset(.date >= "2006-01-01" & .date <= "2014-12-31")
data <- data[data$date >= "2006-01-01" & data$date <= "2014-12-31", ]

length(data$date)  # 109, not 106 (?)

data$MP <- docs$MP
data$EC <- docs$EC

reg1 <- lm(R_t ~ Surprise + `R_t-1` + MP + EC, data = data)
summary(reg1)
length(data$MRR_t)
plot(docs$MP, type = "l")
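The lines data$date <- head(docs$.date, -3) and data$MP <- docs$MP match rows by position. If one table has an extra or missing conference, every later score lands on the wrong date. A minimal sketch of matching by date instead, using base R merge(). It assumes both tables carry a date column, which PicaultRenault_data may or may not provide directly. scores and reg_data are hypothetical toy tables standing in for the corpus docvars and the market data.

```r
# Sketch: attach MP/EC scores to the regression data by date instead of by
# position. Both tables below are hypothetical toy data.
scores <- data.frame(
  date = as.Date(c("2006-01-12", "2006-02-02", "2007-08-02")),
  MP   = c(0.10, -0.05, 0.20),
  EC   = c(0.02, 0.01, -0.03)
)
reg_data <- data.frame(
  date = as.Date(c("2006-01-12", "2006-02-02")),
  R_t  = c(0.004, -0.002)
)

# Inner join on date: a conference present in only one table is dropped
# rather than shifting every later row by one position.
aligned <- merge(reg_data, scores, by = "date")
aligned
```

With this approach, a disagreement between the two date lists shows up as a dropped row instead of a silent misalignment.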

The plot follows the shape of Figure 2, but the values are incorrect. Furthermore, this is the regression output I get:

> summary(reg1)

Call:
lm(formula = R_t ~ Surprise + `R_t-1` + MP + EC, data = data)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.050089 -0.006691  0.001495  0.007435  0.047775 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept) -8.624e-05  1.609e-03  -0.054  0.95736   
Surprise     4.243e-02  2.160e-02   1.964  0.05223 . 
`R_t-1`      2.158e-01  1.132e-01   1.906  0.05945 . 
MP          -1.236e-02  4.654e-03  -2.656  0.00918 **
EC           1.194e-02  7.558e-03   1.579  0.11731   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01609 on 103 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1382,    Adjusted R-squared:  0.1047 
F-statistic: 4.129 on 4 and 103 DF,  p-value: 0.00383

The results are quite close to what I expected, but the statistics are slightly off. I also have 109 observations instead of 106. When I use robust standard errors, the significance decreases even further.
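The comment does not say which robust estimator was used, so this sketch assumes HC1 (White) standard errors. It shows how they can be computed from an lm() fit such as reg1 using only base R. In practice, the sandwich and lmtest packages give the same result via lmtest::coeftest(reg1, vcov = sandwich::vcovHC(reg1, type = "HC1")). The helper name hc1_se is hypothetical.

```r
# Sketch: heteroskedasticity-robust (HC1) standard errors for an lm() fit,
# using only base R. hc1_se is a hypothetical helper name.
hc1_se <- function(fit) {
  X <- model.matrix(fit)            # design matrix of the rows used in the fit
  e <- residuals(fit)               # OLS residuals, one per row of X
  n <- nrow(X)
  k <- ncol(X)
  bread <- solve(crossprod(X))      # (X'X)^-1
  meat  <- crossprod(X * e)         # X' diag(e^2) X
  V <- bread %*% meat %*% bread * n / (n - k)  # HC1 small-sample scaling
  sqrt(diag(V))
}

# Example on simulated heteroskedastic data
set.seed(1)
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100, sd = abs(x))
fit <- lm(y ~ x)
cbind(estimate = coef(fit), robust_se = hc1_se(fit), robust_t = coef(fit) / hc1_se(fit))
```

Applied to reg1, coef(reg1) / hc1_se(reg1) gives robust t values that can be compared with the classical ones in the summary above.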

I also found a small deviation between my web scraping and yours: I did not classify the 2007-08-02 Press Briefing as a press conference, whereas I think you treat https://www.ecb.europa.eu/press/pressconf/2007/html/is070802.en.html as one. I'm not sure that is correct.
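Instead of spotting such deviations one at a time, you can compare the two lists of conference dates directly. A minimal base-R sketch follows. compare_scrapes and the toy date vectors are hypothetical, with 2007-08-02 standing in for the disputed briefing.

```r
# Sketch: list the dates on which two scrapes of ECB press conferences
# disagree. compare_scrapes and the input vectors are hypothetical.
compare_scrapes <- function(mine, yours) {
  mine  <- as.Date(mine)
  yours <- as.Date(yours)
  list(
    only_mine  = sort(mine[!mine %in% yours]),   # found only by the first scrape
    only_yours = sort(yours[!yours %in% mine])   # found only by the second scrape
  )
}

mine  <- c("2007-07-05", "2007-09-06")
yours <- c("2007-07-05", "2007-08-02", "2007-09-06")
res <- compare_scrapes(mine, yours)
res
```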

I wanted to post this question here because I don't think I'll be able to find a more detailed answer anywhere else. If this is not the right place for this, please let me know where I should post it.

Best regards, Marco A

odelmarcelle commented 1 year ago

Hello Marco,

You're correct, and I also noticed those small differences between the original paper and my replication. I thought they originated from the difference in sample size (109 vs 106), which I never fully understood.

Thanks to your comment, I now realize that my web scraping approach might have been wrong. I based it on a yearly index of press releases (e.g., https://www.ecb.europa.eu/press/pressconf/2014/html/index_include.en.html), which, for 2007, includes the 2007-08-02 Press Briefing.

I'm not sure about the status of the 2007-08-02 Press Briefing, but I noticed that I may be incorrectly including the press release from 26 October 2014 (https://www.ecb.europa.eu/press/pressconf/2014/html/is141026.en.html). This one clearly does not appear when browsing the ECB website.
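Until the scraper itself is fixed, the two disputed dates could be excluded before restricting to the 2006-2014 window. This is a sketch only. conf is a hypothetical toy table, and whether either date should really be dropped is still an open question in this thread.

```r
# Sketch: drop dates suspected not to be regular press conferences, then
# keep the paper's 2006-2014 window. `conf` is hypothetical toy data.
suspect_dates <- as.Date(c("2007-08-02", "2014-10-26"))

conf <- data.frame(date = as.Date(c("2006-01-12", "2007-08-02", "2014-10-02",
                                    "2014-10-26", "2015-01-22")))

in_window <- conf$date >= as.Date("2006-01-01") & conf$date <= as.Date("2014-12-31")
kept <- conf[in_window & !conf$date %in% suspect_dates, , drop = FALSE]
nrow(kept)
```

On the actual corpus, the same condition could presumably be passed as quanteda::corpus_subset(docs, !(.date %in% suspect_dates)), mirroring the corpus_subset() calls in the original code.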

I will look into the webscraping and see if I can correct it. I would also like to align this replication to the original paper fully, but I have had little time to spend on this project for the past year, so I can't guarantee I can solve it soon. Please let me know if, on your side, you achieve a better replication 😃

almiggggg commented 1 year ago

I do it the same way. In my code, I filter the titles like this: final_links_ecb_QA <- final_links_ecb_QA[grepl("Q&A|Introductory statement", final_links_ecb_QA$title),] so that only the actual press conferences are selected!
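Here is the title filter quoted above, tried on a toy table. links and its titles are hypothetical and are not taken from the ECB site. The pattern is a regular expression: & is matched literally, and | separates the two alternatives.

```r
# Sketch: keep only links whose title marks an actual press conference,
# using the pattern quoted above. `links` is hypothetical toy data.
links <- data.frame(
  date  = as.Date(c("2007-07-05", "2007-08-02", "2007-09-06")),
  title = c("Introductory statement with Q&A",
            "Press briefing",
            "Introductory statement to the press conference")
)

is_conference <- grepl("Q&A|Introductory statement", links$title)
links[is_conference, ]
```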

Interestingly, the replication results do not change that much.

I will send you an email :) best Marco