kolesarm / RDHonest

Honest inference in regression discontinuity designs
54 stars 15 forks source link

Covs in RDHonest? #7

Closed njudd closed 6 months ago

njudd commented 8 months ago

It seems to be unclear from the documentation if RDHonest supports additional covariates (e.g., the covs arg in rdrobust)? I would like to use RDHonest since my running variable is discrete, yet in my application, it will help greatly with precision. Wondering if this is feasible!

Thanks for any insight & if this is possible in RDHonest, Nick

njudd commented 7 months ago

This method seems to not change the width much, yet did make the lower CI further away from zero.

library(RDHonest)

# data cleaning from rdrobust 
df<-read.csv("https://raw.githubusercontent.com/rdpackages/rdrobust/master/R/rdrobust_senate.csv")
df <- df[complete.cases(df),] # removing NA's
df$dummy_win <- df$margin > 0 #making a dummy treatment var
df$margin <- round(df$margin) #making the running var discrete

m1 <- RDHonest(vote~margin, data = df); m1 # RDHonest no covs, estimating M, estimating Bandwidth

# Step 1 estimating the effects of the covs
correction_mod <- lm(vote ~ 1 + margin + dummy_win + margin:dummy_win + class + termshouse + termssenate, data = df)

# Step 2 making a new outcome with their effects out
df$vote_adj <- df$vote - (correction_mod$coefficients[4]*df$class + 
                             correction_mod$coefficients[5]*df$termshouse + 
                             correction_mod$coefficients[6]*df$termshouse)

# Step 3 run RDHonest as normal with Y adjusted
m2 <- RDHonest(vote_adj ~ margin, data = df); m2 # RDHonest no covs, estimating M, estimating Bandwidth

# checking the result & how inferences change
m1_CI_length <- as.numeric(m1$coefficients[6])-as.numeric(m1$coefficients[5])
m2_CI_length <- as.numeric(m2$coefficients[6])-as.numeric(m2$coefficients[5])

m1_CI_length; m2_CI_length #almost the same CI length

m1_lower<- as.numeric(m1$coefficients[5])
m2_lower<- as.numeric(m2$coefficients[5])

m1_lower; m2_lower #yet... the lower one is further away from zero
kolesarm commented 7 months ago

[ edited to fix typos pointed out below ]

Currently covariate adjustment is not implemented, but we plan to add it -- in fact the reason we have not posted the package at CRAN is that we wanted to add the covariate adjustment functionality first.

Your implementation is basically right, but the covariate-adjustment should be local, so use bandwidth from RDHonest with no covs -- a global covariate adjustment may be noisy enough so that you get little to no improvement.

Here is an implementation, but with local covariate adjustment. You can see that it indeed shrinks the CIs:

library(RDHonest)

df <- read.csv("https://raw.githubusercontent.com/rdpackages/rdrobust/master/R/rdrobust_senate.csv")
df <- df[complete.cases(df), ]

df$margin <- round(df$margin) # make the running var discrete
rd1 <- RDHonest(vote~margin, data = df) # no covs, estimate M + bandwidth
print(rd1)

## Step 1 Estimate the effects of the covs, use the same bandwidth as above
covs <- c("class", "termshouse", "termssenate")
f1 <- paste(c("vote ~ margin*I(margin>0)", covs), collapse=" + ")
r1 <- lm(f1, data = df, subset = (abs(margin) <= rd1$coefficients$bandwidth))

## Step 2 Create covariate-adjusted outcome
df$vote_adj <- drop(df$vote-as.matrix(df[, covs]) %*% r1$coefficients[covs])
# Step 3 run RDHonest as normal with Y adjusted, keep same M as above
rd2 <- RDHonest(vote_adj~margin, data = df, M=rd1$coefficients$M)
print(rd2)

This indeed shrinks the CIs:

> rd2$coefficients$conf.high-rd2$coefficients$conf.low
[1] 6.87717
> rd1$coefficients$conf.high-rd1$coefficients$conf.low
[1] 7.03419
njudd commented 7 months ago

Just a couple of edits:

  1. missed the other bandwidth
    
    r1 <- lm(f1, data = df, subset = (margin <= rd1$coefficients$bandwidth))

r1 <- lm(f1, data = df, subset = (margin <= rd1$coefficients$bandwidth & margin >= -rd1$coefficients$bandwidth))

2. missed taking M from the first mod

rd2 <- RDHonest(vote_adj~margin, data = df)

rd2 <- RDHonest(vote_adj~margin, data = df, M = rd1$coefficients$M)

njudd commented 7 months ago

Fuzzy RD cov control

I am just ignoring the first stage when making the Y_adjusted variable, not sure if this is correct? I run RDHonest 3 times, first to get T0 and then the two M values. In the final mod I use T0 from the first and M from the 2nd.

library(RDHonest)

df <- read.csv("https://raw.githubusercontent.com/rdpackages/rdrobust/master/R/rdrobust_senate.csv")
df <- df[complete.cases(df), ]
df$margin <- round(df$margin) # make the running var discrete

# making an semi-probablistic instrument winning (not much effort put in)
df$winning <- as.numeric(df$margin >= 0)
set.seed(42)
df$winning[df$margin >= -3 & df$margin <= 2][1:40] <- sample(c(0,1), 40, replace = TRUE)

rd0 <- RDHonest(vote~ winning | margin, data = df) #estimate T0, as seen in the tutorial
rd1 <- RDHonest(vote~ winning |margin, data = df, 
                T0 = rd0$coefficients$estimate) # no covs, estimate M + bandwidth
print(rd1)

## Step 1 Estimate the effects of the covs, use the same bandwidth as above
covs <- c("class", "termshouse", "termssenate")
f1 <- paste(c("vote ~ margin*I(margin>0)", covs), collapse=" + ")

# ignoring the first stage variable "winning"
r1 <- lm(f1, data = df, subset = (margin <= rd1$coefficients$bandwidth & margin >= -rd1$coefficients$bandwidth))

## Step 2 Create covariate-adjusted outcome
df$vote_adj <- drop(df$vote-as.matrix(df[, covs]) %*% r1$coefficients[covs])

# Step 3 run RDHonest as normal with Y adjusted, keep same M as above
rd2 <- RDHonest(vote_adj~ winning | margin, data = df, 
                T0 = rd0$coefficients$estimate, #using T0 from rd0
                M = c(rd1$coefficients$M.rf, rd1$coefficients$M.fs)) # using M's from rd1

print(rd2)
kolesarm commented 6 months ago

Just a couple of edits:

1. missed the other bandwidth
r1 <- lm(f1, data = df, subset = (margin <= rd1$coefficients$bandwidth))

r1 <- lm(f1, data = df, subset = (margin <= rd1$coefficients$bandwidth & margin >= -rd1$coefficients$bandwidth))
2. missed taking M from the first mod
rd2 <- RDHonest(vote_adj~margin, data = df)

rd2 <- RDHonest(vote_adj~margin, data = df, M = rd1$coefficients$M)

Thanks edited my response

kolesarm commented 6 months ago

For fuzzy RD, you need to covariate-adjust both the outcome and the treatment---it's an IV, and covariates enter both stages of IV:

## Step 1 Estimate the effects of the covs, use the same bandwidth as above
## Do this separately for outcome and for treatment
covs <- c("class", "termshouse", "termssenate")
f1 <- paste(c("~ margin*I(margin>0)", covs), collapse=" + ")
ry <- lm(paste("vote", f1), data = df,
         subset = abs(margin) <= rd1$coefficients$bandwidth)
rd <- lm(paste("winning", f1), data = df,
         subset = abs(margin) <= rd1$coefficients$bandwidth)

## Step 2 Create covariate-adjusted outcome and treamtent
df$vote_adj <- drop(df$vote-as.matrix(df[, covs]) %*% ry$coefficients[covs])
df$winning_adj <- drop(df$winning-as.matrix(df[, covs]) %*% rd$coefficients[covs])

# Step 3 run RDHonest as normal with Y adjusted, keep same M as above
## using M's from rd1
rd2 <- RDHonest(vote_adj ~ winning_adj | margin, data = df,
                T0 = rd0$coefficients$estimate, #using T0 from rd0
                M = c(rd1$coefficients$M.rf, rd1$coefficients$M.fs))
## Estimating new M's
rd3 <- RDHonest(vote_adj ~ winning_adj | margin, data = df,
                T0 = rd0$coefficients$estimate)
kolesarm commented 6 months ago

Support for covariates now addded as of commit 82e3fd1ba2e5ba2f6c02b545b9fd3216b68ba27e

Note that the fuzzy RD syntax has changed to outcome|treatment~running | covs. Previously it was outcome~treatment|running.

df <- read.csv("https://raw.githubusercontent.com/rdpackages/rdrobust/master/R/rdrobust_senate.csv")
df <- df[complete.cases(df), ]
df$margin <- round(df$margin) # make the running var discrete

# making an semi-probablistic instrument winning (not much effort put in)
df$winning <- as.numeric(df$margin >= 0)
set.seed(42)
df$winning[df$margin >= -3 & df$margin <= 2][1:40] <- sample(c(0,1), 40, replace = TRUE)

rd0 <- RDHonest(vote | winning ~ margin, data = df) #estimate T0
rd1 <- RDHonest(vote | winning ~ margin | class + termshouse + termssenate, data = df,
                T0 = rd0$coefficients$estimate) 
## Sharp RD
 rd2 <- RDHonest(vote ~ margin | class + termshouse + termssenate, data = df)    
njudd commented 6 months ago

Seems to have the error length(formula)[1] == 1L is not TRUE possibly due to the treatment being dummy coded?

library(RDHonest)

df <- read.csv("https://raw.githubusercontent.com/rdpackages/rdrobust/master/R/rdrobust_senate.csv")
df <- df[complete.cases(df), ]
df$margin <- round(df$margin) # make the running var discrete

# making an semi-probablistic instrument winning (not much effort put in)
df$winning <- as.numeric(df$margin >= 0)
set.seed(42)
df$winning[df$margin >= -3 & df$margin <= 2][1:40] <- sample(c(0,1), 40, replace = TRUE)

# issue with outcome|treatment~running 
rd0 <- RDHonest(vote | winning ~ margin, data = df)
kolesarm commented 6 months ago

Not sure what you're referring to, the above code runs ok and rd0 returns:

Call:
RDHonest(formula = vote | winning ~ margin, data = df)

Inference for Fuzzy RD parameter (using Holder class), confidence level 95%:
        Estimate Std. Error Maximum Bias  Confidence Interval
winning  6.44098    2.73172      2.22955 (-0.295553, 13.1775)
njudd commented 6 months ago

weird, worked with a clean install might have been an outdated package dependency!