yixinsun1216 / crossfit

Implementation of Double/Debiased Machine Learning approach

Implementing Poisson Method That Doesn't Use the Concentrating-Out Approach to DML #7

Closed yixinsun1216 closed 3 years ago

yixinsun1216 commented 4 years ago

The DML paper gives two ways to calculate the Neyman orthogonal scores. In our first attempt, we used the concentrating-out approach for both the linear and Poisson models. However, the concentrating-out approach only works for Poisson if the variable of interest is an indicator variable. So now we want to use the original method of constructing moments, so that our DML function can be used with any type of treatment variable.

Note details on Stata's version of the implementation can be found here

Constructing the Neyman orthogonal scores for Poisson:

Likelihood = Y*(D*theta + X*beta) - exp(D*theta + X*beta)
dl/dtheta = Y*D - exp(D*theta + X*beta)*D
dl/dbeta = Y*X - exp(D*theta + X*beta)*X

psi = Y*D - exp(D*theta + X*beta)*D - mu*(Y*X - exp(D*theta + X*beta)*X)

Solve for mu:
J_{theta beta} - mu*J_{beta beta} = 0
mu = E[D'exp(D*theta + X*beta)X]E[X'exp(D*theta + X*beta)X]^{-1}
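The mu formula above can be sketched in R, with sample averages standing in for the expectations. This is a minimal illustration, not the package code: the data, dimensions, and the values of theta_hat and beta_hat are all made up for the example.

```r
# Illustrative sketch of the mu_hat formula; all data and fitted values
# here are synthetic stand-ins.
set.seed(1)
n <- 200
x <- cbind(1, matrix(rnorm(n * 2), n, 2))  # covariates incl. intercept
d <- rbinom(n, 1, 0.5)                     # treatment
theta_hat <- 0.5                           # pretend step-2 estimates
beta_hat <- c(0.1, 0.2, -0.3)

w <- as.vector(exp(d * theta_hat + x %*% beta_hat))  # exp(D*theta + X*beta)

# sample analogues of the two Jacobian blocks
J_tb <- crossprod(d * w, x) / n   # E[D' exp(.) X], 1 x p
J_bb <- crossprod(x * w, x) / n   # E[X' exp(.) X], p x p
mu_hat <- J_tb %*% solve(J_bb)    # 1 x p
```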

Implementing in R

  1. Partition the dataset into 5 folds
  2. For each fold, use the training dataset to run a full Poisson lasso of Y ~ D*theta + X*beta. This gives us our estimates theta_hat and beta_hat
  3. Using the estimated theta_hat and beta_hat and training dataset, compute mu_hat
  4. Use this mu_hat to minimize psi from above.
    • psi: (1/N)sum_{i = 1}^{N}(y_i - exp(d_i*theta' + x_i*beta'))(d_i - x_i*mu_hat')'
    • gradient: -(1/N)sum_{i = 1}^{N}exp(d_i*theta' + x_i*beta')*d_i*(d_i - x_i*mu_hat')'
  5. Average the resulting theta from the 5 folds
  6. s^2 = (J_0^{-1})E[psi*psi'](J_0^{-1}), where J_0 is the gradient of psi wrt theta
  7. Run this whole process n times and find the median theta value as the final estimate
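
The cross-fitting loop in steps 1-5 can be sketched as below. This is only an illustration: glm() stands in for the Poisson lasso, the data and names are synthetic, and the variance step (6) and repetition step (7) are omitted.

```r
# Sketch of steps 1-5, with glm() standing in for the Poisson lasso.
set.seed(1)
n <- 500
x <- cbind(1, rnorm(n))                      # intercept + one covariate
d <- rbinom(n, 1, 0.5)
y <- rpois(n, exp(0.5 * d + 0.3 * x[, 2]))   # true theta = 0.5

folds <- sample(rep(1:5, length.out = n))    # step 1: partition into 5 folds
thetas <- numeric(5)
for (k in 1:5) {
  tr <- folds != k; te <- folds == k
  # step 2: nuisance fit on the training folds
  fit <- glm(y ~ d + x - 1, family = poisson(), subset = tr)
  theta0 <- coef(fit)["d"]
  beta <- coef(fit)[c("x1", "x2")]
  eta_x <- as.vector(x %*% beta)
  # step 3: mu_hat from the training folds
  w <- exp(d[tr] * theta0 + eta_x[tr])
  mu <- (crossprod(d[tr] * w, x[tr, ]) / sum(tr)) %*%
        solve(crossprod(x[tr, ] * w, x[tr, ]) / sum(tr))
  # step 4: solve psi(theta) = 0 on the held-out fold,
  # holding beta and mu fixed at their training-fold values
  resid_d <- d[te] - as.vector(x[te, ] %*% t(mu))
  psi <- function(th) mean((y[te] - exp(d[te] * th + eta_x[te])) * resid_d)
  thetas[k] <- uniroot(psi, c(-3, 3), extendInt = "yes")$root
}
theta_hat <- mean(thetas)                    # step 5: average over folds
```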

Questions

• DML cookbook for constructing scores


tcovert commented 4 years ago

hi @yixinsun1216 finally got to looking at this. I think steps 1-3 (and your score computations) in the "implementing in R" above are correct, but I think in step 4, you want to make sure beta is fixed at the value you solved for in step 2, right? beta is part of the nuisance parameter set, and the goal is to estimate that on one part of the splits, and estimate theta based on that value from the other parts, right?

tcovert commented 4 years ago

to answer your questions:

yixinsun1216 commented 3 years ago

@tcovert I wrote up the rest of the procedures for implementing the two different approaches (finite vs concentrate out) and using lasso vs regression forest. Let me know if you see anything wrong in the implementation steps.

Remaining issues:

  1. When I run the Poisson lasso using the finite nuisance parameter method for the DBOEPerAcre estimation, it only works with poly_degree = 1; anything higher throws an error. Code below if you want to see how it fails
    
    library(dplyr)
    library(crossfit)
    load("C:/Users/Yixin Sun/Dropbox (Personal)/EPIC/texas/generated_data/final_leases.Rda")

    reg_data <- final_leases %>%
      filter(InSample) %>%
      mutate(Private = NParcels15 > 0 | Type == "RAL")

    dboe_formula <- paste("DBOEPerAcre", "Auction", sep = " ~ ") %>%
      paste("Term + RoyaltyRate + Acres + CentLat + CentLong + EffDate", sep = " | ") %>%
      as.formula()

    dboe_data <- filter(reg_data, !Censored)

    dboe_p_lasso <- dml(dboe_formula, dboe_data, model = "poisson", n = 11,
                        worker = 4, dml_seed = NULL, ml = "lasso",
                        poly_degree = 3, drop_na = FALSE, score = "finite")
    print(summary(dboe_p_lasso))



2. Thinking more about how you can't get the nuisance parameters using a regression forest, I definitely don't think I'm implementing the finite nuisance parameter method correctly with RF. My current process is:
   1. Use lm/glm and run y on D and X -> use this to get s = X*B. 
   2. Use regression forest to find weights in the case of Poisson
   3. Use regression forest of D on X to find m = E[D|X], using weights from step 2. Then plug s and m into the moment and continue as usual

   Maybe you were right that we should keep it simple, and we can just use lasso for the finite nuisance parameter approach and user can choose between lasso or rf for the concentrating out approach. Otherwise not really sure how to implement this - what do you think?
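
The two-part structure in the process above (a parametric fit for s, a nonparametric fit for m) can be sketched as below. This is purely illustrative: the data are synthetic, stats::loess stands in for the regression forest, and the Poisson weighting step (ii) is omitted.

```r
# Sketch of the s = X*beta / m = E[D|X] split, with loess standing in
# for the regression forest; all data here are synthetic.
set.seed(1)
n <- 300
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))
y <- rpois(n, exp(0.4 * d + 0.3 * x))

# (i) glm of y on d and x gives the finite part s = X*beta
fit <- glm(y ~ d + x, family = poisson())
s <- coef(fit)["(Intercept)"] + coef(fit)["x"] * x

# (iii) nonparametric regression of d on x gives m = E[D|X]
m <- predict(loess(d ~ x))

# s and m would then be plugged into the moment condition as described
```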

3. For `rlasso`, have you figured out how to force `D` to be included in the lasso? You were probably right that we could just drop `rlasso` for now since it is of limited use to us - what do you think?
tcovert commented 3 years ago

Regarding 2: I'm pretty sure there is no kosher way to do FN with RF. The reason is that using the RF regression technology, the nuisance parameter is infinite dimensional, by construction: there is no finite parameter $\beta$ that a random forest spits out after you run it.

On top of this, the FN approach requires a first step calculation of the "beta" parameter in an lm or glm: the part of Y that is explained by X, holding D fixed. This is reasonably well defined in a regression technology that actually gives you a finite parameter as an answer (i.e., LASSO), because you can just give lasso a separate set of columns for D and a separate set for X and whatever interactions you want in it. There's no way to get this separate "additive" structure in RF: in every branch of every tree in an RF, the effects of D and X are always tangled up together.