yixinsun1216 / crossfit

Implementation of Double/Debiased Machine Learning approach

Implementing Poisson Method That Doesn't Use the Concentrating-Out Approach to DML #7

Closed yixinsun1216 closed 3 years ago

yixinsun1216 commented 4 years ago

The DML paper gives two ways to calculate the Neyman orthogonal scores. In our first attempt, we used the concentrating-out approach for both the linear and Poisson models. However, the concentrating-out approach only works for Poisson if the variable of interest is an indicator variable. So now we want to use the original method of constructing moments, so that our DML function can be used with any type of treatment variable.

Note details on Stata's version of the implementation can be found here

Constructing the Neyman orthogonal scores for Poisson:

Likelihood = Y*(D*theta + X*beta) - exp(D*theta + X*beta)
dl/dtheta = Y*D - exp(D*theta + X*beta)*D
dl/dbeta = Y*X - exp(D*theta + X*beta)*X

psi = Y*D - exp(D*theta + X*beta)*D - mu*(Y*X - exp(D*theta + X*beta)*X)

Solve for mu:
J_{theta beta} - mu*J_{beta beta} = 0
mu = E[D'exp(D*theta + X*beta)X]E[X'exp(D*theta + X*beta)X]^{-1}
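The mu formula above can be sketched in R, with sample averages standing in for the expectations. This is a minimal illustration, not the package code: the data, dimensions, and the values of theta_hat and beta_hat are all made up for the example.

```r
# Illustrative sketch of the mu_hat formula; all data and fitted values
# here are synthetic stand-ins.
set.seed(1)
n <- 200
x <- cbind(1, matrix(rnorm(n * 2), n, 2))  # covariates incl. intercept
d <- rbinom(n, 1, 0.5)                     # treatment
theta_hat <- 0.5                           # pretend step-2 estimates
beta_hat <- c(0.1, 0.2, -0.3)

w <- as.vector(exp(d * theta_hat + x %*% beta_hat))  # exp(D*theta + X*beta)

# sample analogues of the two Jacobian blocks
J_tb <- crossprod(d * w, x) / n   # E[D' exp(.) X], 1 x p
J_bb <- crossprod(x * w, x) / n   # E[X' exp(.) X], p x p
mu_hat <- J_tb %*% solve(J_bb)    # 1 x p
```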

Implementing in R

  1. Partition the dataset into 5 folds
  2. For each fold, use the training dataset to run a full Poisson lasso of Y ~ D*theta + X*beta. This gives us our estimates theta_hat and beta_hat
  3. Using the estimated theta_hat and beta_hat and training dataset, compute mu_hat
  4. Use this mu_hat to minimize psi from above.
    • psi: (1/N)sum_{i = 1}^{N}(y_i - exp(d_i*theta' + x_i*beta'))(d_i - x_i*mu_hat')'
    • gradient: -(1/N)sum_{i = 1}^{N}exp(d_i*theta' + x_i*beta')*d_i*(d_i - x_i*mu_hat')'
  5. Average the resulting theta from the 5 folds
  6. s^2 = (J_0^{-1})E[psi*psi'](J_0^{-1}), where J_0 is the gradient of psi wrt theta
  7. Run this whole process n times and find the median theta value as the final estimate
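
The cross-fitting loop in steps 1-5 can be sketched as below. This is only an illustration: glm() stands in for the Poisson lasso, the data and names are synthetic, and the variance step (6) and repetition step (7) are omitted.

```r
# Sketch of steps 1-5, with glm() standing in for the Poisson lasso.
set.seed(1)
n <- 500
x <- cbind(1, rnorm(n))                      # intercept + one covariate
d <- rbinom(n, 1, 0.5)
y <- rpois(n, exp(0.5 * d + 0.3 * x[, 2]))   # true theta = 0.5

folds <- sample(rep(1:5, length.out = n))    # step 1: partition into 5 folds
thetas <- numeric(5)
for (k in 1:5) {
  tr <- folds != k; te <- folds == k
  # step 2: nuisance fit on the training folds
  fit <- glm(y ~ d + x - 1, family = poisson(), subset = tr)
  theta0 <- coef(fit)["d"]
  beta <- coef(fit)[c("x1", "x2")]
  eta_x <- as.vector(x %*% beta)
  # step 3: mu_hat from the training folds
  w <- exp(d[tr] * theta0 + eta_x[tr])
  mu <- (crossprod(d[tr] * w, x[tr, ]) / sum(tr)) %*%
        solve(crossprod(x[tr, ] * w, x[tr, ]) / sum(tr))
  # step 4: solve psi(theta) = 0 on the held-out fold,
  # holding beta and mu fixed at their training-fold values
  resid_d <- d[te] - as.vector(x[te, ] %*% t(mu))
  psi <- function(th) mean((y[te] - exp(d[te] * th + eta_x[te])) * resid_d)
  thetas[k] <- uniroot(psi, c(-3, 3), extendInt = "yes")$root
}
theta_hat <- mean(thetas)                    # step 5: average over folds
```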

Questions

• DML cookbook for constructing scores


tcovert commented 4 years ago

hi @yixinsun1216 finally got to looking at this. I think steps 1-3 (and your score computations) in the "implementing in R" above are correct, but I think in step 4, you want to make sure beta is fixed at the value you solved for in step 2, right? beta is part of the nuisance parameter set, and the goal is to estimate that on one part of the splits, and estimate theta based on that value from the other parts, right?

tcovert commented 4 years ago

to answer your questions:

yixinsun1216 commented 3 years ago

@tcovert I wrote up the rest of the procedures for implementing the two different approaches (finite vs concentrate out) and using lasso vs regression forest. Let me know if you see anything wrong in the implementation steps.

Remaining issues:

  1. When I run the Poisson lasso using the finite nuisance parameter method for the DBOEPerAcre estimation, it only works with poly_degree = 1; anything higher throws an error. Code below if you want to see how it fails
    
    library(dplyr)
    library(crossfit)
    load("C:/Users/Yixin Sun/Dropbox (Personal)/EPIC/texas/generated_data/final_leases.Rda")

    reg_data <- final_leases %>%
      filter(InSample) %>%
      mutate(Private = NParcels15 > 0 | Type == "RAL")

    dboe_formula <- paste("DBOEPerAcre", "Auction", sep = " ~ ") %>%
      paste("Term + RoyaltyRate + Acres + CentLat + CentLong + EffDate", sep = " | ") %>%
      as.formula()

    dboe_data <- filter(reg_data, !Censored)

    dboe_p_lasso <- dml(dboe_formula, dboe_data, model = "poisson", n = 11,
                        worker = 4, dml_seed = NULL, ml = "lasso",
                        poly_degree = 3, drop_na = FALSE, score = "finite")
    print(summary(dboe_p_lasso))



2. Thinking more about how you can't get the nuisance parameters using a regression forest, I definitely don't think I'm implementing the finite nuisance parameter method correctly with RF. My current process is:
   1. Use lm/glm and run y on D and X -> use this to get s = X*B. 
   2. Use regression forest to find weights in the case of Poisson
   3. Use regression forest of D on X to find m = E[D|X], using weights from step 2. Then plug s and m into the moment and continue as usual

   Maybe you were right that we should keep it simple, and we can just use lasso for the finite nuisance parameter approach and user can choose between lasso or rf for the concentrating out approach. Otherwise not really sure how to implement this - what do you think?
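
The two-part structure in the process above (a parametric fit for s, a nonparametric fit for m) can be sketched as below. This is purely illustrative: the data are synthetic, stats::loess stands in for the regression forest, and the Poisson weighting step (ii) is omitted.

```r
# Sketch of the s = X*beta / m = E[D|X] split, with loess standing in
# for the regression forest; all data here are synthetic.
set.seed(1)
n <- 300
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))
y <- rpois(n, exp(0.4 * d + 0.3 * x))

# (i) glm of y on d and x gives the finite part s = X*beta
fit <- glm(y ~ d + x, family = poisson())
s <- coef(fit)["(Intercept)"] + coef(fit)["x"] * x

# (iii) nonparametric regression of d on x gives m = E[D|X]
m <- predict(loess(d ~ x))

# s and m would then be plugged into the moment condition as described
```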

3. For `rlasso`, have you figured out how to force `D` to be included in the lasso? You were probably right that we could just drop `rlasso` for now since it is of limited use to us - what do you think?
tcovert commented 3 years ago

Regarding 2: I'm pretty sure there is no kosher way to do FN with RF. The reason is that using the RF regression technology, the nuisance parameter is infinite dimensional, by construction: there is no finite parameter $\beta$ that a random forest spits out after you run it.

On top of this, the FN approach requires a first step calculation of the "beta" parameter in an lm or glm: the part of Y that is explained by X, holding D fixed. This is reasonably well defined in a regression technology that actually gives you a finite parameter as an answer (i.e., LASSO), because you can just give lasso a separate set of columns for D and a separate set for X and whatever interactions you want in it. There's no way to get this separate "additive" structure in RF: in every branch of every tree in an RF, the effects of D and X are always tangled up together.