yixinsun1216 / crossfit

Implementation of Double/Debiased Machine Learning approach

Lasso using empirical cdf #10

Open yixinsun1216 opened 3 years ago

yixinsun1216 commented 3 years ago

Use the tools in hal9001 to create basis functions for the dataset of interest. Intuitively, we build a design matrix whose columns are basis functions of the covariates. Using the empirical CDFs of the covariates, we create a matrix consisting of indicator basis functions (that is, dummy variables generated from each covariate). The result is a large, sparse matrix with binary entries. Because many basis functions are created, we can use the usual lasso methods to select the useful ones.
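
To make the construction concrete, here is a minimal base-R sketch of first-order indicator basis functions. It does not call hal9001 and is not necessarily how the package builds its basis; the toy data and the helper name make_indicator_basis are made up for illustration. Each column is a step function 1{x_j >= knot}, with knots placed at the observed values of the covariate, so the jump points coincide with those of its empirical CDF.

```r
# Toy covariates (hypothetical data, not from this issue).
set.seed(1)
x <- cbind(time = runif(6), lat = runif(6))

# Hypothetical helper: one indicator column per (covariate j, knot i),
# equal to 1 when x[, j] >= x[i, j] and 0 otherwise.
make_indicator_basis <- function(x) {
  n <- nrow(x)
  blocks <- lapply(seq_len(ncol(x)), function(j) {
    # outer() compares every observation with every knot of covariate j
    b <- outer(x[, j], x[, j], FUN = ">=") * 1L
    colnames(b) <- paste0(colnames(x)[j], "_ge_obs", seq_len(n))
    b
  })
  do.call(cbind, blocks)
}

basis <- make_indicator_basis(x)
dim(basis)  # 6 rows, 12 columns: n knots for each of the 2 covariates
```

The number of columns grows with n times the number of covariates even before any interactions are added, which is why a sparse representation and a lasso step (for example glmnet::glmnet on the basis matrix) are natural next steps.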

yixinsun1216 commented 3 years ago

Linear HAL with concentrating out approach

Note that n = 30 takes 2 hours to run.

Call:
dml(f = bonus_formula, d = reg_data, model = "linear", ml = "hal", 
    n = 30, k = 5, score = "concentrate", workers = 4, drop_na = FALSE, 
    poly_degree = 3)

Coefficients:
        Estimate Std. Error
Auction   0.3967      0.075

Linear HAL with finite nuisance parameter approach

Call:
dml(f = bonus_formula, d = reg_data, model = "linear", ml = "hal", 
    n = 30, k = 5, score = "finite", workers = 4, drop_na = FALSE, 
    poly_degree = 3)

Coefficients:
        Estimate Std. Error
Auction    0.588      0.145

Poisson HAL with concentrating out approach

Call:
dml(f = dboe_formula, d = dboe_data, model = "poisson", ml = "hal", 
    n = 30, score = "concentrate", workers = 4, drop_na = FALSE, 
    poly_degree = 3)

Coefficients:
        Estimate Std. Error
Auction   0.2916      0.222

tcovert commented 3 years ago

Thanks for doing this! Does "poly_degree" in this context mean how many interactions hal9001 does between the ECDFs? So poly_degree = 3 means you could have the interaction of the ECDFs of time, lat, and lon?
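
For reference, a minimal base-R sketch of what such a three-way interaction would look like, assuming poly_degree does control the interaction order (this thread does not confirm that). The toy data and the helper name interaction_basis are hypothetical, and nothing here calls hal9001. A degree-3 basis function is the product of three one-dimensional indicators, so it equals 1 only when an observation is at or above the knot in time, lat, and lon simultaneously.

```r
# Hypothetical toy data with the three covariates named in the question.
set.seed(2)
x <- cbind(time = runif(5), lat = runif(5), lon = runif(5))

# Hypothetical helper: column i is the product over the chosen covariates
# of 1{x[, j] >= x[i, j]}, i.e. an interaction anchored at observation i.
interaction_basis <- function(x, cols) {
  n <- nrow(x)
  out <- matrix(1L, n, n)
  for (j in cols) {
    out <- out * (outer(x[, j], x[, j], FUN = ">=") * 1L)
  }
  out
}

# Degree-3 interaction: time x lat x lon
b3 <- interaction_basis(x, c("time", "lat", "lon"))
# Degree-1 basis for comparison: time only
b1 <- interaction_basis(x, "time")
```

Each interaction column is at most as dense as the single-covariate columns it is built from, which keeps the matrix sparse but multiplies the number of candidate columns as the degree increases.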

tcovert commented 3 years ago

Another thought, related to speed: I wonder if the lasso solver that comes with hal9001 is faster than glmnet. I think it only does linear models (?), but it may be worth taking a peek at.