yixinsun1216 opened this issue 3 years ago
Note that n = 30 takes 2 hours to run.
Call:
dml(f = bonus_formula, d = reg_data, model = "linear", ml = "hal",
n = 30, k = 5, score = "concentrate", workers = 4, drop_na = FALSE,
poly_degree = 3)
Coefficients:
Estimate Std. Error
Auction 0.3967 0.075
Call:
dml(f = bonus_formula, d = reg_data, model = "linear", ml = "hal",
n = 30, k = 5, score = "finite", workers = 4, drop_na = FALSE,
poly_degree = 3)
Coefficients:
Estimate Std. Error
Auction 0.588 0.145
Call:
dml(f = dboe_formula, d = dboe_data, model = "poisson", ml = "hal",
n = 30, score = "concentrate", workers = 4, drop_na = FALSE,
poly_degree = 3)
Coefficients:
Estimate Std. Error
Auction 0.2916 0.222
Thanks for doing this! Does "poly_degree" in this context mean how many interactions hal9001 does between the ECDFs? So poly_degree = 3 means you could have the interaction of the ECDFs of time, lat, and lon?
Another thought, related to speed: I wonder whether the lasso solver that ships with hal9001 is faster than glmnet. I think it only handles linear models (?), but it may be worth taking a peek at.
Use the tools in hal9001 to create basis functions for the dataset of interest. Intuitively, we build a design matrix of basis functions from the covariates: using the empirical CDFs of the covariates, we create a matrix consisting of indicator basis functions (dummy variables generated from the covariate values). This results in a large, sparse matrix with binary entries. Many basis functions are created, and the usual lasso methods can then select the useful ones.
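To make the idea concrete, here is a minimal Python sketch (not hal9001 itself, just the same construction): for each covariate, every observed value becomes a knot, and each knot yields an indicator column I(x >= knot). The function name `hal_basis_matrix` is my own; hal9001 exposes this machinery through its own API. Interactions (what poly_degree controls) would be products of these indicator columns across covariates, which I omit here for brevity.

```python
import numpy as np

def hal_basis_matrix(X):
    """Zero-order HAL-style design matrix: one indicator basis
    function I(x_j >= knot) per observed value of each covariate.
    (Hypothetical helper for illustration, not the hal9001 API.)"""
    n, p = X.shape
    cols = []
    for j in range(p):
        # Each distinct observed value of covariate j is a knot.
        for knot in np.unique(X[:, j]):
            cols.append((X[:, j] >= knot).astype(int))
    return np.column_stack(cols)

# Tiny example: 3 observations, 2 covariates.
X = np.array([[0.1, 5.0],
              [0.4, 2.0],
              [0.9, 2.0]])
H = hal_basis_matrix(X)
# H is a binary matrix with one column per (covariate, knot) pair:
# 3 knots for column 0, 2 for column 1, so H has shape (3, 5).
# Lasso on H then selects the useful basis functions.
```

The matrix grows quickly with n and p (and faster still once interactions are included), which is why sparse storage and a fast lasso solver matter here.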