yixinsun1216 / crossfit

Implementation of Double/Debiased Machine Learning approach

Toggling honesty and tune.parameters in the regression_forest function #1

Closed yixinsun1216 closed 3 years ago

yixinsun1216 commented 4 years ago

From what the grf people say about the honesty parameter in small datasets, the trade-off is this: honesty should lead to less biased estimates, because each tree uses one part of its subsample to choose splits and a separate part to estimate the values in its leaves. With small datasets, though, further splitting the data means there might not be enough information for the function to even determine what good splits are.
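To make the sample-size concern concrete, here is a rough arithmetic sketch in base R. It assumes grf's documented behaviour: each tree is grown on a subsample of `sample.fraction` of the rows, and with honesty on, `honesty.fraction` of that subsample is used to choose splits while the rest fills the leaves. The helper `honest_split_sizes` and the sample size of 200 are made up for illustration, and grf's exact rounding may differ.

```r
# Hypothetical helper (not part of grf): approximate per-tree sample sizes
# under honest splitting. The defaults mirror grf's documented defaults.
honest_split_sizes <- function(n, sample.fraction = 0.5, honesty.fraction = 0.5) {
  per_tree <- round(n * sample.fraction)         # rows drawn for one tree
  n_split  <- round(per_tree * honesty.fraction) # rows used to choose splits
  c(per_tree = per_tree,
    split    = n_split,
    estimate = per_tree - n_split)               # rows used for leaf estimates
}

# Illustrative sample size only; this is not the Texas dataset.
sizes_default <- honest_split_sizes(200)
sizes_07      <- honest_split_sizes(200, honesty.fraction = 0.7)

print(sizes_default)
print(sizes_07)
```

Under these assumptions, a 200-row dataset leaves each tree about 50 observations for choosing splits at the default settings. Raising `honesty.fraction` to 0.7 moves that to about 70, at the cost of only about 30 observations for the leaf estimates.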

But switching honesty on or off causes big swings in both the size and the point estimates of the coefficients. Can the tune.parameters argument fix these swings and show us the "right" way to do things?

yixinsun1216 commented 4 years ago

Results from toying with the Texas dataset:

So in the Texas case specifically, it seems we should set honesty.fraction to 0.7 and keep tune.parameters off to speed up the function.

For users in general, the best approach with a small dataset is probably to figure out which honesty.fraction is best by running tune_regression_forest, and perhaps to keep tune.parameters turned off if speed is a concern.
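As a rough illustration of that kind of comparison, the sketch below fits `regression_forest` on simulated data with honesty off and with a few `honesty.fraction` values, then compares out-of-bag mean squared error. It is a manual stand-in, not grf's own tuning procedure. The simulated data, the grid of fractions, and the helper `oob_mse` are assumptions made for the example, and out-of-bag error measures prediction accuracy rather than the stability of downstream coefficients. Tuning arguments are left at their defaults because their accepted values have changed across grf versions.

```r
library(grf)

# Simulated small dataset (illustrative only; not the Texas data).
set.seed(1)
n <- 200
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] + 0.5 * X[, 2]^2 + rnorm(n)

# Hypothetical helper: out-of-bag MSE for one forest configuration.
oob_mse <- function(honesty, honesty.fraction = 0.5) {
  forest <- regression_forest(X, Y,
                              num.trees = 500,
                              honesty = honesty,
                              honesty.fraction = honesty.fraction,
                              seed = 1)
  # predict() without new data returns out-of-bag predictions.
  mean((Y - predict(forest)$predictions)^2)
}

fractions <- c(0.5, 0.6, 0.7, 0.8)
results <- data.frame(
  setting = c("honesty off", paste("honesty.fraction =", fractions)),
  oob_mse = c(oob_mse(FALSE),
              sapply(fractions, function(f) oob_mse(TRUE, f)))
)
print(results)
```

Because the forests are random, it is worth repeating this with several seeds before reading much into small differences between settings.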