tlverse / hal9001

🤠 📿 The Highly Adaptive Lasso
https://tlverse.org/hal9001
GNU General Public License v3.0
49 stars 15 forks source link

`num_knots` default does not grow with sample size? #98

Closed benkeser closed 2 years ago

benkeser commented 2 years ago

Thanks for all the work on this package.

If I am reading the documentation correctly, the new default values of num_knots uses a fixed number of knots for specifying a basis irrespective of sample size. Is that correct? If so, is there a motivation for this?

Larsvanderlaan commented 2 years ago

Not particularly much thought went into the default num_knot choices. But, the main reason I chose to not make it grow with sample size is computation time, so that the package is friendlier for new users. If you have any feedback/ideas on this, that will be greatly appreciated.

benkeser commented 2 years ago

No, that makes sense from a new user perspective and is probably the right approach. Perhaps some rough guidelines could be added to the documentation? Or at least a warning that tuning this value may be beneficial to performance.

The choice does mean "power users" (which may be a group that only includes me) have to be a bit more careful when studying theoretical simulations with HAL, to be sure that we're letting the number of bins get larger. I often use HAL as a catch-all algorithm when I need a consistent nuisance estimator in simulations to verify theoretical properties about estimators. In a simulation study with a student we were getting strange large-sample results and traced the issue (we think) back to this binning choice.

In any case, will close the issue -- just wanted to confirm that indeed the # of bins was staying fixed as sample size increased. Thanks for the response!