mmcdermott / MEDS_Tabular_AutoML

Limited automatic tabular ML pipelines for generic MEDS datasets.
MIT License

Related to #37, we should support more extensive search options over window sizes, codes, and aggregation functions #39

Open mmcdermott opened 3 months ago

mmcdermott commented 3 months ago

Ultimately, we pass XGBoost a list of allowed features. These are normally the cross-product of three sets: window sizes, given as a list drawn from options sampled by Optuna; aggregations, drawn from options likewise sampled by Optuna; and codes, determined by manual entry and frequency-based constraints. We should expand the searchable and expressible space to include additional kinds of options and search paradigms, including

This would require changes both to the Optuna search space and to the data loading side, so it would be an involved effort. The resulting system, however, would identify the most critical features as part of the search itself, potentially aiding interpretability, and would be much more flexible than our current system.
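
To make the contrast concrete, here is a minimal sketch of the cross-product feature list described above, next to one possible richer alternative in which each (window, aggregation) pair is included or excluded independently. Everything in it is an assumption made for illustration: the `code/window/agg` naming scheme, the function names, the example codes, and the pairwise-inclusion idea are hypothetical and are not taken from this repository. The `trial` argument is expected to behave like an Optuna trial (it only uses `suggest_categorical`); a small stub stands in for it so the sketch runs without Optuna installed.

```python
from itertools import product


def feature_name(code, window, agg):
    # Hypothetical naming scheme, chosen only for this sketch.
    return f"{code}/{window}/{agg}"


def cross_product_features(codes, windows, aggs):
    """Allowed-feature list as a full cross-product: every code x window x agg."""
    return [feature_name(c, w, a) for c, w, a in product(codes, windows, aggs)]


def sample_pairwise_features(trial, codes, windows, aggs):
    """One possible richer space (an assumption, not the project's design):
    decide independently whether each (window, agg) pair is kept."""
    kept = [
        (w, a)
        for w, a in product(windows, aggs)
        # With Optuna, `trial` would be an optuna.trial.Trial.
        if trial.suggest_categorical(f"use/{w}/{a}", [True, False])
    ]
    return [feature_name(c, w, a) for c in codes for w, a in kept]


class StubTrial:
    """Stand-in for an Optuna trial: keeps every pair except the excluded names."""

    def __init__(self, excluded=()):
        self.excluded = set(excluded)

    def suggest_categorical(self, name, choices):
        # Returns True or False, both of which are members of `choices`.
        return name not in self.excluded


# Illustrative inputs only.
codes = ["HR", "GLUCOSE"]
windows = ["1d", "7d"]
aggs = ["count", "mean"]

full = cross_product_features(codes, windows, aggs)
pruned = sample_pairwise_features(
    StubTrial(excluded={"use/7d/mean"}), codes, windows, aggs
)
```

Under these assumptions, `full` holds all eight combinations, while `pruned` drops the `7d`/`mean` pair for every code. A cross-product cannot express that selection, because it can only remove `7d` or `mean` entirely. The cost is a search space that grows with the number of pairs rather than with the number of windows plus aggregations.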

mmcdermott commented 3 months ago

First step of this broader effort: