mmcdermott / MEDS_Tabular_AutoML

Limited automatic tabular ML pipelines for generic MEDS datasets.
MIT License

Change the xgboost sweeper to optimize the binary inclusion of each aggregation x window combination. #52

Open Oufattole opened 2 months ago

Oufattole commented 2 months ago

I have a demo here of how we can use Hydra configs to sweep over a binary inclusion flag for each aggregation x window combination. We should do this instead of what we do now, which is a categorical distribution over all permutations, so that the sweeper can learn which individual aggregation x window pairs are best.

See this link for the example
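
To make the proposal concrete, here is a minimal sketch (not the linked demo) of what a binary-inclusion search space could look like. The `include.<aggregation>.<window>` key layout, the aggregation and window names, and both helper functions are hypothetical and chosen only for illustration. The `choice(true,false)` strings follow Hydra's sweep override syntax, but how they would be wired into this repo's sweeper config is an assumption.

```python
from itertools import product


def binary_inclusion_space(aggregations, windows, prefix="include"):
    """Build one boolean sweep parameter per (aggregation, window) pair.

    The key layout `<prefix>.<aggregation>.<window>` is hypothetical.
    Names must not contain dots, since dots separate the key parts.
    """
    return {
        f"{prefix}.{agg}.{win}": "choice(true,false)"
        for agg, win in product(aggregations, windows)
    }


def selected_pairs(trial_values, prefix="include"):
    """Turn the booleans sampled for one trial back into (aggregation, window) pairs."""
    pairs = []
    for key, included in trial_values.items():
        if included:
            # Strip "<prefix>." and split the remainder into its two parts.
            agg, win = key[len(prefix) + 1:].split(".", 1)
            pairs.append((agg, win))
    return pairs


# Hypothetical aggregation and window names, for illustration only.
aggs = ["code_count", "value_sum"]
windows = ["1d", "7d", "full"]

space = binary_inclusion_space(aggs, windows)
n_params = len(space)        # 6 independent binary parameters
n_subsets = 2 ** n_params    # 64 options if every subset were one categorical choice
```

With one flag per pair, the sweeper sees 6 separate yes/no decisions and can attribute performance to each pair, rather than treating each of the 64 possible subsets as an unrelated category.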

Oufattole commented 1 month ago

I have an example of this in the meds-interp folder. I use the generate-weights CLI command, defined here: https://github.com/Oufattole/meds-interp/blob/dev/src/meds_interp/generate_weights.py, and generate the weights for all the columns the user inputs: https://github.com/Oufattole/meds-interp/blob/dev/tests/test_integration.py#L57C9-L57C69

We can modify this function to instead take the file path to code_metadata.parquet and read all codes from it.