experiment:
  type: regression  # --> pycaret.regression.RegressionExperiment
  setup:
    # data: None  # All of the above
    target: "first_leaves_doy"
    train_size: 0.75
    preprocess: false
    normalize: true
    normalize_method: zscore  # i.e. the default
    fold_strategy: kfold  # i.e. the default
    fold: 10
    fold_shuffle: true
    session_id: 123  # control randomness for reproducibility
  compare_models:
    include:
      - 'lr'  # linear regression
      - 'rf'  # random forest regressor
      - 'xgboost'  # extreme gradient boosting (bonus)
      - merf.MERF  # must be instantiated before passing to pycaret; how to specify args?
      - interpret.glassbox.ExplainableBoostingRegressor  # must be instantiated before passing to pycaret
    fit_kwargs:
      merf.MERF:  # this won't work out of the box
        fixed_effects: null  # "the rest" after removing the cluster, random-effect, and target columns
        random_effects: ['tmax_365', 'tmin_365', 'prcp_365', 'srad_365', 'swe_365']
        clusters: "site_id"
    cross_validation: false  # evaluate metrics on the holdout set for now
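As a sketch of how such a recipe could drive pycaret: the `setup` and `compare_models` sections flatten naturally into keyword arguments for `RegressionExperiment.setup()` and `compare_models()`. The helper name `recipe_to_kwargs` is ours, not pycaret's, and the recipe dict is inlined here (parsing the YAML itself would need PyYAML); the pycaret calls at the end are shown as comments only.

```python
# Hypothetical helper: split the recipe's `experiment` section into the
# keyword arguments that pycaret's setup() and compare_models() would take.
recipe = {
    "experiment": {
        "type": "regression",
        "setup": {
            "target": "first_leaves_doy",
            "train_size": 0.75,
            "preprocess": False,
            "normalize": True,
            "normalize_method": "zscore",
            "fold_strategy": "kfold",
            "fold": 10,
            "fold_shuffle": True,
            "session_id": 123,
        },
        "compare_models": {
            "include": ["lr", "rf", "xgboost"],
            "cross_validation": False,
        },
    }
}

def recipe_to_kwargs(recipe):
    """Return (setup_kwargs, compare_kwargs) from a recipe dict."""
    exp = recipe["experiment"]
    return dict(exp["setup"]), dict(exp["compare_models"])

setup_kwargs, compare_kwargs = recipe_to_kwargs(recipe)
print(setup_kwargs["target"])     # first_leaves_doy
print(compare_kwargs["include"])  # ['lr', 'rf', 'xgboost']

# With pycaret installed this would then become (not executed here):
#   from pycaret.regression import RegressionExperiment
#   exp = RegressionExperiment()
#   exp.setup(data=df, **setup_kwargs)
#   best = exp.compare_models(**compare_kwargs)
```

Keeping the recipe as plain data like this means the same dict can be serialized back to YAML or validated before any pycaret call is made.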
A few notes:
- With MERF you can specify the fixed_effects_model, so we might be able to plug in EBM that way.
- MERF's fit syntax differs slightly from sklearn's: it requires the fixed-effect, random-effect, and cluster columns to be passed separately.
- It looks like MERF and EBM could be integrated with pycaret relatively easily, but we need a way to instantiate the models before passing them in.
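One way to bridge the fit-signature mismatch would be a thin sklearn-style adapter that remembers the random-effect and cluster columns, so that pycaret can call a plain `fit(X, y)`. This is only a sketch under our assumptions: `MERFWrapper` is our name, and the `_StubMERF` class below stands in for `merf.MERF` (whose `fit(X, Z, clusters, y)` signature the adapter targets) so the example runs without the package installed.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, RegressorMixin

class MERFWrapper(BaseEstimator, RegressorMixin):
    """Sketch of an sklearn-compatible adapter around a MERF-style model.

    MERF expects fit(X_fixed, Z, clusters, y); pycaret expects fit(X, y).
    The wrapper slices the random-effect and cluster columns out of X and
    treats "the rest" as fixed effects, matching the recipe's convention.
    """

    def __init__(self, model, random_effects, cluster_col):
        self.model = model              # e.g. a merf.MERF() instance
        self.random_effects = random_effects
        self.cluster_col = cluster_col

    def _split(self, X):
        Z = X[self.random_effects]
        clusters = X[self.cluster_col]
        fixed = X.drop(columns=self.random_effects + [self.cluster_col])
        return fixed, Z, clusters

    def fit(self, X, y):
        fixed, Z, clusters = self._split(X)
        self.model.fit(fixed, Z, clusters, y)
        return self

    def predict(self, X):
        fixed, Z, clusters = self._split(X)
        return self.model.predict(fixed, Z, clusters)

# Stand-in for merf.MERF so the sketch is self-contained:
class _StubMERF:
    def fit(self, X, Z, clusters, y):
        self.n_fixed_ = X.shape[1]  # record how many fixed-effect columns arrived
        return self
    def predict(self, X, Z, clusters):
        return np.zeros(len(X))

df = pd.DataFrame({
    "tmax_365": [1.0, 2.0], "prcp_365": [0.1, 0.2],
    "elevation": [100.0, 200.0], "site_id": ["a", "b"],
})
y = pd.Series([90, 95])
wrapper = MERFWrapper(_StubMERF(), ["tmax_365", "prcp_365"], "site_id")
wrapper.fit(df, y)
print(wrapper.model.n_fixed_)  # 1 -> only "elevation" remains as a fixed effect
```

An instantiated wrapper like this could then be passed into `compare_models(include=[...])` alongside the string model IDs, which is exactly the "instantiate before passing" step the notes above call for.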
Context: this workflow was extracted from https://github.com/phenology/springtime/blob/91ba9abb5faf9fcc763fb37385fe96007d104426/docs/notebooks/mk_modelling_npn.ipynb and very naively ported to the recipe-like format above; using pycaret this way is one alternative worth considering.