`sklearn_ensemble_cv` is a Python module that implements accurate and efficient ensemble cross-validation methods from various projects. The module builds on `scikit-learn`/`sklearn` to provide the most flexibility on various base predictors.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn_ensemble_cv import ECV

# X_train and y_train are assumed to be existing training arrays.

# Hyperparameters for the base regressor
grid_regr = {
    'max_depth': np.array([6, 7], dtype=int),
}
# Hyperparameters for the ensemble
grid_ensemble = {
    'max_features': np.array([0.9, 1.]),
    'max_samples': np.array([0.6, 0.7]),
    'n_jobs': -1,  # use all processors for fitting each ensemble
}

# Build 50 trees and get estimates for ensembles of up to 100 trees
res_ecv, info_ecv = ECV(
    X_train, y_train, DecisionTreeRegressor, grid_regr, grid_ensemble,
    M=50, M_max=100, return_df=True
)
```
It currently supports bagging- and subagging-type ensembles under square loss.
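For readers unfamiliar with the two ensemble types, the following minimal sketch illustrates the usual distinction: bagging draws each base predictor's training indices with replacement, whereas subagging draws a subsample without replacement. The helper names `bagging_indices`, `subagging_indices`, and `ensemble_mean_prediction` are hypothetical and are not part of `sklearn_ensemble_cv`; the base predictor here is simply a sample mean, and only the standard library is used.

```python
import random

def bagging_indices(n, k, rng):
    """Draw k indices from range(n) with replacement (bootstrap sample)."""
    return [rng.randrange(n) for _ in range(k)]

def subagging_indices(n, k, rng):
    """Draw k distinct indices from range(n) without replacement (subsample)."""
    return rng.sample(range(n), k)

def ensemble_mean_prediction(y, sampler, k, M, rng):
    """Average M base predictors, each the mean of y over its own resample."""
    n = len(y)
    preds = []
    for _ in range(M):
        idx = sampler(n, k, rng)
        preds.append(sum(y[i] for i in idx) / k)
    return sum(preds) / M

rng = random.Random(0)
y = [float(v) for v in range(1, 11)]  # toy responses 1.0, ..., 10.0

# Bagging: resample all n = 10 points with replacement
bag_pred = ensemble_mean_prediction(y, bagging_indices, k=10, M=50, rng=rng)
# Subagging: subsample 6 of the 10 points without replacement
sub_pred = ensemble_mean_prediction(y, subagging_indices, k=6, M=50, rng=rng)

# Square loss of each ensemble prediction against a hypothetical new response
y_new = 5.5
print((bag_pred - y_new) ** 2, (sub_pred - y_new) ** 2)
```

The only difference between the two ensembles is the sampler passed in, which mirrors how the two types differ in how each base predictor sees the training data.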
In the ECV example, the hyperparameters of the base predictor (`grid_regr`) are those of `sklearn.tree.DecisionTreeRegressor`, and the hyperparameters of the ensemble (`grid_ensemble`) are those of `sklearn.ensemble.BaggingRegressor`.
Using other sklearn regressors (`regr.is_regressor = True`) as base predictors is also supported.
This project is currently in development. More CV methods will be added shortly.
Check out Jupyter Notebooks in the tutorials folder:
Name | Description |
---|---|
basics.ipynb | Basics of applying ECV/CGCV to risk estimation and hyperparameter tuning for ensemble learning. |
cgcv_l1_huber.ipynb | Custom CGCV for M-estimators: l1-regularized Huber ensembles. |
multitask.ipynb | Applying ECV to risk estimation and hyperparameter tuning for multi-task ensemble learning. |
random_forests.ipynb | Applying ECV to model selection for random forests via a simple utility function. |
The code is tested with `scikit-learn == 1.3.1`.
The documentation is available.
The module can be installed via PyPI:
```bash
pip install sklearn-ensemble-cv
```