jaydu1 / ensemble-cross-validation

Cross-validation methods designed for ensemble learning
https://jaydu1.github.io/overparameterized-ensembling/
MIT License
cross-validation ensemble-learning


Ensemble-cross-validation

sklearn_ensemble_cv is a Python module that implements accurate and efficient cross-validation methods for ensemble learning, drawn from several research projects.

Features

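import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn_ensemble_cv import ECV

# Toy training data, assumed for illustration; replace with your own X_train, y_train
X_train, y_train = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)

# Hyperparameters for the base regressor
grid_regr = {
    'max_depth': np.array([6, 7], dtype=int),
}
# Hyperparameters for the ensemble
grid_ensemble = {
    'max_features': np.array([0.9, 1.]),
    'max_samples': np.array([0.6, 0.7]),
    'n_jobs': -1,  # use all processors for fitting each ensemble
}

# Fit 50 trees and obtain risk estimates for ensembles of up to 100 trees
res_ecv, info_ecv = ECV(
    X_train, y_train, DecisionTreeRegressor, grid_regr, grid_ensemble,
    M=50, M_max=100, return_df=True
)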
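The M and M_max arguments above fit M trees but report estimates for ensemble sizes up to M_max. A minimal sketch of why that can work under squared loss, assuming the ensemble members are conditionally i.i.d. given the training data (the idea behind extrapolated cross-validation): the risk of an M-member ensemble is then affine in 1/M, R_M = a + b/M, so the risks of 1- and 2-member ensembles determine the risk for every M. The helper names extrapolate_risk and exact_risk are hypothetical and not part of sklearn_ensemble_cv; in practice R_1 and R_2 must themselves be estimated from data (for example, out-of-bag), which this sketch does not do.

```python
def extrapolate_risk(r1: float, r2: float, m: int) -> float:
    """Squared-loss risk of an m-member ensemble from the 1- and 2-member risks.

    Assumes R_m = a + b / m. Solving r1 = a + b and r2 = a + b / 2 gives
    b = 2 * (r1 - r2) and a = 2 * r2 - r1, hence
    R_m = (2/m - 1) * r1 + 2 * (1 - 1/m) * r2.
    """
    if m < 1:
        raise ValueError("ensemble size must be at least 1")
    return (2.0 / m - 1.0) * r1 + 2.0 * (1.0 - 1.0 / m) * r2


def exact_risk(m: int, bias: float = 0.5, var: float = 1.0) -> float:
    """Toy ground truth: averaging m i.i.d. members with a fixed bias and
    per-member variance gives squared risk bias**2 + var / m."""
    return bias ** 2 + var / m


r1, r2 = exact_risk(1), exact_risk(2)
for m in (1, 2, 50, 100):
    # The extrapolated and exact values agree for every m in this toy model
    print(m, extrapolate_risk(r1, r2, m), exact_risk(m))
```

Letting m grow shows the limiting risk 2 * r2 - r1, i.e. the risk of an infinitely large ensemble, which is why a modest number of fitted trees can inform estimates at larger ensemble sizes.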

sklearn_ensemble_cv currently supports bagging- and subagging-type ensembles under squared loss. For a DecisionTreeRegressor base predictor, the available hyperparameters are those listed in the sklearn.tree.DecisionTreeRegressor documentation; the ensemble hyperparameters are those of sklearn.ensemble.BaggingRegressor. Other scikit-learn regressors (estimators for which sklearn.base.is_regressor returns True) can also be used as base predictors.

Cross-validation methods

This project is currently in development. More CV methods will be added shortly.

Usage

See the Jupyter notebooks in the tutorials folder:

Notebook              Description
basics.ipynb          Basics of applying ECV/CGCV to risk estimation and hyperparameter tuning for ensemble learning.
cgcv_l1_huber.ipynb   Custom CGCV for an M-estimator: l1-regularized Huber ensembles.
multitask.ipynb       Applying ECV to risk estimation and hyperparameter tuning for multi-task ensemble learning.
random_forests.ipynb  Applying ECV to model selection of random forests via a simple utility function.
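Hyperparameter tuning amounts to scoring every combination of the two grids from the Features example and keeping the best one. As a hedged sketch of that bookkeeping only (the actual enumeration happens inside ECV and may differ), the snippet expands grid_regr and grid_ensemble into candidate configurations, treating scalar entries such as n_jobs as fixed settings rather than grid axes. The helpers expand_grid and select_best are hypothetical; the risk function passed to select_best would in practice be the risk estimates produced by ECV.

```python
from itertools import product

import numpy as np

grid_regr = {"max_depth": np.array([6, 7], dtype=int)}
grid_ensemble = {
    "max_features": np.array([0.9, 1.0]),
    "max_samples": np.array([0.6, 0.7]),
    "n_jobs": -1,
}


def expand_grid(grid):
    """Hypothetical helper: list every combination of the array-valued entries.

    Scalar entries (e.g. n_jobs) are copied unchanged into every combination.
    """
    axes = {k: list(v) for k, v in grid.items() if np.ndim(v) > 0}
    fixed = {k: v for k, v in grid.items() if np.ndim(v) == 0}
    return [dict(zip(axes, values), **fixed) for values in product(*axes.values())]


# Pair every base-regressor setting with every ensemble setting
candidates = [
    (regr_params, ens_params)
    for regr_params in expand_grid(grid_regr)
    for ens_params in expand_grid(grid_ensemble)
]
print(len(candidates))  # 2 * 2 * 2 = 8


def select_best(candidates, risk_fn):
    """Hypothetical helper: return the candidate with the smallest risk_fn value."""
    return min(candidates, key=lambda c: risk_fn(*c))
```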

The code has been tested with scikit-learn 1.3.1.

The documentation is available.

The module can be installed via PyPI:

pip install sklearn-ensemble-cv
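Because the code has been tested with scikit-learn 1.3.1, it can help to confirm which versions were actually installed. A small standard-library sketch using importlib.metadata (Python 3.8+); the distribution names are the one in the pip command above plus scikit-learn's PyPI name, and the helper installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dists=("sklearn-ensemble-cv", "scikit-learn")):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for dist in dists:
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            found[dist] = None
    return found


TESTED_SKLEARN = "1.3.1"
versions = installed_versions()
print(versions)
# Only a notice: other scikit-learn versions may work but are untested
if versions["scikit-learn"] not in (None, TESTED_SKLEARN):
    print(f"Note: scikit-learn {versions['scikit-learn']} differs from the tested {TESTED_SKLEARN}")
```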