ibis-project / ibis-ml

IbisML is a library for building scalable ML pipelines using Ibis.
https://ibis-project.github.io/ibis-ml/
Apache License 2.0

IbisML


What is IbisML?

IbisML is a library for building scalable ML pipelines using Ibis.

How do I install IbisML?

pip install ibis-ml

How do I use IbisML?

With recipes, you can define sequences of feature engineering steps that get your data ready for modeling. For example, the recipe below replaces missing values with the mean of each numeric column, then normalizes the numeric columns to have a mean of zero and a standard deviation of one.

import ibis_ml as ml

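# Fill missing values in every numeric column with that column's mean.
imputer = ml.ImputeMean(ml.numeric())
# Rescale every numeric column to zero mean and unit standard deviation.
scaler = ml.ScaleStandard(ml.numeric())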
rec = ml.Recipe(imputer, scaler)
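
To make the two steps concrete, here is a minimal standard-library sketch of the arithmetic they describe, applied to a single column. It is not IbisML's implementation: the helper names `impute_mean` and `scale_standard` are hypothetical, and the use of the population standard deviation is an assumption (the library may use a different convention).

```python
from statistics import fmean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values (hypothetical helper)."""
    observed = [v for v in values if v is not None]
    mean = fmean(observed)
    return [mean if v is None else v for v in values]


def scale_standard(values):
    """Shift to mean zero and rescale to unit standard deviation (hypothetical helper).

    Assumption: the population standard deviation is used here.
    """
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


column = [1.0, None, 3.0, 5.0]
imputed = impute_mean(column)     # the missing entry becomes the mean of 1, 3 and 5
scaled = scale_standard(imputed)  # centered and rescaled copy of the imputed column
```

The order matters: imputing first means the scaling statistics are computed on a column with no missing entries, which matches the order of the steps in the recipe.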

A recipe can be chained in a scikit-learn Pipeline like any other transformer.

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([("rec", rec), ("svc", SVC())])

The pipeline can be used like any other estimator. Because the preprocessing steps are fitted only on the data passed to fit, information from the test set does not leak into training.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe.fit(X_train, y_train).score(X_test, y_test)
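
The leakage point can be illustrated with a small standard-library sketch: the scaling statistics are learned from the training values only and then reused, unchanged, on the test values. The class `StandardizeStep` is a hypothetical stand-in for a fitted preprocessing step, not part of IbisML or scikit-learn, and the population standard deviation is again an assumption.

```python
from statistics import fmean, pstdev


class StandardizeStep:
    """Hypothetical stand-in for a preprocessing step with fit/transform."""

    def fit(self, values):
        # Statistics are learned from the values passed to fit, and nothing else.
        self.mean_ = fmean(values)
        self.std_ = pstdev(values)  # assumption: population standard deviation
        return self

    def transform(self, values):
        # Reuse the stored statistics; never recompute them on new data.
        return [(v - self.mean_) / self.std_ for v in values]


train = [2.0, 4.0, 6.0, 8.0]
test = [10.0, 12.0]

step = StandardizeStep().fit(train)   # sees the training values only
train_scaled = step.transform(train)
test_scaled = step.transform(test)    # scaled with the training mean and std
```

Here the scaled test values are not centered on zero, which is expected: they are expressed relative to the training distribution rather than their own.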