lacava / few

a feature engineering wrapper for sklearn
https://lacava.github.io/few
GNU General Public License v3.0

normalize feature transformations #16

Closed: lacava closed this issue 7 years ago

lacava commented 7 years ago

Normalize feature transformations automatically before feeding them into the ML fit method. Store the fitted transformer so that it can be reused in prediction/transformation as well.
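A minimal sketch of what this issue asks for, written outside of few: `NormalizedFeatureModel` and `square_first` are hypothetical names, and `StandardScaler` is assumed as the normalizer. The scaler is fit on the engineered training features, stored as `scaler_`, and reused in `predict` so that new data is normalized with the training statistics.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.preprocessing import StandardScaler


class NormalizedFeatureModel:
    """Hypothetical wrapper: build features, normalize them, then fit the estimator."""

    def __init__(self, feature_fn, estimator):
        self.feature_fn = feature_fn  # hypothetical: maps raw X to engineered features
        self.estimator = estimator

    def fit(self, X, y):
        F = self.feature_fn(X)
        # Learn the normalization from the training features and store it.
        self.scaler_ = StandardScaler().fit(F)
        self.estimator.fit(self.scaler_.transform(F), y)
        return self

    def predict(self, X):
        # Reuse the stored scaler so new data is normalized with training statistics.
        F = self.scaler_.transform(self.feature_fn(X))
        return self.estimator.predict(F)


def square_first(X):
    """Hypothetical feature transformation: [x0**2, x1]."""
    return np.column_stack([X[:, 0] ** 2, X[:, 1]])


rng = np.random.RandomState(0)
X = rng.rand(100, 2)
y = 3 * X[:, 0] ** 2 + X[:, 1] + 0.01 * rng.randn(100)
model = NormalizedFeatureModel(square_first, LassoLarsCV()).fit(X, y)
preds = model.predict(X)
```

Storing the fitted scaler instead of refitting it at prediction time matters: refitting on test data would normalize test features with their own mean and variance, which the estimator never saw during training.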

lacava commented 7 years ago
Ohjeah commented 7 years ago

I think it would be cleaner to use a Pipeline for this:

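```python
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Each step is a (name, estimator) pair; the scaler is fit only on the data passed to model.fit.
steps = [("scaler", StandardScaler()), ("estimator", LassoLarsCV())]
model = Pipeline(steps)
```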

The API is still the same: `model.fit(x_train, y_train)` and `model.predict(x_test)`. You could even write a Transformer that takes a set of expressions/functions and transforms `x` into the features.

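```python
# MyTransformer and exprs are placeholders for a custom feature-construction transformer.
steps = [("features", MyTransformer(exprs)), ("scaler", StandardScaler()), ("estimator", LassoLarsCV())]
```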
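One way such a transformer might look, as a sketch: `ExpressionTransformer` is a hypothetical stand-in for the `MyTransformer(exprs)` placeholder, and the two lambdas in `exprs` are arbitrary example features. Subclassing `BaseEstimator` and `TransformerMixin` gives it `get_params` and `fit_transform`, so it slots into a `Pipeline` like any built-in step.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ExpressionTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: each callable in `exprs` maps X to one feature column."""

    def __init__(self, exprs):
        self.exprs = exprs

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn from the data.
        return self

    def transform(self, X):
        X = np.asarray(X)
        return np.column_stack([f(X) for f in self.exprs])


# Example expressions (illustrative only): x0 * x1 and sin(x0).
exprs = [lambda X: X[:, 0] * X[:, 1], lambda X: np.sin(X[:, 0])]
steps = [
    ("features", ExpressionTransformer(exprs)),
    ("scaler", StandardScaler()),
    ("estimator", LassoLarsCV()),
]
model = Pipeline(steps)

rng = np.random.RandomState(0)
x_train, x_test = rng.rand(80, 2), rng.rand(20, 2)
y_train = 2 * x_train[:, 0] * x_train[:, 1] + 0.01 * rng.randn(80)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
```

Because this transformer is stateless, its `fit` just returns `self`; the scaler after it still learns its statistics from the training features only.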

Using the model down the line also becomes much simpler, e.g. saving it and reusing it for estimation in a different context, since everything you need is contained in the pipeline object.
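A sketch of that save-and-reload workflow, assuming the standard-library `pickle` module and the `StandardScaler` + `LassoLarsCV` pipeline from above (the random data is illustrative only). The fitted scaler statistics and the model coefficients travel together in one object.

```python
import pickle

import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
x_train, x_test = rng.rand(80, 3), rng.rand(20, 3)
y_train = x_train @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.randn(80)

model = Pipeline([("scaler", StandardScaler()), ("estimator", LassoLarsCV())])
model.fit(x_train, y_train)

# Serialize the whole fitted pipeline: scaler statistics and coefficients together.
blob = pickle.dumps(model)

# Later, possibly in another process: one object restores the full predict path.
restored = pickle.loads(blob)
y_pred = restored.predict(x_test)
```

Standard `pickle` cannot serialize lambdas, so a feature transformer meant to be saved this way should hold module-level functions rather than lambdas. `joblib.dump`/`joblib.load` are a common alternative for objects holding large NumPy arrays, with the same restriction.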

lacava commented 7 years ago

That's a good point; we should use the sklearn Pipeline for this and for our transformations. Right now, predict() manually transforms the data and then calls predict on the best estimator. It should all be combined into one sklearn Pipeline.
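To illustrate why combining the steps loses nothing, here is a sketch (assumed setup: random data and the `StandardScaler` + `LassoLarsCV` pipeline from earlier in the thread; this is not the code from the fixing commit) comparing a hand-rolled transform-then-predict with a single `Pipeline.predict` call. It relies on `Pipeline` slicing (`pipe[:-1]`, `pipe[-1]`), available in scikit-learn 0.21 and later.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(1)
X = rng.rand(100, 3)
y = X[:, 0] - X[:, 2] + 0.01 * rng.randn(100)

pipe = Pipeline([("scaler", StandardScaler()), ("estimator", LassoLarsCV())]).fit(X, y)

# Manual path: run every fitted step except the last, then call the final estimator.
Z = pipe[:-1].transform(X)
manual = pipe[-1].predict(Z)

# Pipeline path: one predict call chains the same fitted steps.
combined = pipe.predict(X)
```

Both paths produce the same predictions; the pipeline version simply removes the chance of forgetting a step or applying an unfitted one.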

lacava commented 7 years ago

Fixed in commit 9124540.