jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Using the sum of two columns as a new feature when generating a PMML file? #149

Closed HelloLadsAndGents closed 3 years ago

HelloLadsAndGents commented 3 years ago

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain

column_preprocessor = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()])
])

For example, how can I add "Sepal.Length" and "Sepal.Width", or "Sepal.Width" and "Petal.Width", and use the sum as a new feature for prediction?

I know that writing a custom function is somewhat complex. Is there another way, or an existing function, for this kind of situation?

Thanks

vruusmann commented 3 years ago

How can I add "Sepal.Length" and "Sepal.Width", or "Sepal.Width" and "Petal.Width", and use the sum as a new feature for prediction?

Use the sklearn2pmml.preprocessing.ExpressionTransformer transformation type:

from sklearn2pmml.preprocessing import ExpressionTransformer

column_preprocessor = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()]),
    (["Sepal.Length", "Sepal.Width"], [ExpressionTransformer("X[0] + X[1]"), StandardScaler()])
])
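
To make the indexing concrete: inside the expression, X appears to refer to the columns selected for that mapper entry, so here X[0] would be "Sepal.Length" and X[1] would be "Sepal.Width". The plain-Python sketch below only mimics that row-wise evaluation and does not use sklearn2pmml. The apply_expression helper and the sample values are made up for illustration and are not the library's implementation.

```python
# Hypothetical sample rows: (Sepal.Length, Sepal.Width)
rows = [(5.0, 3.5), (4.5, 3.0), (6.25, 2.75)]

def apply_expression(expr, rows):
    """Evaluate expr once per row, with X bound to that row's selected columns.

    Illustrative helper only; this is not how sklearn2pmml is implemented.
    """
    code = compile(expr, "<expr>", "eval")
    # Builtins are disabled so the expression can only see X
    return [eval(code, {"__builtins__": {}}, {"X": row}) for row in rows]

# One derived value per input row
sepal_sum = apply_expression("X[0] + X[1]", rows)
```

In the DataFrameMapper above, the derived column is then passed on to StandardScaler like any other feature.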

If summing is all you need, then you can also use the sklearn2pmml.preprocessing.Aggregator transformation type:

from sklearn2pmml.preprocessing import Aggregator

column_preprocessor = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()]),
    (["Petal.Length", "Petal.Width"], [Aggregator(function="sum"), StandardScaler()])
])
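
As a rough picture of what a row-wise "sum" aggregation does with the two petal columns, here is a plain-Python sketch. The aggregate helper, its function table, and the sample values are hypothetical. The real Aggregator is a scikit-learn style transformer, so this only illustrates the arithmetic.

```python
# Hypothetical sample rows: (Petal.Length, Petal.Width)
rows = [(1.5, 0.25), (4.5, 1.5), (6.0, 2.5)]

# Illustrative lookup table; not the library's own list of supported functions
FUNCTIONS = {"sum": sum, "min": min, "max": max}

def aggregate(function, rows):
    """Collapse each row of selected columns into a single value."""
    func = FUNCTIONS[function]
    return [func(row) for row in rows]

petal_sum = aggregate("sum", rows)

# For two columns, a row-wise sum matches the expression "X[0] + X[1]"
assert petal_sum == [row[0] + row[1] for row in rows]
```

For a plain sum the two approaches give the same derived feature. An expression is the more general choice when the combination is anything other than a simple aggregate.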
HelloLadsAndGents commented 3 years ago

thanks a lot ^_^