jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Support for transformer-only pipelines #177

Closed peiji1981 closed 2 years ago

peiji1981 commented 2 years ago

Hi, how can I use this tool to convert only preprocessing steps such as OneHotEncoder or LabelEncoder, without any classifier, regressor, or clusterer?

vruusmann commented 2 years ago

> i use this tool to transform only preprocessing operator like onehot, labelencoder, without any classifier or regressor or cluster

In other words, you're interested in converting a "transformer-only pipeline".

It is supported by the JPMML-SparkML library so, in principle, it should be doable here as well.

> preprocessing operator like onehot, labelencoder

These two transformers are temporary, in-memory helpers: they map string values to an intermediate numeric representation. They have no persistent PMML representation, because PMML estimators operate on the original string values directly.
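
To illustrate what "intermediate numeric representation" means here, below is a minimal pure-Python sketch of label encoding. It is not scikit-learn's implementation and not anything the converter runs; the helper names `fit_label_codes`, `encode`, and `decode` are hypothetical and exist only for this example. It assumes codes are assigned in sorted order of the distinct labels.

```python
# Illustrative stand-in for label encoding; NOT scikit-learn's implementation.
# All function names below are hypothetical helpers for this example only.

def fit_label_codes(values):
    """Assign an integer code to each distinct string, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


def encode(values, codes):
    """Replace each string with its integer code."""
    return [codes[value] for value in values]


def decode(encoded, codes):
    """Recover the original strings from the integer codes."""
    inverse = {code: label for label, code in codes.items()}
    return [inverse[code] for code in encoded]


species = ["setosa", "virginica", "setosa", "versicolor"]
codes = fit_label_codes(species)
encoded = encode(species, codes)

print(codes)
print(encoded)
print(decode(encoded, codes))
```

The integer codes only exist so that in-memory numeric code can work with the data. Since the mapping is fully reversible, a model that accepts the original strings does not need the codes to be stored anywhere, which is the situation described above for PMML estimators.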

vruusmann commented 2 years ago

Well, it looks like transformer-only pipelines are already fully supported:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

iris_X, iris_y = load_iris(return_X_y = True, as_frame = True)

pipeline = PMMLPipeline([
    ("scaler", StandardScaler()),
])
pipeline.fit(iris_X, iris_y)

sklearn2pmml(pipeline, "StandardScalerIris.pmml")

Therefore, closing this issue as invalid.