jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0
531 stars, 117 forks

Ability to suppress the (default-) `Output` element #180

Status: Open. liuhuanshuo opened this issue 1 year ago.

liuhuanshuo commented 1 year ago

When using a PMML file converted with sklearn2pmml, the default output columns are [y, probability(1), probability(0)].

Is there a way to change the default column names, for example renaming probability(1) to proba?

Alternatively, can I select only the columns I want? For example, I only need the y column; I don't need the default probability(1) and probability(0) outputs.

vruusmann commented 1 year ago

> Using sklearn2pmml converted pmml file, the default output is [y,probability(1),probability(0)].

These three values are all calculated in one pass. Therefore, getting rid of the probability output fields brings no "performance benefit"; the only gain is a "visual effect" (e.g. keeping the on-screen output extremely focused).

In Scikit-Learn, it would take two passes (first predict(X), then predict_proba(X)) to create such a results data matrix.
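For comparison, here is a minimal Scikit-Learn sketch of that two-pass approach. The synthetic dataset, the `LogisticRegression` model and the column names (chosen to mimic the PMML result fields) are illustrative assumptions, not part of the original thread.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative binary classification data and model
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

# Pass 1: predicted class labels
labels = clf.predict(X)
# Pass 2: class probabilities, one column per entry in clf.classes_
proba = clf.predict_proba(X)

# Assemble a result matrix resembling the PMML output fields
results = pd.DataFrame({"y": labels})
for i, cls in enumerate(clf.classes_):
    results[f"probability({cls})"] = proba[:, i]

print(results.head())
```

A PMML evaluator produces the label and the probabilities together, so dropping the probability columns afterwards does not save any computation.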

> Is there a way to change the default column name, such as changing probability(1) to proba

Column renaming is covered in these recently opened issues: https://github.com/jpmml/sklearn2pmml/issues/359 and https://github.com/jpmml/sklearn2pmml/issues/361

There is a special API for renaming transformer fields, but not for renaming model fields.
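Absent such an API for model fields, one possible workaround is to rename the result columns on the Python side after evaluation. A minimal sketch with pandas; the `yt` DataFrame below is a made-up stand-in for the `evaluateAll()` result, and the new name `proba` is taken from the question.

```python
import pandas as pd

# Stand-in for the DataFrame returned by evaluator.evaluateAll(X);
# the values are made up purely for illustration.
yt = pd.DataFrame({
    "y": [1, 0],
    "probability(1)": [0.8, 0.3],
    "probability(0)": [0.2, 0.7],
})

# Rename the result column after evaluation.
# This does not change the field names stored in the PMML file itself.
yt = yt.rename(columns={"probability(1)": "proba"})
print(list(yt.columns))  # ['y', 'proba', 'probability(0)']
```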

> Or can I select the column that I want, for example I only need to print y columns

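```python
from jpmml_evaluator import make_evaluator

evaluator = make_evaluator("classifier.pmml").verify()
yt = evaluator.evaluateAll(X)

# THIS! Keep only the "y" column
yt = yt["y"]
```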

You may consider wrapping the Evaluator.evaluateAll(X) function call in a separate helper function that adds or removes result columns as you wish.

vruusmann commented 1 year ago

I can see the benefit of adding a special-purpose API for disabling the generation of default Output elements.

The easiest way would be for the end user to signal their intent by setting a `pmml_output = False` attribute on the (fitted) model object:

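```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Illustrative binary classification data and model
X, y = make_classification(n_samples = 100, random_state = 13)
classifier = LogisticRegression()

pipeline = PMMLPipeline([
  ("classifier", classifier)
])
pipeline.fit(X, y)

# Default config - the Output element is created
sklearn2pmml(pipeline, "classifier.pmml")

# Proposed opt-out flag, as described above
classifier.pmml_output = False

# Custom config - the Output element is not created
sklearn2pmml(pipeline, "classifier-no_proba.pmml")
```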
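To check which variant a given file is, one could look for an `Output` element in the PMML document. A minimal sketch using the standard library's ElementTree; the helper name `has_output_element` is hypothetical, and the inline snippets are trimmed, hand-written fragments (not complete, valid PMML documents) used only to exercise the check.

```python
import xml.etree.ElementTree as ET


def has_output_element(root):
    """Return True if any element named Output appears under root.

    Compares the local tag name only, so it works regardless of the
    PMML namespace version (e.g. http://www.dmg.org/PMML-4_4).
    """
    return any(elem.tag.rsplit("}", 1)[-1] == "Output" for elem in root.iter())


WITH_OUTPUT = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <TreeModel functionName="classification">
    <Output>
      <OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
    </Output>
  </TreeModel>
</PMML>"""

WITHOUT_OUTPUT = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <TreeModel functionName="classification"/>
</PMML>"""

print(has_output_element(ET.fromstring(WITH_OUTPUT)))     # True
print(has_output_element(ET.fromstring(WITHOUT_OUTPUT)))  # False
```

For a file on disk, pass `ET.parse("classifier-no_proba.pmml").getroot()` instead of parsing an inline string.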