jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Adding node id to output field #182

Closed: axemixer closed this issue 1 year ago

axemixer commented 1 year ago

Hello,

I have a basic tree model represented as a PMML file. I can get the score in the PMML output. Is there any way I can also add the node IDs to a new output field?

Basic tree model:

<Node id="1" score="min&lt;=x&lt;2">
     <SimplePredicate field="x" operator="lessThan" value="2.0"/>
<Node id="2" score="min&lt;=y&lt;3">
     <SimplePredicate field="y" operator="lessThan" value="3.0"/>

Output section of the tree model:

<Output>
    <OutputField dataType="string" feature="entityId" name="tree_strategy_output" optype="categorical"/>
</Output>

vruusmann commented 1 year ago

You can tweak the PMML representation of estimator objects using conversion options.

They can be set manually by defining a pmml_options_ attribute, which is a dict of key-value pairs. Alternatively, they can be set semi-automatically for the final estimator step using the PMMLPipeline.configure(**pmml_options) method.
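To make the difference between the two routes concrete, here is a stdlib-only sketch. `FakeEstimator` and `FakePipeline` are hypothetical stand-ins, not sklearn2pmml classes, and the `configure` body only assumes that the method stores its keyword arguments as a dict under `pmml_options_` on the last pipeline step. It is not sklearn2pmml's actual source.

```python
# Hypothetical stand-in classes for illustration only; NOT sklearn2pmml's code.
class FakeEstimator:
    """Plays the role of a scikit-learn estimator such as DecisionTreeClassifier."""
    pass


class FakePipeline:
    """Plays the role of PMMLPipeline: an ordered list of (name, step) pairs."""

    def __init__(self, steps):
        self.steps = steps

    def configure(self, **pmml_options):
        # Assumed behaviour: attach the keyword arguments, as a dict,
        # to the final estimator step under the pmml_options_ attribute
        final_estimator = self.steps[-1][1]
        final_estimator.pmml_options_ = dict(pmml_options)


# Route 1: set the attribute manually on the estimator object
manual = FakeEstimator()
manual.pmml_options_ = {"node_id": True}

# Route 2: let the pipeline set it on its final step
configured = FakeEstimator()
pipeline = FakePipeline([("classifier", configured)])
pipeline.configure(node_id=True)

print(manual.pmml_options_ == configured.pmml_options_)  # True
```

Under this assumption, both routes end in the same place. The pipeline variant simply saves you from reaching into the last step yourself.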

> Is there any chance that I can also add node id's into new output

First, you can toggle node identifiers on and off by setting the node_id conversion option:

from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier()

# THIS! Keep the Node@id attributes in the generated PMML
classifier.pmml_options_ = {"node_id" : True}

Node identifiers are "off" by default because they take up extra space and would be lost anyway during tree reorganizations (compaction and flattening).
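One way to check whether node identifiers actually made it into a converted file is to list the `id` attribute of every `Node` element. The sketch below uses only `xml.etree.ElementTree`. The embedded document is a hypothetical, heavily trimmed fragment (a real TreeModel also needs a MiningSchema and more). Namespaces are stripped so the check works regardless of the PMML version URI.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://www.dmg.org/PMML-4_4}' from a tag."""
    return tag.rsplit("}", 1)[-1]


def collect_node_ids(pmml_text):
    """Return the id attribute of every Node element, in document order.

    Nodes without an id attribute are reported as None.
    """
    root = ET.fromstring(pmml_text)
    return [el.get("id") for el in root.iter() if local_name(el.tag) == "Node"]


# Hypothetical, trimmed TreeModel fragment used only for illustration
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <TreeModel functionName="classification">
    <Node id="1">
      <True/>
      <Node id="2" score="a">
        <SimplePredicate field="x" operator="lessThan" value="2.0"/>
      </Node>
      <Node id="3" score="b">
        <True/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

print(collect_node_ids(sample))  # ['1', '2', '3']
```

A list full of `None` values would suggest that the `node_id` option was not picked up during conversion.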

Second, you can enable the generation of the OutputField@feature="entityId" element by setting the winner_id conversion option:

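from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
    ("classifier", classifier)
])
# X, y: the training data
pipeline.fit(X, y)

# THIS! Set both options on the final estimator (the classifier)
pipeline.configure(node_id = True, winner_id = True)

# The options take effect when the pipeline is converted to PMML
sklearn2pmml(pipeline, "DecisionTree.pmml")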
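The `entityId` output field reports the `id` of the Node where scoring ends (the "winner"). The stdlib-only sketch below shows how that node is located. It is a simplified illustration, not how any PMML engine is implemented. It assumes namespace-free XML, only `True`/`False`/`SimplePredicate` predicates, numeric comparison operators, and no missing values. When no child matches, it returns the deepest node reached, which mirrors PMML's `returnLastPrediction` strategy rather than the default `returnNullPrediction`. The tree is a hypothetical variant of the fragment from the question.

```python
import xml.etree.ElementTree as ET

# Only the comparison operators needed for this illustration
OPERATORS = {
    "lessThan": lambda a, b: a < b,
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
    "greaterOrEqual": lambda a, b: a >= b,
}


def predicate_holds(node, row):
    """Evaluate the True, False or SimplePredicate child of a Node against row."""
    for child in node:
        if child.tag == "True":
            return True
        if child.tag == "False":
            return False
        if child.tag == "SimplePredicate":
            op = OPERATORS[child.get("operator")]
            return op(row[child.get("field")], float(child.get("value")))
    raise ValueError("unsupported or missing predicate")


def winner(root_node, row):
    """Descend into the first child whose predicate is true until none matches."""
    node = root_node
    while True:
        children = [c for c in node if c.tag == "Node"]
        matching = next((c for c in children if predicate_holds(c, row)), None)
        if matching is None:
            return node
        node = matching


# Hypothetical tree: the root is always true, then x < 2 goes left
tree = ET.fromstring("""<Node id="1" score="?">
  <True/>
  <Node id="2" score="min&lt;=x&lt;2">
    <SimplePredicate field="x" operator="lessThan" value="2.0"/>
  </Node>
  <Node id="3" score="x&gt;=2">
    <True/>
  </Node>
</Node>""")

leaf = winner(tree, {"x": 1.5})
print(leaf.get("id"), leaf.get("score"))  # 2 min<=x<2
```

With `winner_id` enabled, an output field like the one in the question would then carry the winning node's id (`"2"` for `x = 1.5` here) next to the score.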