jpmml / jpmml-evaluator

Java Evaluator API for PMML
GNU Affero General Public License v3.0

GBDT 2-classification predict with exception #179

Closed Numberartificial closed 4 years ago

Numberartificial commented 4 years ago

PMML:

<?xml version="1.0" encoding="UTF-8"?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd"><Header copyright="Copyright (c) 2014, Alibaba Inc." description=""><Application name="ODPS/PMML" version="0.1.0"/><Timestamp>Mon, 13 Jan 2020 03:28:57 GMT</Timestamp></Header><DataDictionary numberOfFields="5"><DataField name="f0" optype="continuous" dataType="double"/><DataField name="f1" optype="continuous" dataType="double"/><DataField name="f2" optype="continuous" dataType="double"/><DataField name="f3" optype="continuous" dataType="double"/><DataField name="label" optype="categorical" dataType="integer"><Value value="0"/><Value value="1"/></DataField></DataDictionary><MiningModel modelName="xlab_m_GBDT_LR_1_1773039_v0" functionName="classification" algorithmName="GBDT"><MiningSchema><MiningField name="f0" usageType="active"/><MiningField name="f1" usageType="active"/><MiningField name="f2" usageType="active"/><MiningField name="f3" usageType="active"/><MiningField name="label" usageType="target"/></MiningSchema><Output><OutputField name="p_0" optype="continuous" dataType="double" feature="probability" value="0"/><OutputField name="p_1" optype="continuous" dataType="double" feature="probability" value="1"/></Output><Segmentation multipleModelMethod="modelChain"><Segment id="0"><True/><MiningModel modelName="xlab_m_GBDT_LR_1_1773039_v0" functionName="regression" algorithmName="GBDT"><MiningSchema><MiningField name="f0" usageType="active"/><MiningField name="f1" usageType="active"/><MiningField name="f2" usageType="active"/><MiningField name="f3" usageType="active"/></MiningSchema><Output><OutputField name="decisionFunction_y" optype="continuous" dataType="double" feature="predictedValue" isFinalResult="false"/></Output><Segmentation multipleModelMethod="sum"><Segment id="0"><True/><TreeModel modelName="xlab_m_GBDT_LR_1_1773039_v0" functionName="regression" 
algorithmName="GBDT" missingValueStrategy="weightedConfidence"><MiningSchema><MiningField name="f0" usageType="active"/><MiningField name="f1" usageType="active"/><MiningField name="f2" usageType="active"/><MiningField name="f3" usageType="active"/></MiningSchema><Node id="0" score="0.6806471785110355"><True/></Node></TreeModel></Segment><Segment id="1"><True/><TreeModel modelName="xlab_m_GBDT_LR_1_1773039_v0" functionName="regression" algorithmName="GBDT" missingValueStrategy="weightedConfidence"><MiningSchema><MiningField name="f0" usageType="active"/><MiningField name="f1" usageType="active"/><MiningField name="f2" usageType="active"/><MiningField name="f3" usageType="active"/></MiningSchema><Node id="0" score="-0.01224055141210556"><True/></Node></TreeModel></Segment></Segmentation></MiningModel></Segment><Segment id="1"><True/><RegressionModel modelName="xlab_m_GBDT_LR_1_1773039_v0" functionName="classification" algorithmName="GBDT" normalizationMethod="softmax"><MiningSchema><MiningField name="label" usageType="target"/><MiningField name="decisionFunction_y"/></MiningSchema><RegressionTable intercept="0.0" targetCategory="0"><NumericPredictor name="decisionFunction_y" coefficient="-1.0"/></RegressionTable><RegressionTable intercept="0.0" targetCategory="1"/></RegressionModel></Segment></Segmentation></MiningModel></PMML>

JPMML 1.4.14 load and verify are done, then predict fails with an exception:

org.jpmml.evaluator.TypeCheckException: Expected org.jpmml.evaluator.HasProbability value, got org.jpmml.evaluator.Classification value

  at org.jpmml.evaluator.TypeUtil.cast(TypeUtil.java:538)
  at org.jpmml.evaluator.OutputUtil.getProbability(OutputUtil.java:481)
  at org.jpmml.evaluator.OutputUtil.evaluate(OutputUtil.java:210)
  at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:702)
  at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateInternal(MiningModelEvaluator.java:207)
  at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:580)

I worked around this by downgrading JPMML to version 1.3.2, so I think this might be a mismatch introduced by a version upgrade. I hope I can still use the newest release.

vruusmann commented 4 years ago

JPMML 1.4.14 load and verify are done

Just a comment: the verification isn't actually performed, because this GBDT model is not associated with any verification data. If verification data were present, Evaluator#verify() would fail eagerly with this TypeCheckException (instead of the failure being delayed until prediction).
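Whether a PMML file carries verification data can be checked by looking for a ModelVerification element. The following is a minimal standalone Python sketch, not part of JPMML-Evaluator; the function name `has_verification_data` is hypothetical, and the namespace constant assumes a PMML 4.2 document such as the one in this issue.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the PMML 4.2 namespace, as the file in this issue does.
PMML_NS = "http://www.dmg.org/PMML-4_2"


def has_verification_data(pmml_text: str) -> bool:
    """Return True if any model in the document contains a ModelVerification element."""
    root = ET.fromstring(pmml_text)
    # iter() walks the whole tree, so nested models inside segments are covered too.
    first_match = next(root.iter(f"{{{PMML_NS}}}ModelVerification"), None)
    return first_match is not None
```

For the GBDT file above this check would return False, since none of its models declares a ModelVerification element.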

I believe this issue is a duplicate of https://groups.google.com/forum/#!topic/jpmml/GyWRf6pWpgw

vruusmann commented 4 years ago

I believe this issue is a duplicate of https://groups.google.com/forum/#!topic/jpmml/GyWRf6pWpgw

TLDR: The Output element should be attached to the child MiningModel/Segmentation/Segment[@id=1]/RegressionModel element (not the top-level MiningModel element).
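The relocation described above can be scripted. The following is a minimal Python sketch using only the standard library; it is not part of JPMML-Evaluator. The helper names `q` and `move_output_to_last_segment` are hypothetical, and the code assumes a PMML 4.2 file laid out like the one in this issue: a top-level MiningModel whose last Segment holds a RegressionModel that has no Output of its own.

```python
import xml.etree.ElementTree as ET

# Assumption: PMML 4.2 namespace, as in the file from this issue.
PMML_NS = "http://www.dmg.org/PMML-4_2"
# Serialize PMML elements without a prefix (note: this is a process-wide setting).
ET.register_namespace("", PMML_NS)


def q(tag: str) -> str:
    """Qualify a PMML element name with the namespace."""
    return f"{{{PMML_NS}}}{tag}"


def move_output_to_last_segment(pmml_text: str) -> str:
    """Move the top-level MiningModel/Output into the last segment's RegressionModel."""
    root = ET.fromstring(pmml_text)
    top = root.find(q("MiningModel"))
    output = top.find(q("Output"))  # direct child only; nested Outputs are untouched
    if output is None:
        return pmml_text  # nothing to move

    last_segment = top.find(q("Segmentation")).findall(q("Segment"))[-1]
    model = last_segment.find(q("RegressionModel"))
    if model is None or model.find(q("Output")) is not None:
        raise ValueError("unexpected layout: cannot relocate Output safely")

    top.remove(output)
    # Place Output directly after MiningSchema inside the RegressionModel.
    schema = model.find(q("MiningSchema"))
    model.insert(list(model).index(schema) + 1, output)
    return ET.tostring(root, encoding="unicode")
```

Only the Output that is a direct child of the top-level MiningModel is moved, so the `decisionFunction_y` Output of the nested MiningModel stays where it is.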

Numberartificial commented 4 years ago

Thanks for the reply. Do you mean the following? This issue is caused by an incorrectly formatted PMML 4.2 file, which 1.3.2 loads despite the problem, and 1.4.14 fixes that. I should reformat this PMML 4.2 file, and then 1.4.14 will be able to load it and predict with it.

vruusmann commented 4 years ago

This issue is caused by an incorrectly formatted PMML 4.2 file, which 1.3.2 loads despite the problem, and 1.4.14 fixes that.

I'm not saying that this PMML example is wrong. It just follows some odd conventions. JPMML-Evaluator version 1.3.2 didn't check this particular structural requirement, but version 1.4.14 does, and it doesn't like it.

It's like the concept of "minimizing the scope of local variables" in programming: the Output element should be declared in the place where it is relevant, not in some other (effectively random) place.

I should reformat this PMML 4.2 file, and then 1.4.14 will be able to load it and predict with it.

If you reformat this file, then it will work with JPMML-Evaluator 1.4.14.

However, I'm likely to teach JPMML-Evaluator about this odd convention, so perhaps version 1.4.15 will already be able to deal with this file.

Numberartificial commented 4 years ago

Thanks! I hope 1.4.15 will deal with this (not very pretty, but also not wrong) file, because it is generated by a publicly available machine learning platform. I solved this by following your tips:

<?xml version="1.0" encoding="UTF-8"?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd">
    <Header copyright="Copyright (c) 2014, Alibaba Inc." description="">
        <Application name="ODPS/PMML" version="0.1.0"/>
        <Timestamp>Mon, 13 Jan 2020 03:28:57 GMT</Timestamp>
    </Header>
    <DataDictionary numberOfFields="5">
        <DataField name="f0" optype="continuous" dataType="double"/>
        <DataField name="f1" optype="continuous" dataType="double"/>
        <DataField name="f2" optype="continuous" dataType="double"/>
        <DataField name="f3" optype="continuous" dataType="double"/>
        <DataField name="label" optype="categorical" dataType="integer">
            <Value value="0"/>
            <Value value="1"/>
        </DataField>
    </DataDictionary>
    <MiningModel modelName="xlab_m_GBDT_LR_1_1773039_v0" functionName="classification" algorithmName="GBDT">
        <MiningSchema>
            <MiningField name="f0" usageType="active"/>
            <MiningField name="f1" usageType="active"/>
            <MiningField name="f2" usageType="active"/>
            <MiningField name="f3" usageType="active"/>
            <MiningField name="label" usageType="target"/>
        </MiningSchema>
        <Segmentation multipleModelMethod="modelChain">
            <Segment id="0">
                <True/>
                <MiningModel modelName="xlab_m_GBDT_LR_1_1773039_v0" functionName="regression" algorithmName="GBDT">
                    <MiningSchema>
                        <MiningField name="f0" usageType="active"/>
                        <MiningField name="f1" usageType="active"/>
                        <MiningField name="f2" usageType="active"/>
                        <MiningField name="f3" usageType="active"/>
                    </MiningSchema>
                    <Output>
                        <OutputField name="decisionFunction_y" optype="continuous" dataType="double"
                                     feature="predictedValue" isFinalResult="false"/>
                    </Output>
                    <Segmentation multipleModelMethod="sum">
                        <Segment id="0">
                            <True/>
                            <TreeModel modelName="xlab_m_GBDT_LR_1_1773039_v0" functionName="regression"
                                       algorithmName="GBDT" missingValueStrategy="weightedConfidence">
                                <MiningSchema>
                                    <MiningField name="f0" usageType="active"/>
                                    <MiningField name="f1" usageType="active"/>
                                    <MiningField name="f2" usageType="active"/>
                                    <MiningField name="f3" usageType="active"/>
                                </MiningSchema>
                                <Node id="0" score="0.6806471785110355">
                                    <True/>
                                </Node>
                            </TreeModel>
                        </Segment>
                        <Segment id="1">
                            <True/>
                            <TreeModel modelName="xlab_m_GBDT_LR_1_1773039_v0" functionName="regression"
                                       algorithmName="GBDT" missingValueStrategy="weightedConfidence">
                                <MiningSchema>
                                    <MiningField name="f0" usageType="active"/>
                                    <MiningField name="f1" usageType="active"/>
                                    <MiningField name="f2" usageType="active"/>
                                    <MiningField name="f3" usageType="active"/>
                                </MiningSchema>
                                <Node id="0" score="-0.01224055141210556">
                                    <True/>
                                </Node>
                            </TreeModel>
                        </Segment>
                    </Segmentation>
                </MiningModel>
            </Segment>
            <Segment id="1">
                <True/>
                <RegressionModel modelName="xlab_m_GBDT_LR_1_1773039_v0" functionName="classification"
                                 algorithmName="GBDT" normalizationMethod="softmax">
                    <MiningSchema>
                        <MiningField name="label" usageType="target"/>
                        <MiningField name="decisionFunction_y"/>
                    </MiningSchema>

                    <Output>
                        <OutputField name="p_0" optype="continuous" dataType="double" feature="probability" value="0"/>
                        <OutputField name="p_1" optype="continuous" dataType="double" feature="probability" value="1"/>
                    </Output>
                    <RegressionTable intercept="0.0" targetCategory="0">
                        <NumericPredictor name="decisionFunction_y" coefficient="-1.0"/>
                    </RegressionTable>
                    <RegressionTable intercept="0.0" targetCategory="1"/>
                </RegressionModel>
            </Segment>
        </Segmentation>
    </MiningModel>
</PMML>

With best regards.

vruusmann commented 4 years ago

I solved this by following your tips:

That's the correct edit - simply move the Output element from the top-level MiningModel element to the last child RegressionModel element.
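As a sanity check on what the relocated `p_0` and `p_1` fields should report, the arithmetic of this particular model can be reproduced by hand. The sketch below is plain Python, not JPMML-Evaluator code, and the function names are hypothetical. It assumes the usual PMML softmax normalization, p_j = exp(y_j) / sum_i exp(y_i), and uses the fact that both trees consist of a single root node, so the result does not depend on f0..f3.

```python
import math

# Scores of the two single-node trees in the PMML above.
TREE_SCORES = [0.6806471785110355, -0.01224055141210556]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probabilities(tree_scores):
    # Segment 0: multipleModelMethod="sum" adds up the tree scores.
    decision_function_y = sum(tree_scores)
    # Segment 1: one RegressionTable per target category.
    y0 = 0.0 + (-1.0) * decision_function_y  # targetCategory="0"
    y1 = 0.0                                 # targetCategory="1": intercept only
    p0, p1 = softmax([y0, y1])
    return {"p_0": p0, "p_1": p1}


print(predict_probabilities(TREE_SCORES))
```

With these scores `p_1` reduces to the logistic function of the summed tree scores and comes out at roughly 0.66, leaving roughly 0.34 for `p_0`.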

.. generated by a publicly available machine learning platform.

I'm not familiar with Alibaba's ODPS software. Is there a public demo somewhere (hopefully with some English documentation) that I could play with?

I hope 1.4.15 will deal with this

I'm expecting to work on this issue later this week. There are some related issues open (regarding the scoping of fields and target values in the MiningModel element) that can be handled all together.

Numberartificial commented 4 years ago

^-^ I found it: the English documentation for GBDT binary classification on Alibaba's Machine Learning platform. This PMML was generated with the PMML model export button in PAI Studio, using a pipeline built according to that document.