jpmml / jpmml-evaluator

Java Evaluator API for PMML
GNU Affero General Public License v3.0

Introduce `org.jpmml.evaluator.Transformer` entry point #96

Open christophe-rannou opened 6 years ago

christophe-rannou commented 6 years ago

Hi,

I have a question that I am not sure is PMML related or jpmml-evaluator related. I would like to serialize my preprocessing tasks through PMML. So far I have succeeded in expressing my preprocessing tasks using the TransformationDictionary element, and in outputting the desired processed fields through the Output element when coupled with a MiningModel (such as a tree). Is it possible to have something like an identity MiningModel that returns those fields without needing a dummy model to evaluate the PMML?

The following PMML sums up what I am trying to achieve (it is not valid, since the model element is missing the functionName attribute):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3" x-baseVersion="4.3">
    <Header>
        <Application name="MyApp"/>
    </Header>
    <DataDictionary>
        <DataField name="feat" optype="categorical" dataType="string">
            <Value value="A"/>
            <Value value="B"/>
            <Value value="C"/>
        </DataField>
    </DataDictionary>
    <TransformationDictionary>
        <DerivedField name="encoded_feat" optype="continuous" dataType="integer">
            <MapValues outputColumn="output">
                <FieldColumnPair field="feat" column="input"/>
                <InlineTable>
                    <row>
                        <input>A</input>
                        <output>0</output>
                    </row>
                    <row>
                        <input>B</input>
                        <output>1</output>
                    </row>
                    <row>
                        <input>C</input>
                        <output>2</output>
                    </row>
                </InlineTable>
            </MapValues>
        </DerivedField>
    </TransformationDictionary>
    <MiningModel>
        <MiningSchema>
            <MiningField name="feat"/>
        </MiningSchema>
        <Output>
            <OutputField name="final" optype="continuous" dataType="integer" feature="transformedValue">
                <FieldRef field="encoded_feat"/>
            </OutputField>
        </Output>
    </MiningModel>
</PMML>

Thanks

vruusmann commented 6 years ago

> I would like to serialize my preprocessing tasks through PMML.

The "entry point" of the JPMML-Evaluator library is the `org.jpmml.evaluator.Evaluator` interface, which requires a backing model element. If your PMML document does not contain any model elements, then you cannot use this "entry point" and must either 1) devise and develop an alternative "entry point" interface (something like `org.jpmml.evaluator.Preprocessor`?) or 2) use an alternative library.

> Is it possible to have something like an identity MiningModel that returns those fields without needing a dummy model to evaluate the PMML?

You cannot use the MiningModel element for that, because it is a wrapper around child model elements.

However, you can use the RegressionModel element to represent regression-type identity transforms. Just construct the following regression table: y = 1.0 * encoded_feat + 0.0
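The regression-table workaround described above can be sketched as a replacement for the MiningModel element in the original document. This is only an illustration, under the following assumptions: no target field is declared in the MiningSchema, the evaluator accepts a model without one, and the existing `final` output field is kept as-is. None of this has been checked against a particular JPMML-Evaluator version.

```xml
<!-- Sketch only: replaces the MiningModel element of the PMML document above. -->
<!-- Assumption: the model is accepted without an explicit target field. -->
<RegressionModel functionName="regression">
    <MiningSchema>
        <MiningField name="feat"/>
    </MiningSchema>
    <Output>
        <OutputField name="final" optype="continuous" dataType="integer" feature="transformedValue">
            <FieldRef field="encoded_feat"/>
        </OutputField>
    </Output>
    <!-- Identity regression table: y = 1.0 * encoded_feat + 0.0 -->
    <RegressionTable intercept="0.0">
        <NumericPredictor name="encoded_feat" coefficient="1.0"/>
    </RegressionTable>
</RegressionModel>
```

With this shape, the predicted value should simply repeat `encoded_feat` as a floating-point number, while the `final` output field keeps exposing the integer-typed derived field.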

vruusmann commented 6 years ago

Reopening this issue, because I might want to do something about it in the upcoming 1.4.X development branch.

christophe-rannou commented 6 years ago

I would like to work on this. Are there any pointers you could give me?

vruusmann commented 6 years ago

You raised this issue, so you have a use case that needs addressing, not me.

The goal is to design an interface similar to org.jpmml.evaluator.Evaluator, but for preprocessors. The Evaluator interface encapsulates models, so it's dealing with model input, target and result fields; every field class has its own specification, etc.

The requirement here is to design a "preprocessor schema". It should have at least two schema query methods, `Preprocessor#getArgumentFields()` and `Preprocessor#getResultFields()`, and the evaluate method `Preprocessor#evaluate(Map<FieldName, Object>)`. In any case, a preprocessor's argument fields and result fields are functionally different from standard model fields.

jqueguiner commented 6 years ago

@christophe-rannou: pushing code? ;-)

vruusmann commented 5 years ago

Opened a "request for clarification" at DMG.org's issue tracker to have the entry/exit interfaces of transformer-only PMML documents specified: http://mantis.dmg.org/view.php?id=228

ZhejunWu commented 3 years ago

Hi, I'd like to ask: has this issue been resolved? I saw this related PR: https://github.com/jpmml/jpmml-evaluator/pull/116 was closed instead of merged. I wonder if we have any workaround to use transformer-only PMML. Could you please advise? Thanks!

vruusmann commented 3 years ago

> I wonder if we have any workaround to use transformer-only PMML.

The PMML specification does not define such a workflow.

I've asked DMG.org to clarify the situation, but there's been no official response yet (it typically takes 2-3 years to obtain one): http://mantis.dmg.org/view.php?id=228

The JPMML software project can always implement a vendor extension. But I don't have a clear use case to base my work upon.

> I saw this related PR: #116 was closed instead of merged.

We don't do copy&paste programming in this project.