jpmml / jpmml-transpiler

Java Transpiler (Translator + Compiler) API for PMML
GNU Affero General Public License v3.0

Transpilation fails for `SimpleSetPredicate` elements #20

Closed denmase closed 1 year ago

denmase commented 1 year ago

Hello @vruusmann,

I was experimenting to find the optimal way to implement models (either as PMML or as transpiled PMML) when I ran into an error while transpiling a few of the PMML documents generated by the sample script from your blog post ("Converting Scikit-Learn based LightGBM pipelines to PMML documents"). The error was:

SEVERE: null
java.lang.IllegalArgumentException: org.dmg.pmml.ComplexArray$SetValue
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:519)
    at org.jpmml.translator.PMMLObjectUtil.addSetterMethod(PMMLObjectUtil.java:574)
    at org.jpmml.translator.Template.initializeObject(Template.java:135)
    at org.jpmml.translator.PMMLObjectUtil.initializeObject(PMMLObjectUtil.java:234)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:218)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:484)
    at org.jpmml.translator.PMMLObjectUtil.addValueConstructorParam(PMMLObjectUtil.java:542)
    at org.jpmml.translator.Template.constructObject(Template.java:123)
    at org.jpmml.translator.PMMLObjectUtil.constructObject(PMMLObjectUtil.java:227)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:216)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:484)
    at org.jpmml.translator.PMMLObjectUtil.addValueConstructorParam(PMMLObjectUtil.java:542)
    at org.jpmml.translator.Template.constructObject(Template.java:123)
    at org.jpmml.translator.PMMLObjectUtil.constructObject(PMMLObjectUtil.java:227)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:216)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:484)
    at org.jpmml.translator.PMMLObjectUtil.initializeArray(PMMLObjectUtil.java:601)
    at org.jpmml.translator.PMMLObjectUtil.initializeArray(PMMLObjectUtil.java:592)
    at org.jpmml.translator.PMMLObjectUtil.addSetterMethod(PMMLObjectUtil.java:562)
    at org.jpmml.translator.Template.initializeObject(Template.java:135)
    at org.jpmml.translator.PMMLObjectUtil.initializeObject(PMMLObjectUtil.java:234)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:218)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:484)
    at org.jpmml.translator.PMMLObjectUtil.initializeArray(PMMLObjectUtil.java:601)
    at org.jpmml.translator.PMMLObjectUtil.initializeArray(PMMLObjectUtil.java:592)
    at org.jpmml.translator.PMMLObjectUtil.addSetterMethod(PMMLObjectUtil.java:562)
    at org.jpmml.translator.Template.initializeObject(Template.java:135)
    at org.jpmml.translator.PMMLObjectUtil.initializeObject(PMMLObjectUtil.java:234)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:218)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:484)
    at org.jpmml.translator.PMMLObjectUtil.addValueConstructorParam(PMMLObjectUtil.java:542)
    at org.jpmml.translator.Template.constructObject(Template.java:123)
    at org.jpmml.translator.PMMLObjectUtil.constructObject(PMMLObjectUtil.java:227)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:216)
    at org.jpmml.translator.PMMLObjectUtil.createBuilderMethod(PMMLObjectUtil.java:168)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:463)
    at org.jpmml.translator.PMMLObjectUtil.addValueConstructorParam(PMMLObjectUtil.java:542)
    at org.jpmml.translator.Template.constructObject(Template.java:123)
    at org.jpmml.translator.PMMLObjectUtil.constructObject(PMMLObjectUtil.java:227)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:216)
    at org.jpmml.translator.PMMLObjectUtil.createBuilderMethod(PMMLObjectUtil.java:168)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:444)
    at org.jpmml.translator.PMMLObjectUtil.initializeArray(PMMLObjectUtil.java:601)
    at org.jpmml.translator.PMMLObjectUtil.initializeArray(PMMLObjectUtil.java:592)
    at org.jpmml.translator.PMMLObjectUtil.addValueConstructorParam(PMMLObjectUtil.java:536)
    at org.jpmml.translator.Template.constructObject(Template.java:123)
    at org.jpmml.translator.PMMLObjectUtil.constructObject(PMMLObjectUtil.java:227)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:216)
    at org.jpmml.translator.PMMLObjectUtil.createBuilderMethod(PMMLObjectUtil.java:168)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:444)
    at org.jpmml.translator.PMMLObjectUtil.addSetterMethod(PMMLObjectUtil.java:574)
    at org.jpmml.translator.Template.initializeObject(Template.java:135)
    at org.jpmml.translator.PMMLObjectUtil.initializeObject(PMMLObjectUtil.java:234)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:218)
    at org.jpmml.translator.PMMLObjectUtil.createBuilderMethod(PMMLObjectUtil.java:168)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:463)
    at org.jpmml.translator.PMMLObjectUtil.addValueConstructorParam(PMMLObjectUtil.java:542)
    at org.jpmml.translator.Template.constructObject(Template.java:123)
    at org.jpmml.translator.PMMLObjectUtil.constructObject(PMMLObjectUtil.java:227)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:216)
    at org.jpmml.translator.PMMLObjectUtil.createBuilderMethod(PMMLObjectUtil.java:168)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:444)
    at org.jpmml.translator.PMMLObjectUtil.initializeArray(PMMLObjectUtil.java:601)
    at org.jpmml.translator.PMMLObjectUtil.initializeArray(PMMLObjectUtil.java:592)
    at org.jpmml.translator.PMMLObjectUtil.addValueConstructorParam(PMMLObjectUtil.java:536)
    at org.jpmml.translator.Template.constructObject(Template.java:123)
    at org.jpmml.translator.PMMLObjectUtil.constructObject(PMMLObjectUtil.java:227)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:216)
    at org.jpmml.translator.PMMLObjectUtil.createBuilderMethod(PMMLObjectUtil.java:168)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:444)
    at org.jpmml.translator.PMMLObjectUtil.addSetterMethod(PMMLObjectUtil.java:574)
    at org.jpmml.translator.Template.initializeObject(Template.java:135)
    at org.jpmml.translator.PMMLObjectUtil.initializeObject(PMMLObjectUtil.java:234)
    at org.jpmml.translator.PMMLObjectUtil.createObject(PMMLObjectUtil.java:218)
    at org.jpmml.translator.PMMLObjectUtil.createBuilderMethod(PMMLObjectUtil.java:168)
    at org.jpmml.translator.PMMLObjectUtil.createExpression(PMMLObjectUtil.java:463)
    at org.jpmml.translator.PMMLObjectUtil.initializeArray(PMMLObjectUtil.java:601)
    at org.jpmml.translator.PMMLObjectUtil.initializeArray(PMMLObjectUtil.java:592)
    at org.jpmml.translator.PMMLObjectUtil.addSetterMethod(PMMLObjectUtil.java:562)
    at org.jpmml.translator.PMMLTemplate.initializeObject(PMMLTemplate.java:61)
    at org.jpmml.translator.PMMLObjectUtil.initializeObject(PMMLObjectUtil.java:234)
    at org.jpmml.translator.PMMLObjectUtil.createDefaultConstructor(PMMLObjectUtil.java:141)
    at org.jpmml.transpiler.TranspilerUtil.translate(TranspilerUtil.java:68)
    at com.mycompany.mavenproject1.Main.transpilePmml(Main.java:84)
    at com.mycompany.mavenproject1.Main.main(Main.java:56)

Some of the generated PMML documents were transpiled successfully, but others failed.

I also tried LightGBMAuditNA.pmml from your test data, and it transpiles fine; however, it is a PMML 4.3 document, whereas the rest were 4.4.

Any hint for this error?

Thanks in advance.

Regards, Agung

vruusmann commented 1 year ago

> `java.lang.IllegalArgumentException: org.dmg.pmml.ComplexArray$SetValue`

The transpiler has found a non-PMML object inside a PMML class model object, and currently does not know how to generate Java initializer code for it.
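To make the failure mode concrete, here is a minimal, hypothetical sketch of a code generator that turns an in-memory value into a Java initializer expression. It is not JPMML-Transpiler code: `InitializerSketch`, `createInitializer` and the stand-in `SetValue` class are made up for illustration. The point is only that a generator with per-type rules has to reject any value type it has no rule for, and using the class name as the exception message yields an error of the same shape as the one in the stack trace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class InitializerSketch {

    // Hypothetical stand-in for a value class the generator has no rule for
    // (it plays the role of ComplexArray$SetValue in the stack trace).
    static class SetValue {
    }

    // Returns Java source code that recreates the given value,
    // or throws if the value's type is not supported.
    static String createInitializer(Object value) {
        if (value == null) {
            return "null";
        } else if (value instanceof String) {
            // Escape backslashes first, then double quotes
            String escaped = ((String) value)
                    .replace("\\", "\\\\")
                    .replace("\"", "\\\"");
            return "\"" + escaped + "\"";
        } else if (value instanceof Integer || value instanceof Boolean) {
            return value.toString();
        } else if (value instanceof List) {
            // Recursively generate code for each element
            String elements = ((List<?>) value).stream()
                    .map(InitializerSketch::createInitializer)
                    .collect(Collectors.joining(", "));
            return "java.util.Arrays.asList(" + elements + ")";
        }
        // No rule for this type: report the class name, mirroring the message format in the stack trace
        throw new IllegalArgumentException(value.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(createInitializer(Arrays.asList("Sales", 42, true)));
        try {
            createInitializer(new SetValue());
        } catch (IllegalArgumentException e) {
            System.out.println("Unsupported: " + e.getMessage());
        }
    }
}
```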

This exception is about the contents of "set-type" array elements (implementing "value is contained in set" or "value is not contained in set" business logic). The default representation of a set is the org.dmg.pmml.Array class. However, for performance reasons, it is customarily pre-parsed into an org.dmg.pmml.ComplexArray subclass.
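A rough, stdlib-only illustration of why pre-parsing pays off: the whitespace-separated content of an `Array` element is tokenized once into a hash set, after which every `isIn`/`isNotIn` check is a constant-time lookup instead of a re-scan of the text. This is not how `ComplexArray` is actually implemented; the `PmmlStringSet` class is hypothetical, and the tokenizer assumes the PMML array convention of double-quoting values that contain whitespace and escaping embedded double quotes with a backslash.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PmmlStringSet {

    private final Set<String> values;

    // Parse the Array content once, up front
    PmmlStringSet(String arrayContent) {
        this.values = new HashSet<>(tokenize(arrayContent));
    }

    boolean isIn(String value) {
        return this.values.contains(value);
    }

    boolean isNotIn(String value) {
        return !this.values.contains(value);
    }

    // Splits on whitespace; "quoted values" may contain whitespace and \" escapes
    static List<String> tokenize(String content) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            StringBuilder sb = new StringBuilder();
            if (c == '"') {
                i++; // skip the opening quote
                while (i < n) {
                    char d = content.charAt(i);
                    if (d == '\\' && i + 1 < n && content.charAt(i + 1) == '"') {
                        sb.append('"'); // escaped double quote
                        i += 2;
                    } else if (d == '"') {
                        i++; // skip the closing quote
                        break;
                    } else {
                        sb.append(d);
                        i++;
                    }
                }
            } else {
                while (i < n && !Character.isWhitespace(content.charAt(i))) {
                    sb.append(content.charAt(i));
                    i++;
                }
            }
            tokens.add(sb.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        PmmlStringSet occupations = new PmmlStringSet(
                "Clerical Executive Home Military Professional Protective Sales Support");
        System.out.println(occupations.isIn("Sales"));      // true
        System.out.println(occupations.isNotIn("Farming")); // true
    }
}
```

The design choice is simply to pay the parsing cost once per set rather than once per evaluated record.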

> Some of the generated PMML documents were transpiled successfully, but others failed.

It looks like "standalone models" are transpiled successfully, whereas "model plus set expression" combinations (where the model uses an external DerivedField element to encode a categorical value into a numeric value) fail.

The JPMML-Transpiler test suite (inside /pmml-transpiler/src/test/resources) does not seem to cover the latter scenario.

> ... however, it is a PMML 4.3 document, whereas the rest were 4.4.

The evaluation and transpilation of PMML documents are schema version-agnostic.

What matters is whether the PMML document contains a DerivedField element that uses either the isIn or the isNotIn built-in function: https://dmg.org/pmml/v4-4-1/BuiltinFunctions.html#boolean5
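In Java terms, such a "model plus set expression" combination boils down to a derived feature computed from a categorical input before the model sees it. The sketch below hand-translates a hypothetical DerivedField along the lines of `if(isIn(Occupation, {...}), 1.0, 0.0)`. The category set, the `encodeOccupation` name, and returning `null` for a missing input are all illustrative assumptions, not code that the transpiler generates.

```java
import java.util.Set;

class DerivedFieldSketch {

    // Hypothetical category set; in a PMML document it would be the
    // content of an Array element nested inside the isIn Apply element.
    static final Set<String> WHITE_COLLAR = Set.of("Clerical", "Executive", "Professional");

    // Sets created by Set.of(...) throw NullPointerException on contains(null),
    // so callers must handle missing values before calling these.
    static boolean isIn(String value, Set<String> set) {
        return set.contains(value);
    }

    static boolean isNotIn(String value, Set<String> set) {
        return !set.contains(value);
    }

    // Encodes a categorical value into a numeric one (1.0 or 0.0);
    // a missing input yields a missing (null) output, by assumption.
    static Double encodeOccupation(String occupation) {
        if (occupation == null) {
            return null;
        }
        return isIn(occupation, WHITE_COLLAR) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(encodeOccupation("Executive"));  // 1.0
        System.out.println(encodeOccupation("Sales"));      // 0.0
        System.out.println(isNotIn("Sales", WHITE_COLLAR)); // true
    }
}
```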

denmase commented 1 year ago

Hi Villu,

Thanks for the swift reply and for looking into it. I can confirm your explanation: comparing the successful and the failed transpilations, the difference is indeed the presence of a SimpleSetPredicate element that uses the booleanOperator attribute. Something like:

```xml
<SimpleSetPredicate field="Occupation" booleanOperator="isIn">
    <Array type="string">Clerical Executive Home Military Professional Protective Sales Support</Array>
</SimpleSetPredicate>
```

vruusmann commented 1 year ago

> the difference is indeed the presence of a SimpleSetPredicate element that uses the booleanOperator attribute.

The SimpleSetPredicate@booleanOperator="isIn" construct ("predicate") should be fine, because it is covered by tests: https://github.com/jpmml/jpmml-transpiler/blob/1.3.0/pmml-transpiler/src/test/resources/pmml/LightGBMAuditNA.pmml#L222-L224
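For contrast with the expression case, a `SimpleSetPredicate` is a predicate: inside a tree model it decides whether a `Node` is entered. Below is a hypothetical hand-translation of the `Occupation` predicate quoted earlier in this thread. It assumes PMML's three-valued predicate logic, in which a missing field value yields UNKNOWN rather than FALSE; the class and enum names are made up.

```java
import java.util.Map;
import java.util.Set;

class SetPredicateSketch {

    enum Result { TRUE, FALSE, UNKNOWN }

    enum BooleanOperator { IS_IN, IS_NOT_IN }

    // Hypothetical stand-in for <SimpleSetPredicate field="..." booleanOperator="...">
    static final class Predicate {
        private final String field;
        private final BooleanOperator operator;
        private final Set<String> values;

        Predicate(String field, BooleanOperator operator, Set<String> values) {
            this.field = field;
            this.operator = operator;
            this.values = values;
        }

        Result evaluate(Map<String, String> row) {
            String value = row.get(this.field);
            if (value == null) {
                return Result.UNKNOWN; // missing field value
            }
            boolean contained = this.values.contains(value);
            boolean matches = (this.operator == BooleanOperator.IS_IN) ? contained : !contained;
            return matches ? Result.TRUE : Result.FALSE;
        }
    }

    public static void main(String[] args) {
        Predicate occupationIsIn = new Predicate("Occupation", BooleanOperator.IS_IN,
                Set.of("Clerical", "Executive", "Home", "Military",
                        "Professional", "Protective", "Sales", "Support"));
        System.out.println(occupationIsIn.evaluate(Map.of("Occupation", "Sales")));   // TRUE
        System.out.println(occupationIsIn.evaluate(Map.of("Occupation", "Farming"))); // FALSE
        System.out.println(occupationIsIn.evaluate(Map.of()));                        // UNKNOWN
    }
}
```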

The problem is expected to happen with Apply@function="isIn" and Apply@function="isNotIn" constructs ("expressions").

denmase commented 1 year ago

Hi Villu,

I don't see any Apply elements in the failed PMML documents. You are right, though: the PMML file you mentioned does transpile successfully, even though it contains SimpleSetPredicate@booleanOperator="isIn".

vruusmann commented 1 year ago

> I don't see any Apply elements in the failed PMML documents.

That is strange.

Anyway, the culprit is the Array element, which is typically found inside Apply and SimpleSetPredicate wrapper elements. Since the latter is covered by integration tests, I assumed it must be the former that is the "marker" to look for in PMML documents.
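For triangulating which wrapper elements hold the `Array` elements in a given PMML document, a quick standalone scan with the JDK's built-in DOM parser may help. `ArrayMarkerScan` is a hypothetical helper, not part of any JPMML library; it counts `Array` elements by the local name of their parent element (for example `SimpleSetPredicate` versus `Apply`), matching any namespace so that PMML 4.3 and 4.4 documents are treated alike.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class ArrayMarkerScan {

    // Counts Array elements by the local name of their parent element
    static Map<String, Integer> countArrayParents(String xml) {
        Document document;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            document = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
        // "*" matches any namespace URI (and no namespace at all)
        NodeList arrays = document.getElementsByTagNameNS("*", "Array");
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < arrays.getLength(); i++) {
            String parentName = arrays.item(i).getParentNode().getLocalName();
            counts.merge(parentName, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        for (String arg : args) {
            // Files.readString and Path.of require Java 11+
            System.out.println(arg + ": " + countArrayParents(Files.readString(Path.of(arg))));
        }
    }
}
```

With Java 11 or newer, it can be run directly as `java ArrayMarkerScan.java model.pmml`, where `model.pmml` stands for whichever document failed to transpile.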

The issue is probably easier to fix in the JPMML-Transpiler code than to keep triangulating it. Triangulation is currently only needed for building a relevant test case, in order to prevent this issue from happening again.

vruusmann commented 1 year ago

This issue was about the SimpleSetPredicate element after all.

All elements that were inside transpileable tree models were handled successfully. However, elements inside un-transpileable tree models, or elements outside of tree models, were failing.

denmase commented 1 year ago

Thank you for the fix, I will try it out.

EDIT: I built the Git version and updated my test application to use the SNAPSHOT, but now all of those PMML documents throw this exception:

java.lang.RuntimeException: Uncompilable source code - Erroneous tree type: org.jpmml.transpiler.TranspilerUtil

I'll wait until you make a release. I'm probably doing something wrong; pardon my lack of Java skills. I hope you keep supporting ID10Ts (like me). :grinning: