jpmml / jpmml-transpiler

Java Transpiler (Translator + Compiler) API for PMML
GNU Affero General Public License v3.0

Warn about large un-transpileable model elements. #9

Open rhorrell opened 4 years ago

rhorrell commented 4 years ago

Hi Villu, I saw you still had some work to do to address the smaller model I sent you. Thank you.

I tried to transpile the huge PMML file (178 MB), and I am getting the following error:

$ java  -Xms8g -Xmx8g -jar jpmml-transpiler-executable-1.1-SNAPSHOT.jar --xml-input ~/data/pmml/xgb_pmml.xml  --jar-output ~/data/pmml/x.jar
/PMML$1583159071.java:5473: error: code too large
    private final static MiningModel buildMiningModel$20049680() {
                                     ^
1 error
java.io.IOException
        at org.jpmml.codemodel.CompilerUtil.compile(CompilerUtil.java:81)
        at org.jpmml.codemodel.CompilerUtil.compile(CompilerUtil.java:56)
        at org.jpmml.codemodel.CompilerUtil.compile(CompilerUtil.java:49)
        at org.jpmml.transpiler.TranspilerUtil.compile(TranspilerUtil.java:75)
        at org.jpmml.transpiler.Main.run(Main.java:115)
        at org.jpmml.transpiler.Main.main(Main.java:98)

Any thoughts?

vruusmann commented 4 years ago

Now this is a proper "code too large" compiler error!

I can see that you're attempting to transpile an XGBoost model, and the error happens in relation to a MiningModel element. There are two of those - the top-level one (implementing a modelChain functionality) and inner one(s) (implementing a sum functionality).

Do you have any idea which MiningModel element is causing this error? My guess is it's the inner one(s).

The size of the PMML class model object is easy to measure. The workaround is to split one big method into several smaller methods. A good threshold metric is the position of a decision tree model (e.g. start a new method after every 100 decision trees).
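For context, javac reports "code too large" when the bytecode of a single method exceeds the JVM limit of 64 KB. The splitting idea can be sketched roughly as below. This is only an illustration, not the actual JPMML-Transpiler code generator: the class name MethodSplitSketch, the helper names partition and generateSkeleton, and the emitted addSegments$N method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MethodSplitSketch {

    // Splits indices [0, total) into consecutive {start, endExclusive} ranges
    // holding at most chunkSize items each.
    static List<int[]> partition(int total, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < total; start += chunkSize) {
            ranges.add(new int[] {start, Math.min(start + chunkSize, total)});
        }
        return ranges;
    }

    // Emits a source skeleton: one small method per range of trees, plus a
    // top-level method that calls them in order. The method bodies are
    // placeholders; a real generator would emit the tree-building code there.
    static String generateSkeleton(int treeCount, int treesPerMethod) {
        List<int[]> ranges = partition(treeCount, treesPerMethod);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ranges.size(); i++) {
            int[] range = ranges.get(i);
            sb.append("static void addSegments$").append(i).append("(Segmentation segmentation) {\n");
            sb.append("  // add Segment elements for trees ").append(range[0])
              .append(" to ").append(range[1] - 1).append("\n");
            sb.append("}\n");
        }
        sb.append("static Segmentation buildSegmentation() {\n");
        sb.append("  Segmentation segmentation = new Segmentation();\n");
        for (int i = 0; i < ranges.size(); i++) {
            sb.append("  addSegments$").append(i).append("(segmentation);\n");
        }
        sb.append("  return segmentation;\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // 250 trees with the threshold of 100 suggested above yields three small methods
        System.out.print(generateSkeleton(250, 100));
    }
}
```

Each generated method then stays well below the per-method limit, as long as the chosen threshold is small enough for the size of the individual trees.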

vruusmann commented 4 years ago

The method name buildMiningModel indicates that JPMML-Transpiler failed to transpile a MiningModel element into a JavaModel pseudo-element, and is now attempting to generate application code for creating an org.dmg.pmml.mining.MiningModel class model object programmatically.

Something like this:

static
public MiningModel buildMiningModel(){
  return new MiningModel().setMiningSchema(new MiningSchema()).setSegmentation(new Segmentation());
} 

To solve this "code too large" error you must restructure your XGBoost PMML file so that it would become transpile-able.

The important thing to note is that XGBoost models can be represented in PMML in two ways - the default representation (missing value handling uses the Node@defaultChild attribute) and the compact representation.

The JPMML-Transpiler library can only transpile the compact representation right now.

I believe that your XGBoost model is in the default representation. If you re-export or re-code it into the compact representation, then the transpilation should succeed without any "code too large" errors.

vruusmann commented 4 years ago

Example XGBoost model in the compact representation: https://github.com/jpmml/jpmml-transpiler/blob/1.1.0/src/test/resources/pmml/XGBoostAuditNA.pmml

rhorrell commented 4 years ago

Hi Villu, I was able to transpile a smaller XGB model, but I am getting the following error when evaluating it:

$ java -jar pmml-evaluator-example-executable-1.5-SNAPSHOT.jar --model ~/data/pmml/xgb_pmml.jar --input ~/data/input/xgb_testdata.csv --output  /dev/null >> log2.log
Picked up JAVA_TOOL_OPTIONS:  -Xms8g -Xmx8g
Exception in thread "main" java.lang.IllegalArgumentException
        at org.jpmml.evaluator.regression.RegressionModelUtil.normalizeBinaryLogisticClassificationResult(RegressionModelUtil.java:198)
        at org.jpmml.evaluator.regression.RegressionModelUtil.computeBinomialProbabilities(RegressionModelUtil.java:46)
        at PMML$1583159071$JavaModel$695248316.evaluateRegressionTableList$768194342(PMML$1583159071.java)
        at PMML$1583159071$JavaModel$695248316.evaluateClassification(PMML$1583159071.java)
        at org.jpmml.evaluator.java.JavaModelEvaluator.evaluateClassification(JavaModelEvaluator.java:56)
        at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:468)
        at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:540)
        at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:306)
        at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:468)
        at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateInternal(MiningModelEvaluator.java:239)
        at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:297)
        at org.jpmml.evaluator.example.EvaluationExample.execute(EvaluationExample.java:418)
        at org.jpmml.evaluator.example.Example.execute(Example.java:92)
        at org.jpmml.evaluator.example.EvaluationExample.main(EvaluationExample.java:262)

Is this the other issue you were working on?

vruusmann commented 4 years ago

@rhorrell This exception is definitely separate from the current "code too large" issue. Please open a new issue for each unique exception!

Exception in thread "main" java.lang.IllegalArgumentException
        at org.jpmml.evaluator.regression.RegressionModelUtil.normalizeBinaryLogisticClassificationResult(RegressionModelUtil.java:198)

The transpiled RegressionModel element specifies a link function that does not seem to be permitted by the PMML 4.4 standard: https://github.com/jpmml/jpmml-evaluator/blob/1.5.1/pmml-evaluator/src/main/java/org/jpmml/evaluator/regression/RegressionModelUtil.java#L184-L198
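One way to check which link function the file actually declares is to list the normalizationMethod attribute of every RegressionModel element in the original PMML file. The sketch below uses only the JDK's DOM parser. The class name NormalizationMethodLister and its helper methods are hypothetical, and reporting a missing attribute as "none" assumes the PMML schema default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NormalizationMethodLister {

    // Returns the normalizationMethod of each RegressionModel element, in document order.
    static List<String> listNormalizationMethods(InputStream is) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document document = dbf.newDocumentBuilder().parse(is);
            // "*" matches any namespace, so the PMML schema version does not matter
            NodeList nodes = document.getElementsByTagNameNS("*", "RegressionModel");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                String value = element.getAttribute("normalizationMethod");
                // Assumption: a missing attribute means the schema default "none"
                result.add(value.isEmpty() ? "none" : value);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse PMML document", e);
        }
    }

    // Convenience overload for small in-memory documents.
    static List<String> listNormalizationMethods(String xml) {
        return listNormalizationMethods(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NormalizationMethodLister <path to PMML file>
        try (InputStream is = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(listNormalizationMethods(is));
        }
    }
}
```

DOM parsing loads the whole document into memory, so this suits the smaller model better than the 178 MB one.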

You should be getting the same exception when evaluating this XGBoost model with the JPMML-Evaluator library in the normal "interpreted" mode.

Most likely, your XGBoost model is invalid. What is/was your XGBoost-to-PMML conversion tool?