jpmml / jpmml-transpiler

Java Transpiler (Translator + Compiler) API for PMML
GNU Affero General Public License v3.0

Inference encounters "too many constants" problem #12

Closed StephanieWang closed 3 years ago

StephanieWang commented 3 years ago

I attached the PMML file "lightgbm_down.txt", and I use "transpiler" version 1.1.8 to convert the PMML to an evaluator. The code is the same as in the README:

File pmmlFile = new File(pmmlFileName);

LoadingModelEvaluatorBuilder evaluatorBuilder = new LoadingModelEvaluatorBuilder()
    .setLocatable(false)
    .load(pmmlFile);

try {
    Transpiler transpiler = new FileTranspiler("xxx.xxx", new File(pmmlFile.getAbsolutePath() + ".jar"));
    evaluatorBuilder = evaluatorBuilder.transform(new TranspilerTransformer(transpiler));
} catch(IOException ioe){
    log.error("InputStream close error!");
}
Evaluator evaluator = evaluatorBuilder.build();

Thanks for your help.

lightgbm_down.txt

vruusmann commented 3 years ago

lightgbm_down.txt

This file converts and transpiles just fine on my computer:

$ java -jar jpmml-lightgbm-executable-1.3-SNAPSHOT.jar --lgbm-input lightgbm_down.txt --pmml-output lightgbm.pmml
$ java -jar jpmml-transpiler-executable-1.1-SNAPSHOT.jar --pmml-input lightgbm.pmml --jar-output lightgbm.jar

vruusmann commented 3 years ago

log.error("InputStream close error!");

This is not my code, and this is not a correct exception message.

An IOException is typically thrown when the Java compiler fails to compile the generated Java source code. You should print out the full stack trace of this exception (instead of silently dropping it), because it often contains interesting information.
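As a minimal sketch of that advice: pass the exception itself to the logger so that the full stack trace is recorded, then rethrow it so the caller does not continue as if transpilation had succeeded. The class name `TranspileErrorHandling` and the always-failing `transpile()` method are hypothetical stand-ins for the real transpilation step, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TranspileErrorHandling {

    private static final Logger LOG = Logger.getLogger(TranspileErrorHandling.class.getName());

    /** Hypothetical stand-in for the transpilation step; it always fails in this sketch. */
    static void transpile() throws IOException {
        throw new IOException("simulated compiler failure");
    }

    /** Logs the full stack trace and rethrows, instead of silently dropping the exception. */
    static void transpileOrFail() {
        try {
            transpile();
        } catch (IOException ioe) {
            // Passing the exception as the last argument makes the logger print its stack trace.
            LOG.log(Level.SEVERE, "Transpilation failed", ioe);
            // Keep the original exception as the cause so no information is lost.
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        try {
            transpileOrFail();
        } catch (UncheckedIOException e) {
            System.err.println("Cause: " + e.getCause().getMessage());
        }
    }
}
```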

StephanieWang commented 3 years ago

I'm so sorry, I made a mistake. The problem is not encountered when converting to an evaluator, but during inference.

File pmmlFile = new File(pmmlFileName);

LoadingModelEvaluatorBuilder evaluatorBuilder = new LoadingModelEvaluatorBuilder()
    .load(pmmlFile);

try {
    Transpiler transpiler = new FileTranspiler("com.mycompany.MyModel", new File(pmmlFile.getAbsolutePath() + ".jar"));

    evaluatorBuilder = evaluatorBuilder.transform(new TranspilerTransformer(transpiler));
} catch(IOException ioe){
    ioe.printStackTrace(System.err);
    //throw ioe;
}

Evaluator evaluator = evaluatorBuilder.build();
Map<FieldName, Object> data = new HashMap<>();
data.put(FieldName.create("origin_price"), 11.0);
data.put(FieldName.create("sell_price"), 4.5);
data.put(FieldName.create("online_score"), 0.002);
data.put(FieldName.create("discount"), 0.5);
// THIS!
Map<FieldName, ?> results = evaluator.evaluate(data);
System.out.println(results);
List<OutputField> outputFields = evaluator.getOutputFields();
System.out.println(outputFields.size());
List<TargetField> targetFields = evaluator.getTargetFields();
TargetField targetField = targetFields.get(0);
FieldName targetFieldName = targetField.getName();

ProbabilityDistribution target = (ProbabilityDistribution) results.get(targetFieldName);
System.out.println(target);

double score = target.getProbability(1);
System.out.println(score);

StephanieWang commented 3 years ago

information

Thanks. I will fix it.

vruusmann commented 3 years ago

the problem is not encountered when converting to evaluator but when inference.

OK, now I remember the broader context of this issue.

It so happens that the Java compiler (javac) will happily and quietly generate Java class files that contain invalid (over-size) methods.

This method size problem manifests only when the Java application actually tries to use this invalid (over-size) method (here, when invoking the Evaluator#evaluate(Map) method). It does not manifest itself when the class file is loaded (here, when the Evaluator instance is built using ModelEvaluatorBuilder#build()).

vruusmann commented 3 years ago

Perhaps the JPMML-Transpiler library should run some Java bytecode sanity checks after compilation? For example, visiting all class and method definitions, and checking that they are not over-sized?
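One way such a sanity check could look, as a rough sketch rather than anything JPMML-Transpiler actually does: parse the compiled class file directly and report the constant pool size and the bytecode length of every method. The class name `ClassFileSizeCheck` and its helper methods are made up for illustration. The limits come from the class file format: `constant_pool_count` is a 16-bit value, and a method's `code_length` must be less than 65536. The static initializer shows up under the name `<clinit>`. The demo inspects `java.lang.String` only because its class file is always available (reading it this way needs Java 9 or newer).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

public class ClassFileSizeCheck {

    /** Reads constant_pool_count from the class file header (bytes 8 and 9). */
    static int constantPoolCount(byte[] classBytes) {
        return ((classBytes[8] & 0xFF) << 8) | (classBytes[9] & 0xFF);
    }

    /** Maps "name + descriptor" of every method with a Code attribute to its bytecode length. */
    static Map<String, Integer> methodCodeLengths(byte[] classBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file");
        }
        in.skipBytes(4); // minor_version, major_version
        int cpCount = in.readUnsignedShort();
        String[] utf8 = new String[cpCount];
        for (int i = 1; i < cpCount; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;          // Utf8
                case 5: case 6: in.skipBytes(8); i++; break;    // Long, Double (two slots)
                case 15: in.skipBytes(3); break;                // MethodHandle
                case 7: case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18: in.skipBytes(4); break;
                default: throw new IOException("Unknown constant pool tag: " + tag);
            }
        }
        in.skipBytes(6);                          // access_flags, this_class, super_class
        in.skipBytes(2 * in.readUnsignedShort()); // interfaces
        readMembers(in, utf8, null);              // fields: skipped, nothing collected
        Map<String, Integer> lengths = new LinkedHashMap<>();
        readMembers(in, utf8, lengths);           // methods
        return lengths;
    }

    /** Walks a field or method table; collects code_length values when lengths is non-null. */
    private static void readMembers(DataInputStream in, String[] utf8, Map<String, Integer> lengths)
            throws IOException {
        int count = in.readUnsignedShort();
        for (int m = 0; m < count; m++) {
            in.skipBytes(2); // access_flags
            String key = utf8[in.readUnsignedShort()] + utf8[in.readUnsignedShort()];
            int attributes = in.readUnsignedShort();
            for (int a = 0; a < attributes; a++) {
                String attributeName = utf8[in.readUnsignedShort()];
                int length = in.readInt();
                if (lengths != null && "Code".equals(attributeName)) {
                    in.skipBytes(4);                // max_stack, max_locals
                    lengths.put(key, in.readInt()); // code_length
                    in.skipBytes(length - 8);       // rest of the Code attribute
                } else {
                    in.skipBytes(length);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes;
        try (InputStream is = String.class.getResourceAsStream("String.class")) {
            bytes = is.readAllBytes();
        }
        System.out.println("constant_pool_count = " + constantPoolCount(bytes));
        methodCodeLengths(bytes).forEach((method, length) ->
            System.out.println(method + " : " + length + " bytes"));
    }
}
```

A post-compilation check could run this over every generated class and fail fast when any value approaches its limit, instead of waiting for the first `evaluate` call.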

vruusmann commented 3 years ago

Alternatively, the PMML to Java translator component should break big decision trees down into smaller pieces, and generate many methods.

It should be possible to figure out optimal break points by visiting the decision tree data structure and counting node elements.
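The node-counting idea can be sketched as a post-order traversal that cuts a subtree loose as soon as a method would hold too many nodes. Everything here is hypothetical: `Node` is a plain binary tree node rather than a PMML class, `planMethods` and `maxNodes` are illustrative names, and node count is only a proxy for the real limit, which is bytecode size.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeSplitSketch {

    /** Hypothetical binary decision tree node; not a JPMML class. */
    static final class Node {
        final Node left;
        final Node right;

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Builds a complete binary tree; depth 0 is a single leaf. */
    static Node complete(int depth) {
        return depth == 0 ? new Node(null, null) : new Node(complete(depth - 1), complete(depth - 1));
    }

    /**
     * Returns how many nodes stay in the method that contains the given node.
     * Subtrees that must move into their own method are appended to methodRoots;
     * each of them then counts as a single node (the call site) in its parent.
     */
    static int planMethods(Node node, int maxNodes, List<Node> methodRoots) {
        if (maxNodes < 3) {
            throw new IllegalArgumentException("maxNodes must be at least 3");
        }
        if (node == null) {
            return 0;
        }
        int left = planMethods(node.left, maxNodes, methodRoots);
        int right = planMethods(node.right, maxNodes, methodRoots);
        // Cut the larger child first; at most two cuts are ever needed per node.
        while (1 + left + right > maxNodes) {
            if (left >= right) {
                methodRoots.add(node.left);
                left = 1;
            } else {
                methodRoots.add(node.right);
                right = 1;
            }
        }
        return 1 + left + right;
    }

    public static void main(String[] args) {
        List<Node> methodRoots = new ArrayList<>();
        // A 15-node tree with at most 7 nodes per method.
        int inRootMethod = planMethods(complete(3), 7, methodRoots);
        // prints "2 extracted subtrees, 3 nodes left in the root method"
        System.out.println(methodRoots.size() + " extracted subtrees, "
            + inRootMethod + " nodes left in the root method");
    }
}
```

Each entry in `methodRoots` would become its own generated method, and the tree root always gets one as well, so the 15-node example ends up as three methods of at most 7 nodes each.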

vruusmann commented 3 years ago

Looking at the Java source code of the example LightGBM model, perhaps the problem is not about the size of the member decision tree methods.

What catches my eye is that the definitions of the DataDictionary element (method #buildDataDictionary$2011342562) and several individual DataField elements can also get quite big.

Or perhaps it's the static initializer of the JavaModel$867988177 class?

vruusmann commented 3 years ago

Or perhaps it's the static initializer of the JavaModel$867988177 class?

Indeed, the size of the static initializer is the problem here.

@StephanieWang You can (at least temporarily) solve this issue by reducing the number of member decision trees in the ensemble (i.e. the n_estimators parameter). The size of the individual member decision trees (i.e. the max_depth parameter) is not the limiting factor.

StephanieWang commented 3 years ago

Thanks very much. I will try these solutions. I just tried a small model file that was trained with lower n_estimators and lower max_depth, and it works well. Maybe I should split the trees into smaller ones. Thanks so much for your time and help.