Closed StephanieWang closed 3 years ago
lightgbm_down.txt
This file converts and transpiles just fine on my computer:
$ java -jar jpmml-lightgbm-executable-1.3-SNAPSHOT.jar --lgbm-input lightgbm_down.txt --pmml-output lightgbm.pmml
$ java -jar jpmml-transpiler-executable-1.1-SNAPSHOT.jar --pmml-input lightgbm.pmml --jar-output lightgbm.jar
log.error("InputStream close error!");
This is not my code, and this is not a correct exception message.
An IOException is typically thrown when the Java compiler fails to compile the generated Java source code. You should print out the full stack trace of this exception (instead of silently dropping it), because it often contains interesting information.
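For illustration, here is a minimal sketch of keeping the full stack trace instead of a fixed message. The logging framework behind the `log` variable is not shown in this thread, so the sketch only renders the trace to a String that could be passed to any logger; the class name `StackTraces` and the method `render` are hypothetical and not part of any JPMML library.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper for illustration only; not part of JPMML-Transpiler.
final class StackTraces {

    private StackTraces() {
    }

    // Renders the complete stack trace, including any "Caused by:" sections, as a String.
    static String render(Throwable throwable) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter writer = new PrintWriter(buffer)) {
            throwable.printStackTrace(writer);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            // Simulated failure; in the real application this would come from the transpiler call.
            throw new IOException("Compilation failed", new IllegalStateException("root cause"));
        } catch (IOException ioe) {
            // Print the exception itself rather than an unrelated fixed message.
            System.err.println(render(ioe));
        }
    }
}
```

Many logging frameworks also accept the exception as an extra argument next to the message, which achieves the same thing without a helper.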
I'm so sorry, I made a mistake. The problem does not occur when converting the PMML file to an evaluator, but at inference time.
File pmmlFile = new File(pmmlFileName);
LoadingModelEvaluatorBuilder evaluatorBuilder = new LoadingModelEvaluatorBuilder()
.load(pmmlFile);
try {
Transpiler transpiler = new FileTranspiler("com.mycompany.MyModel", new File(pmmlFile.getAbsolutePath() + ".jar"));
evaluatorBuilder = evaluatorBuilder.transform(new TranspilerTransformer(transpiler));
} catch(IOException ioe){
ioe.printStackTrace(System.err);
//throw ioe;
}
Evaluator evaluator = evaluatorBuilder.build();
Map<FieldName, Object> data = new HashMap<>();
data.put(FieldName.create("origin_price"), 11.0);
data.put(FieldName.create("sell_price"), 4.5);
data.put(FieldName.create("online_score"), 0.002);
data.put(FieldName.create("discount"), 0.5);
// THIS!
Map<FieldName, ?> results = evaluator.evaluate(data);
System.out.println(results);
List<OutputField> outputFields = evaluator.getOutputFields();
System.out.println(outputFields.size());
List<TargetField> targetFields = evaluator.getTargetFields();
TargetField targetField = targetFields.get(0);
FieldName targetFieldName = targetField.getName();
ProbabilityDistribution target = (ProbabilityDistribution) results.get(targetFieldName);
System.out.println(target);
double score = target.getProbability(1);
System.out.println(score);
Thanks. I will fix it.
the problem is not encountered when converting to evaluator but when inference.
OK, now I remember the broader context of this issue.
It so happens that the Java compiler (javac.exe) will happily and quietly generate Java class files that contain invalid (over-sized) methods.
This method size problem manifests only when the Java application actually tries to use this invalid (over-sized) method (here, when invoking the Evaluator#evaluate(Map) method). It does not manifest itself when the class file is loaded (here, when the Evaluator instance is built using ModelEvaluatorBuilder#build()).
Perhaps the JPMML-Transpiler library should run some Java bytecode sanity checks after compilation? For example, visiting all class and method definitions, and checking that they are not over-sized?
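A rough sketch of what such a sanity check could look like, assuming the relevant limit is the JVM rule that a method's code_length must be less than 65536 bytes (the thread does not name the exact limit). It parses the class file format directly and reports the bytecode size of every method, including the static initializer, which appears as `<clinit>()V`. The class name `MethodSizeCheck` and its methods are hypothetical; this is not how JPMML-Transpiler works today.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of JPMML-Transpiler.
final class MethodSizeCheck {

    // The JVM specification requires code_length to be less than 65536.
    static final int MAX_CODE_LENGTH = 65535;

    // Maps "name + descriptor" (e.g. "<clinit>()V") to the method's bytecode size in bytes.
    static Map<String, Integer> codeLengths(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IOException("Not a class file");
            }
            skip(in, 4); // minor_version, major_version
            String[] utf8 = readConstantPool(in);
            skip(in, 6); // access_flags, this_class, super_class
            skip(in, 2 * in.readUnsignedShort()); // interfaces
            readMembers(in, utf8, null); // fields: parsed only to advance the stream
            Map<String, Integer> result = new LinkedHashMap<>();
            readMembers(in, utf8, result); // methods
            return result;
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    // Keeps only the Utf8 entries; all other constant pool entries are skipped by size.
    private static String[] readConstantPool(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();
        String[] utf8 = new String[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break; // CONSTANT_Utf8
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18: skip(in, 4); break;
                case 5: case 6: skip(in, 8); i++; break; // Long and Double occupy two slots
                case 7: case 8: case 16: case 19: case 20: skip(in, 2); break;
                case 15: skip(in, 3); break; // CONSTANT_MethodHandle
                default: throw new IOException("Unknown constant pool tag: " + tag);
            }
        }
        return utf8;
    }

    // Reads a fields or methods table; records Code attribute sizes when a sink is given.
    private static void readMembers(DataInputStream in, String[] utf8, Map<String, Integer> sink) throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            skip(in, 2); // access_flags
            String name = utf8[in.readUnsignedShort()];
            String descriptor = utf8[in.readUnsignedShort()];
            int attributeCount = in.readUnsignedShort();
            for (int j = 0; j < attributeCount; j++) {
                String attributeName = utf8[in.readUnsignedShort()];
                int attributeLength = in.readInt();
                if (sink != null && "Code".equals(attributeName)) {
                    skip(in, 4); // max_stack, max_locals
                    sink.put(name + descriptor, in.readInt()); // code_length
                    skip(in, attributeLength - 8);
                } else {
                    skip(in, attributeLength);
                }
            }
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        if (in.skipBytes(n) != n) {
            throw new EOFException("Truncated class file");
        }
    }

    // Usage: java MethodSizeCheck.java path/to/Some.class [more class files]
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            codeLengths(Files.readAllBytes(Paths.get(arg))).forEach((method, length) -> {
                String verdict = (length > MAX_CODE_LENGTH) ? "OVERSIZED" : "ok";
                System.out.println(arg + " " + method + ": " + length + " bytes, " + verdict);
            });
        }
    }
}
```

Running this over the class files extracted from the transpiled JAR would show directly which method crosses the limit. A bytecode library could do the same job with less code; the hand-written parser is used here only to keep the sketch dependency-free.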
Alternatively, the PMML to Java translator component should break big decision trees down into smaller pieces, and generate many methods.
It should be possible to figure out optimal break points by visiting the decision tree data structure and counting node elements.
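The node-counting idea can be sketched as a post-order walk. This is only an illustration under stated assumptions: `Node` is a hypothetical stand-in for a PMML tree node, `maxNodes` is an arbitrary budget rather than a value derived from real bytecode sizes, and the greedy cut is not claimed to be optimal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration only; not the JPMML-Transpiler implementation.
final class TreeSplitter {

    // Stand-in for a PMML tree node: an id plus child nodes.
    static final class Node {
        final String id;
        final List<Node> children;

        Node(String id, Node... children) {
            this.id = id;
            this.children = Arrays.asList(children);
        }
    }

    // Returns the nodes at which a new method should start; the root is always included.
    static List<Node> breakPoints(Node root, int maxNodes) {
        List<Node> result = new ArrayList<>();
        collect(root, maxNodes, result);
        if (!result.contains(root)) {
            result.add(root);
        }
        return result;
    }

    // Post-order walk that returns the number of nodes still "inlined" at this node.
    // Once that number reaches maxNodes, the subtree is cut off into its own method
    // and counts as a single node (the method call) from the parent's point of view.
    private static int collect(Node node, int maxNodes, List<Node> breakPoints) {
        int size = 1;
        for (Node child : node.children) {
            size += collect(child, maxNodes, breakPoints);
        }
        if (size >= maxNodes) {
            breakPoints.add(node);
            return 1;
        }
        return size;
    }

    public static void main(String[] args) {
        // A degenerate chain: root -> a -> b -> c -> d
        Node chain = new Node("root", new Node("a", new Node("b", new Node("c", new Node("d")))));
        for (Node breakPoint : breakPoints(chain, 3)) {
            System.out.println("new method starts at node " + breakPoint.id);
        }
    }
}
```

For binary trees this keeps every generated piece below roughly twice `maxNodes` nodes, because each child contributes fewer than `maxNodes` nodes before it is cut.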
Looking at the Java source code of the example LightGBM model, perhaps the problem is not about the size of the member decision tree methods.
What catches my eye is that the definitions of the DataDictionary element (method #buildDataDictionary$2011342562) and several individual DataField elements can also get quite big.
Or perhaps it's the static initializer of the JavaModel$867988177 class?
Or perhaps it's the static initializer of the JavaModel$867988177 class?
Indeed, the size of the static initializer is the problem here.
@StephanieWang You can (at least temporarily) solve this issue by reducing the number of member decision trees in the ensemble (i.e. the n_estimators parameter). The size of the individual member decision trees (i.e. the max_depth parameter) is not the limiting factor.
Thanks very much. I will try these solutions. I just tried a small model file that was trained with lower n_estimators and lower max_depth, and it works well. Maybe I should split the trees into smaller ones. Thanks so much for your time and help.
I attached the PMML file "lightgbm_down.txt", and I use "transpiler" version 1.1.8 to convert the PMML to an evaluator. The code is just as in the README:
Thanks for your help.