Closed infiton closed 6 years ago
If one of the regression trees returns a missing value (i.e. its missing value strategy is nullPrediction), how should the multiple model method handle the missing values?
Aggregation functions cannot be applied to missing values.
If a member model returns a missing value, then the evaluation should be terminated abruptly by propagating this missing value to the top level.
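A minimal, library-independent sketch of this "propagate the missing value" rule for an averaging ensemble. The class and method names (MissingAwareAverage, average) are hypothetical and are not part of JPMML-Evaluator; a missing prediction is represented by null.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a JPMML-Evaluator class
final class MissingAwareAverage {

    private MissingAwareAverage() {
    }

    // Returns the mean of the member predictions, or null (missing)
    // as soon as any member prediction is missing
    static Double average(List<Double> memberPredictions) {
        if (memberPredictions.isEmpty()) {
            return null;
        }
        double sum = 0d;
        for (Double prediction : memberPredictions) {
            if (prediction == null) {
                // Do not substitute 0; terminate and propagate the missing value
                return null;
            }
            sum += prediction;
        }
        return sum / memberPredictions.size();
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1.0, 2.0, 3.0)));
        System.out.println(average(Arrays.asList(1.0, null, 3.0)));
    }
}
```

The caller then has to be prepared to receive null as the ensemble result, instead of a number that silently treats the missing member prediction as 0.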
It looks like the jpmml implementation will cast the null to 0
No, it's impossible to cast a missing value to a valid value (such as 0). The method SegmentResult#getTargetValue(DataType) would throw an org.jpmml.evaluator.TypeCheckException stating that "Expected <DataType>, but got null". This is a hugely confusing exception message for end users.
Verified that the exception is the outcome:
Exception in thread "main" org.jpmml.evaluator.TypeCheckException (at or around line 22): Expected DOUBLE, but got null
at org.jpmml.evaluator.TypeUtil.toDouble(TypeUtil.java:670)
at org.jpmml.evaluator.TypeUtil.cast(TypeUtil.java:453)
at org.jpmml.evaluator.mining.SegmentResult.getTargetValue(SegmentResult.java:82)
at org.jpmml.evaluator.mining.MiningModelEvaluator.aggregateValues(MiningModelEvaluator.java:639)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateRegression(MiningModelEvaluator.java:232)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:204)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:185)
at org.jpmml.evaluator.EvaluationExample.execute(EvaluationExample.java:248)
at org.jpmml.evaluator.Example.execute(Example.java:85)
at org.jpmml.evaluator.EvaluationExample.main(EvaluationExample.java:149)
In the light of the above comment, perhaps segmentation models should use the following pattern:
for(SegmentResult segmentResult : segmentResults){

    // If the member model returned a missing value, then propagate it safely to the top level
    if(!segmentResult.hasTargetValue()){
        return null;
    }

    Double value = (Double)segmentResult.getTargetValue(DataType.DOUBLE);

    // Proceed as usual
}
But what is the cause of the problem, and what can I do when this happens?
Asked the DMG.org to clarify the handling of missing segment scoring results: http://mantis.dmg.org/view.php?id=178
@ronry The exception "Expected DOUBLE, but got null" is typically caused by a missing input value. Have you prepared all your input fields correctly via org.jpmml.evaluator.InputField#prepare(Object)? If you did, and are still getting this exception, then you should "harden" your model schema. For example, you should define the MiningField@missingValueReplacement attribute for all input fields that can contain missing values.
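Before changing the model, it can help to confirm which raw input values are actually missing. A rough sketch, assuming the record arrives as a Map<String, String> and that empty strings and the literal "NA" (R's convention) should count as missing; the names MissingInputCheck and findMissing are hypothetical, and nothing here calls JPMML-Evaluator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper; not a JPMML-Evaluator class
final class MissingInputCheck {

    private MissingInputCheck() {
    }

    // Returns the names of the fields whose raw value is absent, blank or "NA"
    // (treating "NA" as missing is an assumption about the data source)
    static List<String> findMissing(List<String> fieldNames, Map<String, String> record) {
        List<String> missing = new ArrayList<>();
        for (String fieldName : fieldNames) {
            String value = record.get(fieldName);
            if (value == null || value.trim().isEmpty() || value.trim().equals("NA")) {
                missing.add(fieldName);
            }
        }
        return missing;
    }
}
```

If the returned list is non-empty for a record that fails, then the exception is most likely a missing-input problem rather than a problem with the ensemble itself.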
sorry, my exception is
Exception in thread "main" org.jpmml.evaluator.TypeCheckException (at or around line 2): Expected org.jpmml.evaluator.HasProbability, but got null
at org.jpmml.evaluator.TypeUtil.cast(TypeUtil.java:485)
at org.jpmml.evaluator.mining.SegmentResult.getTargetValue(SegmentResult.java:92)
at org.jpmml.evaluator.mining.MiningModelUtil.aggregateProbabilities(MiningModelUtil.java:169)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:302)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:220)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:186)
and my code is
public void process(Map<String,String> context) {
    Map<FieldName, Object> inputs = new LinkedHashMap<>();
    List<InputField> inputFields = pmmlEvaluator.getActiveFields();
    for (InputField inputField : inputFields) {
        FieldName inputFieldName = inputField.getName();
        final Object rawValue = context.get(inputFieldName.getValue());
        inputs.put(inputFieldName, inputField.prepare(rawValue));
    }
    pmmlEvaluator.evaluate(inputs);
}
Is it the same problem? I have checked the inputs, and all of them have values.
These two exceptions - "Expected DOUBLE, but got null" and "Expected org.jpmml.evaluator.HasProbability, but got null" - are the same thing. The former happens with regression-type ensemble models (member predictions are double values), whereas the latter happens with classification-type ensemble models (member predictions are probability distributions).
@ronry In your code, you should be invoking Evaluator#getInputFields(), not Evaluator#getActiveFields() (this is a breaking API change between JPMML-Evaluator 1.2.X and 1.3.X versions). The set of "active fields" is a subset of "input fields". It is possible that this code change fixes the problem for you. Otherwise, you should be working on "hardening" the model schema by defining missing value replacement values for all input fields.
Hello VR,
I face the same issue with my RF model. I have generated the PMML using r2pmml and am using the "EvaluationExample" to score it, but I get an exception that states "Expected double value, got missing value (null)".
Trace:
Exception in thread "main" org.jpmml.evaluator.TypeCheckException (at or around line 87 of the PMML document): Expected double value, got missing value (null)
at org.jpmml.evaluator.TypeUtil.toDouble(TypeUtil.java:687)
at org.jpmml.evaluator.TypeUtil.cast(TypeUtil.java:466)
at org.jpmml.evaluator.mining.MiningModelUtil.aggregateValues(MiningModelUtil.java:75)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateRegression(MiningModelEvaluator.java:271)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:233)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:205)
at org.jpmml.evaluator.EvaluationExample.execute(EvaluationExample.java:298)
at org.jpmml.evaluator.Example.execute(Example.java:86)
at org.jpmml.evaluator.EvaluationExample.main(EvaluationExample.java:180)
But when I reviewed my input CSV file (as you had suggested above) and removed all the "NA" values from it, the EvaluationExample ran correctly.
Does this exception occur because of missing input values, and hence missing values in segments of the model? When you say that we could avoid this error by "hardening the schema", i.e. by adding the MiningField@missingValueReplacement attribute for all input fields (in my case NA), can you please give an example of where/how it can be added to the code? (I believe in the EvaluationExample? Could you please confirm?)
Thanks in advance.
The DMG has provided clarification about missing value handling (see the above link), and the suggested behaviour should become available in JPMML-Model/JPMML-Evaluator fairly soon.
@nyug In case of R's randomForest model type, the suggested behaviour would be expressed as Segmentation@missingResultTreatment="returnMissing", which means that when one of the member decision trees returns a missing prediction, then the ensemble as a whole should immediately return a missing prediction.
This behaviour would be consistent with R's behaviour - if you invoke the predict.randomForest function with missing data, then you'd be getting missing predictions back as well.
We could avoid this error by "hardening schema" - by adding MiningField@missingValueReplacement attribute for all input fields.
Correct. However, you would need to modify the contents of the PMML file, not tweak some JPMML-Evaluator configuration options. If it's a one-time activity, then it can be done in a text editor. If it's a more frequent activity, then it should be done programmatically. Of course, it would be nice if R2PMML/JPMML-R could auto-generate this attribute when appropriate.
@nyug If the above is critical for your use case, then please open a dedicated feature request at one of the R2PMML/JPMML-R projects. There are many ways how such "schema hardening" functionality could be implemented, and it would be nice to have them discussed/documented properly.
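A rough sketch of the programmatic route, using only the JDK's built-in DOM API rather than any JPMML class. The names SchemaHardener and harden are hypothetical. Applying one and the same replacement value to every active field is a simplifying assumption; a real model would usually need a sensible per-field value (for example a mean or a mode, and a valid category for categorical fields).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not part of JPMML-Evaluator or R2PMML
final class SchemaHardener {

    private SchemaHardener() {
    }

    // Adds missingValueReplacement to every active MiningField that does not define it yet
    static String harden(String pmml, String replacement) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(pmml)));

            // "*" matches MiningField elements regardless of the PMML schema version namespace
            NodeList miningFields = document.getElementsByTagNameNS("*", "MiningField");
            for (int i = 0; i < miningFields.getLength(); i++) {
                Element miningField = (Element) miningFields.item(i);

                // An absent usageType attribute means "active"; target fields are left alone
                String usageType = miningField.getAttribute("usageType");
                boolean active = usageType.isEmpty() || usageType.equals("active");

                if (active && !miningField.hasAttribute("missingValueReplacement")) {
                    miningField.setAttribute("missingValueReplacement", replacement);
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("Could not process the PMML document", e);
        }
    }
}
```

The sketch touches every MiningField element in the document, including the ones in the mining schemas of member models, and it never overwrites a replacement value that is already present.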
Thank you so much for the explanation. I definitely look forward to the DMG-suggested behaviours for these model types. In parallel, it would be extremely helpful to have this kind of "schema hardening" mechanism in place that would handle missing values and NA's, and prevent these tuples from even being evaluated in the first place (please correct me if my understanding is wrong). Sure, I can open a dedicated feature request in the r2pmml project. Many thanks for your support.
I got tons of type errors, and I can't trace the reason: expected: 'DOUBLE', got: '4.0'. error: org.jpmml.evaluator.InvalidResultException. I tried changing the input to 4, str(4), float(4) and decimal(4), but none of them pass.
expected: 'DOUBLE', got: '4.0'. error: org.jpmml.evaluator.InvalidResultException
@wzxiong The exception type o.j.e.InvalidResultException is related to the data type of the target field (aka label).
Such type exceptions generally indicate an invalid/badly generated PMML document. Which software did you use to generate your PMML document - must be some non JPMML-family software?
I found the problem, which is hard to solve: in the generated PMML file there is a bound called Interval, for example <Interval closure="feature_name" leftMargin="0.0" rightMargin="96.0"/>.
However, after I remove all Interval elements, the output prediction seems to differ from the original one.
@wzxiong Would you mind opening a new issue with the JPMML-LightGBM project, and providing a fully reproducible example there? Something where LightGBM and LightGBM-converter-to-PMML are giving different predictions?
Our last comments have no relation to the original issue - your MiningModel element is working as expected, the problem is somehow related to input field values (outside of the intended applicability domain of the model).
Suppose I have a Segmentation that contains Segments that wrap regression trees. If one of the regression trees returns a missing value (i.e. its missing value strategy is nullPrediction), how should the multiple model method handle the missing values?
It looks like the jpmml implementation will cast the null to 0 (https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/mining/MiningModelEvaluator.java#L639). This makes sense to me; however, I can't find where that is outlined in the spec.
Specifically, the case of missing values does not seem to be discussed in http://dmg.org/pmml/v4-3/MultipleModels.html