jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

ExtraTreesRegressor does not seem to have target variables for the segments declared #1

Closed by camerondavison 8 years ago

camerondavison commented 8 years ago

I am getting the following exception:

ERROR [2015-11-19 22:23:55,309] io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 5b4145018068a76d
! org.jpmml.evaluator.TypeCheckException: Expected DOUBLE, but got null
! at org.jpmml.evaluator.TypeUtil.toDouble(TypeUtil.java:571) ~[pmml-evaluator-1.2.6.jar:na]
! at org.jpmml.evaluator.TypeUtil.cast(TypeUtil.java:378) ~[pmml-evaluator-1.2.6.jar:na]
! at org.jpmml.evaluator.TypeUtil.parseOrCast(TypeUtil.java:66) ~[pmml-evaluator-1.2.6.jar:na]
! at org.jpmml.evaluator.MiningModelEvaluator.aggregateValues(MiningModelEvaluator.java:458) ~[pmml-evaluator-1.2.6.jar:na]

It looks like aggregateValues is getting back a null result from this loop:

for(SegmentResultMap segmentResult : segmentResults){
    Object targetValue = EvaluatorUtil.decode(segmentResult.getTargetValue());

To me, this implies that the member trees are not returning target values.
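To make the failure mode concrete, here is a minimal, self-contained sketch of averaging per-segment predictions where one segment yields null. It does not use the JPMML-Evaluator API; the class name SegmentAverageSketch and its average method are hypothetical, and a plain list of Double values stands in for the segment results.

```java
import java.util.Arrays;
import java.util.List;

public class SegmentAverageSketch {

    // Averages per-segment predictions. A null entry stands in for a member
    // tree that produced no prediction (assumption: illustration only, this
    // is not how JPMML-Evaluator is implemented).
    public static double average(List<Double> segmentValues) {
        double sum = 0d;
        for (int i = 0; i < segmentValues.size(); i++) {
            Double value = segmentValues.get(i);
            if (value == null) {
                // Fail with a message that names the offending segment.
                throw new IllegalStateException("Segment " + i + " returned a null prediction");
            }
            sum += value;
        }
        return sum / segmentValues.size();
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1.0, 2.0, 3.0)));
        try {
            average(Arrays.asList(1.0, null, 3.0));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that a single null member prediction is enough to break the aggregation, whatever the reason for the null turns out to be.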

This may have something to do with the recent refactoring in https://github.com/jpmml/jpmml-sklearn/commit/27858a1e9794c8bbc976047749dc85281057c112#diff-d4ca34d7102c57121516753b9faf5e41, where the standalone variable is used to set the target field only when it is true, whereas the code at https://github.com/jpmml/jpmml-sklearn/commit/27858a1e9794c8bbc976047749dc85281057c112#diff-b6e00c7675e0a9b5c3c0432ddf12c47eL126 always set the target field, no matter what the standalone variable said. This is only from glancing through the code, though.

camerondavison commented 8 years ago

Hmm, according to http://dmg.org/pmml/v4-2-1/TreeModel.html I thought that the PMML document would need to have a target field in it, but it looks like the jpmml-evaluator works without one. I found another bug in my own code that accounted for the exception above.

vruusmann commented 8 years ago

According to the PMML specification, "The definition of target fields is not required since they do not have an impact on scoring results. For supervised models, however, the definition of target fields is often useful for documentation purposes".

Currently, for segmentation models (e.g. Bagging, ExtraTrees, RandomForest), the top-level MiningModel element defines a target field, whereas the member TreeModel elements don't. One way to look at it is that the scores of member TreeModel elements exist only in "local scope", so there's no point in naming them.

I'm working on extending the JPMML-SkLearn library so that it will be possible to assign names to all target fields, regardless of their position in the hierarchy. For example, this is needed for building ensembles of ensemble models (e.g. the VotingClassifier model type).

vruusmann commented 8 years ago

Also, let me guess: you got a null result because you were accidentally feeding a null active field value to the evaluator? The default missingValueStrategy of the TreeModel element is none, which triggers the default noTrueChildStrategy of returnNullPrediction.
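A rough sketch of that mechanism, under stated assumptions: the Node class, the evaluate and score methods, and the field name "x" are all hypothetical, and the tree logic is a simplification rather than the JPMML-Evaluator implementation. It assumes that, with no missing value handling, a comparison against a missing value is never true, so no child node matches and the traversal falls back to a null prediction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

public class NullPredictionSketch {

    /** Hypothetical, simplified tree node; not a JPMML class. */
    static class Node {
        final String field;          // null for the root, which always matches
        final DoublePredicate test;  // comparison applied to the field value
        final Double score;          // prediction carried by a leaf
        final List<Node> children = new ArrayList<>();

        Node(String field, DoublePredicate test, Double score) {
            this.field = field;
            this.test = test;
            this.score = score;
        }

        boolean matches(Map<String, Double> row) {
            if (field == null) {
                return true;
            }
            Double value = row.get(field);
            // Simplified stand-in for a "none" missing value strategy:
            // a comparison against a missing value is treated as not true.
            return value != null && test.test(value);
        }
    }

    static Double evaluate(Node node, Map<String, Double> row) {
        if (node.children.isEmpty()) {
            return node.score;
        }
        for (Node child : node.children) {
            if (child.matches(row)) {
                return evaluate(child, row);
            }
        }
        // No child matched: mimic noTrueChildStrategy="returnNullPrediction".
        return null;
    }

    /** Scores a single input x against a two-leaf tree split at x <= 5. */
    public static Double score(Double x) {
        Node root = new Node(null, null, null);
        root.children.add(new Node("x", v -> v <= 5d, 1.0));
        root.children.add(new Node("x", v -> v > 5d, 2.0));
        Map<String, Double> row = new HashMap<>();
        row.put("x", x);
        return evaluate(root, row);
    }

    public static void main(String[] args) {
        System.out.println(score(3.0));
        System.out.println(score(7.0));
        System.out.println(score(null));
    }
}
```

With a non-null input, one of the two leaves always matches; with a null input, neither comparison is true and the result is null, which is what an ensemble would then try to aggregate.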

I will have to check what Scikit-Learn's policy is here. Probably, the evaluation should fail with an exception instead.
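Until that is settled, a caller could reject null inputs before evaluation. The following is a minimal sketch of such a guard, not part of any JPMML library: the class name ActiveFieldCheckSketch, the method requireNonNullInputs, and the field names in main are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ActiveFieldCheckSketch {

    // Throws if any expected active field is absent from the arguments or
    // mapped to null, so the problem surfaces before the model is evaluated.
    public static void requireNonNullInputs(List<String> activeFields, Map<String, ?> arguments) {
        for (String name : activeFields) {
            if (arguments.get(name) == null) {
                throw new IllegalArgumentException("Missing value for active field: " + name);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical field names, for illustration only.
        List<String> activeFields = Arrays.asList("x1", "LABEL");
        Map<String, Object> arguments = new LinkedHashMap<>();
        arguments.put("x1", 0.5);
        arguments.put("LABEL", null);
        try {
            requireNonNullInputs(activeFields, arguments);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing early like this points at the offending field by name, rather than at a type check deep inside the ensemble aggregation.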

camerondavison commented 8 years ago

Thanks for the descriptive response. Yes, I was passing a null value for a LABEL field, which triggered the noTrueChildStrategy and, in turn, the null prediction.