jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML
GNU Affero General Public License v3.0

Support for boolean features #29

Closed vruusmann closed 3 years ago

vruusmann commented 4 years ago

Encountered the following exception when training a binary classifier on a sparse dataset that contains a boolean column ("Audit/Deductions"):

java.lang.IllegalArgumentException
        at org.jpmml.converter.PredicateManager.createArray(PredicateManager.java:73)
        at org.jpmml.converter.PredicateManager.createSimpleSetPredicate(PredicateManager.java:44)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:217)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)

The boolean value space contains exactly two scalar values. It's odd that the values.size() == 1 check didn't fire in PredicateManager#createSimpleSetPredicate(...), which suggests that the values variable is either an empty collection or a two-valued one.

vruusmann commented 4 years ago

The above happens with pandas.Series boolean columns that contain missing values. In that case there are three category levels in play: False, True, and float("NaN"), the last of which denotes missing values.
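
To make the level count concrete, here is a minimal plain-Python sketch (no pandas involved) that collects the distinct levels of a column. The function name category_levels is hypothetical and only illustrates the counting; it is not how pandas, LightGBM, or the converter enumerate categories.

```python
import math


def category_levels(column):
    """Return the distinct levels of a column in first-seen order.

    All NaN entries are collapsed into a single level, because
    NaN != NaN would otherwise make every NaN look distinct.
    """
    levels = []
    seen_nan = False
    for value in column:
        if isinstance(value, float) and math.isnan(value):
            if not seen_nan:
                levels.append(value)
                seen_nan = True
        elif value not in levels:
            levels.append(value)
    return levels


# A boolean column with missing entries ends up with three levels, not two.
column = [True, False, float("nan"), True, float("nan")]
print(len(category_levels(column)))
```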

The converter could detect such "one extra category" situations and check whether a float("NaN") category level is involved. If so, that level should be "demoted" to a normal missing value.
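
The detection idea could be sketched roughly as follows. This is plain Python for illustration only, not the converter's actual Java code; the helper name demote_nan_level and the expected_size parameter are assumptions made for this sketch.

```python
import math


def is_nan(value):
    """True only for float NaN; booleans and strings are never NaN."""
    return isinstance(value, float) and math.isnan(value)


def demote_nan_level(levels, expected_size=2):
    """Hypothetical helper: handle the "one extra category" case.

    If there is exactly one level more than expected and exactly one
    of the levels is NaN, drop the NaN level and report that the
    feature has missing values. Otherwise return the levels unchanged.
    """
    nan_count = sum(1 for level in levels if is_nan(level))
    if len(levels) == expected_size + 1 and nan_count == 1:
        return [level for level in levels if not is_nan(level)], True
    return list(levels), False


levels, has_missing = demote_nan_level([False, True, float("nan")])
print(levels, has_missing)
```

With the NaN level removed, the feature is back to the two-valued boolean space, and the missing-value case can be handled separately rather than as a third category.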