jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML
GNU Affero General Public License v3.0

Exception in thread "main" java.lang.IllegalArgumentException: Right branch is not selectable #52

Closed ruzbeh closed 2 years ago

ruzbeh commented 2 years ago

Hey,

Can someone help with this?


Exception in thread "main" java.lang.IllegalArgumentException: Right branch is not selectable
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:210)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)
    at org.jpmml.lightgbm.Tree.encodeTreeModel(Tree.java:96)
    at org.jpmml.lightgbm.ObjectiveFunction.createMiningModel(ObjectiveFunction.java:72)
    at org.jpmml.lightgbm.BinomialLogisticRegression.encodeMiningModel(BinomialLogisticRegression.java:47)
    at org.jpmml.lightgbm.GBDT.encodeMiningModel(GBDT.java:394)
    at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:383)
    at org.jpmml.lightgbm.Main.run(Main.java:132)
    at org.jpmml.lightgbm.Main.main(Main.java:118)

I started getting this error abruptly.

LightGBM version: 3.1.0, JPMML-LightGBM version: 1.2

After updating JPMML-LightGBM to the latest version (1.3.11), I get this error instead:


Sep 20, 2021 11:37:02 PM org.jpmml.lightgbm.Main run
INFO: Loading GBDT..
Sep 20, 2021 11:37:03 PM org.jpmml.lightgbm.Main run
INFO: Loaded GBDT in 485 ms.
Sep 20, 2021 11:37:03 PM org.jpmml.lightgbm.Main run
INFO: Converting GBDT to PMML..
Sep 20, 2021 11:37:03 PM org.jpmml.lightgbm.Main run
SEVERE: Failed to convert GBDT to PMML
java.lang.IllegalArgumentException: Expected all values to be of the same data type, got 2 different data types ([integer, string])
    at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:129)
    at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:85)
    at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
    at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
    at org.jpmml.lightgbm.Main.run(Main.java:158)
    at org.jpmml.lightgbm.Main.main(Main.java:127)

Exception in thread "main" java.lang.IllegalArgumentException: Expected all values to be of the same data type, got 2 different data types ([integer, string])
    at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:129)
    at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:85)
    at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
    at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
    at org.jpmml.lightgbm.Main.run(Main.java:158)
    at org.jpmml.lightgbm.Main.main(Main.java:127)

ruzbeh commented 2 years ago

@vruusmann, do let me know if you need anything else that would help solve this problem.

vruusmann commented 2 years ago

Exception in thread "main" java.lang.IllegalArgumentException: Expected all values to be of the same data type, got 2 different data types ([integer, string])

Didn't you read this exception message? You have a data column that contains a mix of string and integer values.

If you standardize to a common data type (preferably string), does the first exception persist or not?

vruusmann commented 2 years ago

Exception in thread "main" java.lang.IllegalArgumentException: Right branch is not selectable

As for the first exception: the JPMML-LightGBM library performs smart "category levels pruning". Basically, it keeps track of which category values are sent to the left and right branches, and uses this information to generate more compact SimpleSetPredicate elements.

The expectation is that neither the left nor the right branch can be(come) empty. This expectation is violated in the current case: for some reason all values are sent to the left branch, so the right branch remains empty.
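
The pruning idea described above can be sketched in a few lines of Python. This is only an illustration and not the library's actual code (which lives in Tree.java and is not shown in this thread): the function name `split_categories` and its arguments are hypothetical, and the "Left branch" message is invented to mirror the "Right branch" message from the stack trace.

```python
def split_categories(reachable, left_values):
    """Partition the category values that can still reach a node.

    reachable   -- category values that survived the splits above this node
    left_values -- category values that this split sends to the left branch
    """
    left = [value for value in reachable if value in left_values]
    right = [value for value in reachable if value not in left_values]
    # Both branches are expected to keep at least one category value.
    if not left:
        raise ValueError("Left branch is not selectable")
    if not right:
        raise ValueError("Right branch is not selectable")
    return left, right


# The first split keeps both branches populated.
left, right = split_categories(["a", "b", "c", "d"], {"a", "b"})

# A deeper split on the left branch that again lists "a" and "b" leaves
# nothing for its right branch, which is the situation described above.
try:
    split_categories(left, {"a", "b"})
except ValueError as error:
    print(error)
```

If category values are matched inconsistently (for example the integer 1 against the string "1"), a split can end up with every reachable value on one side, which is one plausible way to hit this check.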

ruzbeh commented 2 years ago

Thanks for the quick reply @vruusmann

Just to add: both of these issues occur with the same model file. Different library versions give different errors: the first stack trace is from JPMML-LightGBM 1.2, the second is from 1.3.11.

We will cross-check the data for mixed data types and try to standardize them if we find a mismatch. Also, just to understand: wouldn't the data type mismatch error come up in both versions of the library?

And from your second comment, it seems like LightGBM is creating a really unbalanced tree, is it? Is there any parameter to balance LightGBM?

vruusmann commented 2 years ago

Just to add: both of these issues occur with the same model file. Different library versions give different errors

It's likely that the second error is a precursor to the first error.

The newer library version catches data inconsistency earlier, and doesn't get to the tree traversal/pruning stage at all. The older library doesn't do such sanity checking.

We will cross-check the data for mixed data types and try to standardize them if we find a mismatch.

Take a look into the LightGBM text model file, and scroll to the pandas_categorical section. One (or more) of your columns contain category specifications that mix string and integer values.
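
As a rough aid for that inspection, the following sketch scans the model text for the pandas_categorical line and reports category lists that mix value types. It assumes the model was saved by LightGBM's Python package, which appends a line of the form `pandas_categorical:<JSON array>` to the text file. The function name `find_mixed_categories` is hypothetical, and the reported index is the position within the pandas_categorical list, not necessarily the feature index.

```python
import json

PREFIX = "pandas_categorical:"


def find_mixed_categories(model_text):
    """Return {position: [type names]} for category lists that mix types."""
    # The section is expected near the end of the file, so search backwards.
    for line in reversed(model_text.splitlines()):
        if line.startswith(PREFIX):
            categories = json.loads(line[len(PREFIX):])
            break
    else:
        return {}  # no pandas_categorical section found

    mixed = {}
    # The value may be JSON null when no pandas categoricals were used.
    for position, levels in enumerate(categories or []):
        type_names = sorted({type(level).__name__ for level in levels})
        if len(type_names) > 1:
            mixed[position] = type_names
    return mixed


# Usage with a made-up model tail; a real file would be read from disk.
sample = 'end of trees\n\npandas_categorical:[["a", "b"], [1, "2", 3]]\n'
print(find_mixed_categories(sample))
```

Any position reported here points at a categorical column whose levels should be cast to a single type before retraining.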

Also, just to understand: wouldn't the data type mismatch error come up in both versions of the library?

Newer library versions are smarter, and perform more thorough sanity checking.

It seems like LightGBM is creating a really unbalanced tree, is it?

For starters, you're feeding inconsistent data to LightGBM. How can it do a correct job?

ruzbeh commented 2 years ago

hey @vruusmann

Your comments were really helpful. We found that the bug was in our data preprocessing: the data frame was converting strings to a mixed data type array. After casting all features correctly, it worked well.

Thanks again for the help