Closed: ruzbeh closed this issue 2 years ago
Do let me know if anything else is needed to help solve this problem, @vruusmann.
Exception in thread "main" java.lang.IllegalArgumentException: Expected all values to be of the same data type, got 2 different data types ([integer, string])
Didn't you read this exception message? You have a data column that contains a mix of string and integer values.
If you standardize to a common data type (preferably string), does the first exception persist or not?
Exception in thread "main" java.lang.IllegalArgumentException: Right branch is not selectable
As for the first exception, the JPMML-LightGBM library performs smart "category levels pruning". Basically, it keeps track of which category values are sent to the left and right branches, and uses this information to generate more compact SimpleSetPredicate elements.
The expectation is that neither the left nor the right branch can be(come) empty. This expectation is violated in the current case: for some reason all values are sent to the left branch, so the right branch remains empty.
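To make the pruning idea concrete, here is a minimal Python sketch. It is not the JPMML-LightGBM implementation; `split_category_levels` and its `goes_left` argument are hypothetical names, and the error wording is my own. It only illustrates the invariant described above: after routing every category level, both branches must have received at least one level.

```python
def split_category_levels(levels, goes_left):
    """Partition category levels by branch and enforce that neither side is empty.

    `levels` is the list of category values known for a feature.
    `goes_left` is a hypothetical callable standing in for the split decision.
    """
    left = [level for level in levels if goes_left(level)]
    right = [level for level in levels if not goes_left(level)]

    # The invariant from the comment above: an empty side means the
    # split cannot be expressed as two complementary value sets.
    for side, members in (("left", left), ("right", right)):
        if not members:
            raise ValueError(f"{side} branch received no category levels")
    return left, right


# A well-formed split: two levels go left, one goes right.
ok_left, ok_right = split_category_levels(["a", "b", "c"], lambda lv: lv in {"a", "c"})

# A degenerate split: everything goes left, so the right side is empty.
try:
    split_category_levels(["a", "b"], lambda lv: True)
    degenerate_error = None
except ValueError as exc:
    degenerate_error = str(exc)
```

If the category values seen at conversion time do not match the ones the model was trained on (for example because of mixed types), every level can end up on one side, which is the degenerate case shown last.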
Thanks for the quick reply @vruusmann
Just to add: both of these issues occur with the same model file. Different library versions gave different errors: the first stack trace is from jpmml 1.2, the second stack trace is from jpmml 1.3.11.
Will cross-check the data for mixed data types and will try to standardize if we find this mismatch. Also, just to understand: wouldn't the data type mismatch error come up in both versions of the library?
And from the second comment, it seems like LightGBM is creating a really unbalanced tree, is it? Is there any parameter to balance LightGBM?
Just to add: both of these issues occur with the same model file. Different library versions gave different errors
It's likely that the second error is a precursor to the first error.
The newer library version catches data inconsistency earlier, and doesn't get to the tree traversal/pruning stage at all. The older library doesn't do such sanity checking.
Will cross-check the data for mixed data types and will try to standardize if we find this mismatch.
Take a look into the LightGBM text model file, and scroll to the pandas_categorical section. One (or more) of your columns contain category specifications that mix string and integer values.
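The check can also be scripted. The sketch below assumes the model was saved by the LightGBM Python package, which appends a single `pandas_categorical:` line holding a JSON list with one list of category levels per categorical column. The helper names and the sample text are made up for illustration.

```python
import json


def load_pandas_categorical(model_text):
    """Return the decoded pandas_categorical list, or None if the line is absent."""
    prefix = "pandas_categorical:"
    # The section is normally near the end of the file, so scan backwards.
    for line in reversed(model_text.splitlines()):
        if line.startswith(prefix):
            return json.loads(line[len(prefix):])
    return None


def find_mixed_columns(categories):
    """Map categorical-column position to the type names found, for mixed columns only."""
    mixed = {}
    for index, levels in enumerate(categories or []):
        type_names = sorted({type(value).__name__ for value in levels})
        if len(type_names) > 1:
            mixed[index] = type_names
    return mixed


# Made-up model text: the first categorical column mixes strings and an integer.
sample_model_text = (
    "tree\n"
    "version=v3\n"
    'pandas_categorical:[["red", "green", 7], [1, 2, 3]]\n'
)

categories = load_pandas_categorical(sample_model_text)
mixed = find_mixed_columns(categories)
print(mixed)
```

For a real model, read the file with `open(path).read()` and pass the text in. The reported index is the position among the categorical columns, not among all features.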
Also, just to understand: wouldn't the data type mismatch error come up in both versions of the library?
Newer library versions are smarter, and perform more thorough sanity checking.
It seems like LightGBM is creating a really unbalanced tree, is it?
For starters, you're feeding inconsistent data to LightGBM. How can it do a correct job?
hey @vruusmann
Your comments were really helpful. We found that the bug was in our data preprocessing: the data frame was converting a string column to a mixed data type array. After casting all features correctly, it worked well.
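For anyone hitting the same thing, a rough pandas sketch of that kind of fix is below. It is not the code from this issue; the column names and helper functions are invented for illustration. It finds object columns whose values have more than one Python type and casts them to string categories before training.

```python
import pandas as pd


def mixed_type_columns(df):
    """Return names of object columns whose non-null values have more than one Python type."""
    mixed = []
    for name in df.columns:
        if df[name].dtype != object:
            continue
        type_names = {type(value).__name__ for value in df[name].dropna()}
        if len(type_names) > 1:
            mixed.append(name)
    return mixed


def cast_to_string_categories(df, columns):
    """Return a copy where the given columns are string-valued categoricals.

    Note: astype(str) turns missing values into the literal text "nan",
    so handle nulls first if the column has any.
    """
    out = df.copy()
    for name in columns:
        out[name] = out[name].astype(str).astype("category")
    return out


# Invented example data: "city" mixes strings and integers.
df = pd.DataFrame({"city": ["NY", 7, "LA", 7], "age": [31, 42, 25, 38]})

bad_columns = mixed_type_columns(df)
fixed = cast_to_string_categories(df, bad_columns)
print(bad_columns, list(fixed["city"].cat.categories))
```

Casting to string follows the earlier suggestion in this thread to standardize on a common data type, preferably string.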
Thanks again for the help
Hey,
Can someone help with this
getting this error abruptly
lightGBM version: 3.1.0 jpmml version: 1.2
On updating jpmml to the latest version, i.e. 1.3.11,
getting this error