jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML
GNU Affero General Public License v3.0

Conversion failure - Failed to convert GBDT to PMML java.lang.IllegalArgumentException #61

Closed rafalAtAdscale closed 1 year ago

rafalAtAdscale commented 1 year ago

I was trying to convert our model with java -jar pmml-lightgbm-example-executable-1.4.4.jar --lgbm-input model.txt --pmml-output foo.xml and got the following error:

SEVERE: Failed to convert GBDT to PMML
java.lang.IllegalArgumentException
    at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:299)
    at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:386)
    at org.jpmml.lightgbm.example.Main.run(Main.java:175)
    at org.jpmml.lightgbm.example.Main.main(Main.java:136)

Exception in thread "main" java.lang.IllegalArgumentException
    at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:299)
    at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:386)
    at org.jpmml.lightgbm.example.Main.run(Main.java:175)
    at org.jpmml.lightgbm.example.Main.main(Main.java:136)

It looks like a possible miscount of categorical features was introduced while fixing #22.

As a temporary "fix" I commented out this line and it works fine for us.

vruusmann commented 1 year ago

> As a temporary "fix" I commented out this line

If you want to suppress a sanity check, then you should comment out this throw new IllegalArgumentException() instead: https://github.com/jpmml/jpmml-lightgbm/blob/1.4.4/pmml-lightgbm/src/main/java/org/jpmml/lightgbm/GBDT.java#L299

> trying to convert our model

Got your model.

Looks like the problem is about a very long pandas_categorical line - the LightGBM library has split it in two, and the two parts are stored on separate lines.

The JPMML-LightGBM library should combine these two lines back into one, before attempting to parse its contents.

vruusmann commented 1 year ago

> Looks like the problem is about a very long pandas_categorical line

I was wrong - the JPMML-LightGBM library is able to combine split lines, so it sees the pandas_categorical line as one.

So, the proposed fix is "don't increment the pandasCategoryIndex variable if the feature is none"?
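To make the question concrete, here is a minimal Java sketch of the rule as worded above. It is not the JPMML-LightGBM implementation: the class name, the assignCategories helper and its error messages are hypothetical. It also assumes that a feature_infos entry is either none (unused feature), a bracketed [min:max] range (continuous feature) or a colon-separated value list (categorical feature), and that pandas_categorical holds one category list per categorical feature that is actually consumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PandasCategorySketch {

    /**
     * Hypothetical helper (not part of JPMML-LightGBM): pairs every categorical
     * feature with the next pandas_categorical entry, skipping "none" features.
     * Non-categorical features get a null placeholder in the result.
     */
    public static List<List<String>> assignCategories(List<String> featureInfos,
                                                      List<List<String>> pandasCategorical) {
        List<List<String>> result = new ArrayList<>();
        int pandasCategoryIndex = 0;

        for (String featureInfo : featureInfos) {
            // Proposed rule: an unused ("none") feature does not consume an entry,
            // so pandasCategoryIndex is left unchanged.
            if ("none".equals(featureInfo)) {
                result.add(null);
                continue;
            }
            // Assumption: a bracketed range such as "[0:1]" marks a continuous feature.
            if (featureInfo.startsWith("[")) {
                result.add(null);
                continue;
            }
            // Everything else is treated as categorical and consumes one entry.
            if (pandasCategoryIndex >= pandasCategorical.size()) {
                throw new IllegalArgumentException("Too few pandas_categorical entries");
            }
            result.add(pandasCategorical.get(pandasCategoryIndex));
            pandasCategoryIndex++;
        }

        // Sanity check in the spirit of the one discussed above:
        // every pandas_categorical entry must have been consumed.
        if (pandasCategoryIndex != pandasCategorical.size()) {
            throw new IllegalArgumentException("Unused pandas_categorical entries");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> featureInfos = Arrays.asList("[0:1]", "none", "0:1:2");
        List<List<String>> pandasCategorical = Arrays.asList(Arrays.asList("a", "b", "c"));
        // Prints: [null, null, [a, b, c]]
        System.out.println(assignCategories(featureInfos, pandasCategorical));
    }
}
```

With this rule, the none feature in the middle of the example does not shift the index, so the single pandas_categorical entry lands on the third feature and the final count check passes. Whether a none feature can itself be categorical, and therefore own an entry, is exactly what would need to be confirmed against the actual model file.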