jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML
GNU Affero General Public License v3.0

jpmml-lightgbm seems broken for models generated by the current LightGBM #5

Closed: yuanqingz closed this issue 7 years ago

yuanqingz commented 7 years ago

OS: Debian 8.4
LightGBM Python API version: 2.0.6 (installed from pip)
jpmml-lightgbm version: current master

The traceback:

Exception in thread "main" java.lang.IllegalArgumentException
    at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:153)
    at org.jpmml.lightgbm.Tree.encodeTreeModel(Tree.java:77)
    at org.jpmml.lightgbm.ObjectiveFunction.createMiningModel(ObjectiveFunction.java:67)
    at org.jpmml.lightgbm.BinomialLogisticRegression.encodeMiningModel(BinomialLogisticRegression.java:49)
    at org.jpmml.lightgbm.GBDT.encodeMiningModel(GBDT.java:196)
    at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:188)
    at org.jpmml.lightgbm.Main.run(Main.java:113)
    at org.jpmml.lightgbm.Main.main(Main.java:103)

This looks like the same problem that affected pmml.py in LightGBM's repo (https://github.com/Microsoft/LightGBM/issues/877). Tree.java does not account for num_cat in the original model file when iterating over a tree, so decision_type ends up loading the wrong data.
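
To illustrate what a decision_type value such as 2 might mean, here is a minimal sketch that decodes it as a bitfield. The bit layout (bit 0 = categorical split, bit 1 = default-left, bits 2-3 = missing-value type) is an assumption based on my reading of LightGBM's tree.h and is not confirmed anywhere in this thread. The class and method names are hypothetical and are not part of jpmml-lightgbm.

```java
// Hypothetical helper, not part of jpmml-lightgbm.
// ASSUMPTION: decision_type is a bitfield laid out as in LightGBM's tree.h:
//   bit 0     -> categorical split flag
//   bit 1     -> "default left" flag for missing values
//   bits 2..3 -> missing-value type (0 = None, 1 = Zero, 2 = NaN)
class DecisionTypeDemo {

    static final int CATEGORICAL_MASK = 1;
    static final int DEFAULT_LEFT_MASK = 2;

    static boolean isCategorical(int decisionType) {
        return (decisionType & CATEGORICAL_MASK) != 0;
    }

    static boolean isDefaultLeft(int decisionType) {
        return (decisionType & DEFAULT_LEFT_MASK) != 0;
    }

    static int missingType(int decisionType) {
        return (decisionType >> 2) & 3;
    }

    static String describe(int decisionType) {
        String[] missingNames = {"None", "Zero", "NaN", "unknown"};
        return (isCategorical(decisionType) ? "categorical" : "numerical")
            + ", default-" + (isDefaultLeft(decisionType) ? "left" : "right")
            + ", missing=" + missingNames[missingType(decisionType)];
    }

    public static void main(String[] args) {
        // Every split in the tree posted in this issue has decision_type=2
        System.out.println(describe(2));
    }
}
```

Under that assumption, the value 2 in the dump would be a numerical split that sends missing values to the left. A converter that only recognises the values of an older encoding could plausibly reject it with an IllegalArgumentException, as in the traceback above.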

A single tree in my model looks like this:

Tree=0
num_leaves=31
num_cat=0
split_feature=130 137 25 136 43 41 34 136 137 136 44 137 136 21 137 25 35 1 60 136 137 132 138 34 24 2 137 136 137 136
split_gain=5812.6865850533359 4750.7712912423303 4724.2992261658801 4667.4854093973117 2357.4763379315846 2483.3486637981259 6945.2259999360776 2243.4213386228075 2374.0770696816035 1467.3039153231657 1285.9200319876691 1236.355555283842 921.04117468325421 891.75068549756679 882.8234877595678 843.49713683005211 757.18833604357701 602.65069390989947 556.37066702117988 551.82002711453242 1869.1694134594436 759.48473026501597 842.03145439751813 539.22499822972213 529.71317166717029 524.1214251666097 440.46416312502413 416.64349749434041 640.79000547653413 407.67497246622224
threshold=841.5 1020.5 10150 313.5 9.9999996826552254e-21 4.8500749999999995 32907.5 946.5 1894.5 1331 9.8333333333333339 804.5 82.5 362504 1145.5 4565.5 2378.5 9.9999996826552254e-21 1277.5 107.5 616.5 971.5 -27974418597.5 83611.5 1275.5 9.9999996826552254e-21 120.5 196.5 2573.5 145.5
decision_type=2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
left_child=4 2 3 10 5 9 -7 27 15 19 29 -11 -6 -12 -14 -9 -5 -8 -4 -1 26 -22 -23 -15 -17 -10 -21 -3 -29 -2
right_child=1 7 18 16 12 6 17 8 25 11 13 -13 14 23 -16 24 -18 -19 -20 20 21 22 -24 -25 -26 -27 -28 28 -30 -31
leaf_value=-0.18077119270056649 -0.15993369880754066 -0.18453940579902467 0.13216132565084632 0.027432727681506765 -0.19213865580343734 -0.16818294237615483 -0.14436090440678417 -0.10215443431535599 -0.16384608642321105 0.093979443397465959 0.11306532831797048 -0.1286364873173152 -0.14502337944082802 -0.11760013470900144 -0.18429276208529438 0.13068826105671855 -0.085921384051446295 0.16820182122156815 -0.12460733169972585 0.10546737370562512 -0.17406932486954349 -0.16393200539917041 0.07552986625072354 0.023529412115321439 -0.035882059505199085 -0.13294451392918474 -0.078587338110272617 -0.1493108119203401 -0.17757694880208788 -0.11940408491334077
leaf_count=361208 114021 321333 2901 6875 1214601 12157 266 5245 198186 1362 796 3733 23741 12034 646046 1235 3587 3409 382 567 97429 1353 1038 1190 2001 24690 6286 36347 273308 10874
internal_value=0 -1.6756035684553279 -1.3503995807677192 -1.4025586268301011 -1.8564467494467376 -1.7326803161977709 -0.95351187468418397 -1.7331740776603333 -1.5647073570283156 -1.7587615439261188 -1.4995788791707159 -0.6912659470068695 -1.8885516146356272 -0.92524964336661908 -1.8290083265276871 -0.52611720316000476 -0.1143184859491493 1.4557823129251701 1.022844958879074 -1.7703860597032151 -1.6439961377293224 -1.7133640552995393 -0.59974905897114184 -1.0490018148820326 0.276885043263288 -1.6042283601643963 -0.63359112797315043 -1.7949438024177955 -1.7425909479905055 -1.5640498018335403
internal_count=3388201 1015005 152660 149377 2373196 488808 15832 862345 231357 472976 138915 5095 1884388 14020 669787 8481 10462 3675 3283 467881 106673 99820 2391 13224 3236 222876 6853 630988 309655 124895
shrinkage=0.1
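
As a sanity check on the dump above, the following sketch walks the left_child and right_child arrays copied verbatim from this tree. It assumes the usual LightGBM convention that a non-negative entry is an internal node index and a negative entry c refers to leaf ~c (so -1 is leaf 0). That convention is my assumption and is not stated in this thread, and the class name is hypothetical.

```java
// Hypothetical sketch, not part of jpmml-lightgbm.
// ASSUMPTION: child >= 0 is an internal node index; child < 0 encodes leaf ~child.
class TreeWalkDemo {

    // Copied from the left_child / right_child lines of Tree=0 above
    static final int[] LEFT_CHILD = {
        4, 2, 3, 10, 5, 9, -7, 27, 15, 19, 29, -11, -6, -12, -14,
        -9, -5, -8, -4, -1, 26, -22, -23, -15, -17, -10, -21, -3, -29, -2
    };
    static final int[] RIGHT_CHILD = {
        1, 7, 18, 16, 12, 6, 17, 8, 25, 11, 13, -13, 14, 23, -16,
        24, -18, -19, -20, 20, 21, 22, -24, -25, -26, -27, -28, 28, -30, -31
    };

    // Counts the leaves reachable from the given node (or leaf reference)
    static int countLeaves(int node) {
        if (node < 0) {
            return 1; // a negative reference is a leaf
        }
        return countLeaves(LEFT_CHILD[node]) + countLeaves(RIGHT_CHILD[node]);
    }

    // Follows left children from the root and returns the decoded leaf index
    static int leftmostLeaf() {
        int node = 0;
        while (node >= 0) {
            node = LEFT_CHILD[node];
        }
        return ~node;
    }

    public static void main(String[] args) {
        System.out.println("leaves reachable from root: " + countLeaves(0));
        System.out.println("leftmost leaf index: " + leftmostLeaf());
    }
}
```

If the convention holds, the leaf count should match num_leaves=31, which suggests the structural fields of the tree are intact and that the parsing problem lies elsewhere.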

vruusmann commented 7 years ago

The JPMML-LightGBM library has fallen behind the LightGBM project due to lack of time and resources.

The latest supported LightGBM version dates from around mid-April 2017. To find it, open the commit log (https://github.com/jpmml/jpmml-lightgbm/commits/master) and search for commit messages starting with "Ensured compatibility with LightGBM ...". For example, the latest supported LightGBM version appears to be commit 57ad014: https://github.com/jpmml/jpmml-lightgbm/commit/2e5727e24426c4ab8c8ff63240d5f124118a86b1

I do intend to catch up with the LightGBM project again, but I can't tell you exactly when that will happen. I will fix and close this issue when it does, so stay subscribed.

yuanqingz commented 7 years ago

@vruusmann Thanks for the reply! I'll stay subscribed.