jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML
GNU Affero General Public License v3.0

Loading pandas categorical breaks when a name contains square bracket character "]" #24

Closed. pmjpawelec closed this issue 5 years ago

pmjpawelec commented 5 years ago

The code doesn't distinguish between a "]" that is part of the pandas categorical syntax and a "]" that occurs inside a category name, so it throws:

Exception in thread "main" java.lang.IllegalArgumentException: ...
at org.jpmml.lightgbm.GBDT.loadPandasCategorical(GBDT.java:460)
...

presumably in the second iteration of the while loop.
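
To illustrate the suspected failure mode, here is a minimal sketch. It is not the code from GBDT.java (which is not shown in this thread); the class `NaiveCategoricalScan` and both methods are hypothetical. It assumes the category lists arrive as a string of bracketed, double-quoted names. The naive version treats the first "]" after each "[" as the end of a list, while the second version ignores brackets that appear inside quoted names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of jpmml-lightgbm.
class NaiveCategoricalScan {

    // Naive scan: the first ']' after a '[' is assumed to close the list,
    // even if that ']' is actually a character inside a quoted name.
    static List<String> splitNaive(String value) {
        List<String> lists = new ArrayList<>();
        int pos = 0;
        int begin;
        while ((begin = value.indexOf('[', pos)) > -1) {
            int end = value.indexOf(']', begin + 1);
            if (end < 0) {
                throw new IllegalArgumentException(value);
            }
            lists.add(value.substring(begin + 1, end));
            pos = end + 1;
        }
        return lists;
    }

    // Quote-aware scan: brackets inside double-quoted strings are skipped,
    // and a backslash escapes the character that follows it.
    static List<String> splitQuoteAware(String value) {
        List<String> lists = new ArrayList<>();
        boolean inString = false;
        int begin = -1;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '[') {
                begin = i;
            } else if (c == ']' && begin > -1) {
                lists.add(value.substring(begin + 1, i));
                begin = -1;
            }
        }
        return lists;
    }
}
```

For the input `["a", "b"], ["x]", "y"]`, the naive scan cuts the second list off at `"x`, whereas the quote-aware scan returns `"x]", "y"` intact. A real fix would more likely rely on a proper JSON parser than on hand-written scanning like this.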

77QingLiu commented 5 years ago

+1

vruusmann commented 5 years ago

Blocked by https://github.com/microsoft/LightGBM/issues/1201, which has been superseded by https://github.com/microsoft/LightGBM/issues/960

TLDR: I'm not going to develop a full-blown JavaCC grammar/parser for dealing with complex feature names, because the LightGBM team is about to change this part of LightGBM model files in a major way.

But feel free to "pressurize" the LightGBM team to work on https://github.com/microsoft/LightGBM/issues/960 by upvoting it.

pmjpawelec commented 5 years ago

@vruusmann this is understandable. I would still add a hint to the error message at line 460, though, asking the user to drop the "]" from names. It might save some people a lot of time.
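
A rough sketch of what such a hint could look like follows. The helper `CategoricalParseErrors.withHint` is hypothetical and the wording of the message is an assumption; the actual exception text at GBDT.java:460 is elided in the stack trace above and is not reproduced here.

```java
// Hypothetical helper; not part of jpmml-lightgbm.
class CategoricalParseErrors {

    // Builds an exception that carries the offending text plus a hint
    // about the known limitation with ']' inside category names.
    static IllegalArgumentException withHint(String offendingText) {
        String message = "Failed to parse pandas categorical: " + offendingText
            + " (hint: category names containing ']' are not supported;"
            + " remove the character from the names and re-export the model)";
        return new IllegalArgumentException(message);
    }
}
```

A parser could then call `throw CategoricalParseErrors.withHint(line);` at the point where it currently throws a bare `IllegalArgumentException`.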