jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML
GNU Affero General Public License v3.0

Model attempting a split on an unused feature (leading to NPE) #64

Closed csrookie-zoe closed 1 month ago

csrookie-zoe commented 1 month ago

I trained a LightGBM model and saved it as a .txt file. Then I ran the command below to convert this .txt file to a PMML file, but it failed.

The command is as follows:

java -jar pmml-lightgbm-example/target/pmml-lightgbm-example-executable-1.5-SNAPSHOT.jar --lgbm-input lgb_model.txt --pmml-output lightgbm_model.pmml

Error output:

Jul 19, 2024 2:13:19 PM org.jpmml.lightgbm.example.Main run
INFO: Loading GBDT..
Jul 19, 2024 2:13:19 PM org.jpmml.lightgbm.example.Main run
INFO: Loaded GBDT in 697 ms.
Jul 19, 2024 2:13:19 PM org.jpmml.lightgbm.example.Main run
INFO: Converting GBDT to PMML..
Jul 19, 2024 2:13:20 PM org.jpmml.lightgbm.example.Main run
SEVERE: Failed to convert GBDT to PMML
java.lang.NullPointerException: Cannot invoke "org.jpmml.converter.Feature.toContinuousFeature()" because "feature" is null
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:244)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:268)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:268)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:267)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:267)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:268)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:268)
        at org.jpmml.lightgbm.Tree.encodeTreeModel(Tree.java:110)
        at org.jpmml.lightgbm.ObjectiveFunction.createMiningModel(ObjectiveFunction.java:74)
        at org.jpmml.lightgbm.BinomialLogisticRegression.encodeModel(BinomialLogisticRegression.java:46)
        at org.jpmml.lightgbm.GBDT.encodeModel(GBDT.java:436)
        at org.jpmml.lightgbm.GBDT.encodeModel(GBDT.java:423)
        at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:411)
        at org.jpmml.lightgbm.example.Main.run(Main.java:175)
        at org.jpmml.lightgbm.example.Main.main(Main.java:136)

Exception in thread "main" java.lang.NullPointerException: Cannot invoke "org.jpmml.converter.Feature.toContinuousFeature()" because "feature" is null
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:244)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:268)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:268)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:267)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:267)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:268)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:268)
        at org.jpmml.lightgbm.Tree.encodeTreeModel(Tree.java:110)
        at org.jpmml.lightgbm.ObjectiveFunction.createMiningModel(ObjectiveFunction.java:74)
        at org.jpmml.lightgbm.BinomialLogisticRegression.encodeModel(BinomialLogisticRegression.java:46)
        at org.jpmml.lightgbm.GBDT.encodeModel(GBDT.java:436)
        at org.jpmml.lightgbm.GBDT.encodeModel(GBDT.java:423)
        at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:411)
        at org.jpmml.lightgbm.example.Main.run(Main.java:175)
        at org.jpmml.lightgbm.example.Main.main(Main.java:136)

But in my .txt model file, the feature is not null. The content is:

tree
version=v3
num_class=1
num_tree_per_iteration=1
label_index=0
max_feature_idx=688
objective=binary sigmoid:1
feature_names=max_user_trd_redmoney_in_amt_30d max_user_trd_180rgstr_in_amt_30d max_user_trd_jdcl_freeze_in_amt_30d sum_user_trd_100tl_opp_out_cnt_30d sum_user_trd_100t_opp_in_cnt_30d max_register_days_out sum_user_trd_100t_opp_out_cnt_30d sum_user_trd_jdcl_freeze_out_amt_30d min_user_register_days_all

How can I solve this error?

vruusmann commented 1 month ago

The LightGBM model file holds feature information using two attributes: feature_names and feature_infos.

If a feature is present in the training dataset but is not actually used by the model, then its feature_infos entry is set to the value none (all features that are used have non-none values).

If the JPMML-LightGBM converter finds a none value, it leaves the corresponding model schema entry blank. By convention, this is represented by a Java null reference.

This NPE can only happen if there is a conflict between the feature_infos attribute and the actual model content. The former says that a feature is not used by the model, but the latter actually attempts to generate a split based on this feature.
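To check a model file for this situation, the two attributes can be read side by side. The following is a minimal sketch, assuming that the header of the LightGBM text file stores feature_names and feature_infos as space-separated lists on single lines and that the header ends at the first Tree= line. The function names and the toy header are hypothetical, not part of LightGBM or JPMML-LightGBM.

```python
def parse_header(model_text):
    """Collect key=value lines that appear before the first tree block."""
    header = {}
    for line in model_text.splitlines():
        line = line.strip()
        if line.startswith("Tree="):
            break  # assumption: the header ends where the first tree starts
        if "=" in line:
            key, value = line.split("=", 1)
            header[key] = value
    return header


def unused_features(model_text):
    """Return (index, name) pairs whose feature_infos entry is 'none'."""
    header = parse_header(model_text)
    names = header["feature_names"].split(" ")
    infos = header["feature_infos"].split(" ")
    if len(names) != len(infos):
        raise ValueError(
            "feature_names has %d entries, feature_infos has %d"
            % (len(names), len(infos))
        )
    return [
        (index, name)
        for index, (name, info) in enumerate(zip(names, infos))
        if info == "none"
    ]


# Toy header for illustration only; a real model file has many more attributes.
EXAMPLE_HEADER = """tree
version=v3
feature_names=age unused_col income
feature_infos=[18:90] none [0:250000]

Tree=0
"""

print(unused_features(EXAMPLE_HEADER))
```

For a real file, the text could be loaded with something like `open("lgb_model.txt").read()` and passed to `unused_features`. The length check matters because the two lists are matched by position.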

> How to solve this error?

Never saw this kind of NPE myself.

My first intuition is that the LightGBM model file is corrupted - did you modify it manually in any way? Also, the model fragment that you pasted above looks incorrect, because there is no feature_infos attribute at all.

I'd need to see a complete LightGBM model file to give more meaningful answers.

csrookie-zoe commented 1 month ago

Sorry, for data sensitivity reasons I can't provide the full .txt file; I can only describe the problem I'm having. I checked the .txt file and the feature_infos attribute does have content, something like 'feature_infos=[-999:65][-999:40]...'. There is no none value in feature_infos. I don't know what else to check, but I need your help very much. Thanks!

vruusmann commented 1 month ago

> Sorry, for data sensitivity reasons, I can't provide the full txt file,

Can you reproduce this NPE using some public (toy-) dataset?

> I went to check the specifics within the txt file and there is content within the feature_infos, it is something like feature_infos=[-999:65][-999:40]...

The only way this NPE can happen is if there are null elements inside the Schema#getFeatures() feature list. And the only way null elements can get in there is by having none elements in the feature_infos attribute.

Specifically, see this: https://github.com/jpmml/jpmml-lightgbm/blob/1.5.4/pmml-lightgbm/src/main/java/org/jpmml/lightgbm/GBDT.java#L205-L206

> I don't know what else to check

You have full access to the JPMML-LightGBM source code at GitHub. And it looks to me that you've already successfully downloaded and built a binary version of it (because you're using a 1.5-SNAPSHOT snapshot version, not any of my pre-built point release versions).

Now, feel free to insert System.out.println(...) statements into it in order to pinpoint the location where a null element gets inserted into the Schema#getFeatures() list.

> I need your help very much, thanks!

I can't help you without a LightGBM model file.

Either reproduce the issue with a new dataset that can be shared, or debug it locally using System.out.println(...) statements.

csrookie-zoe commented 1 month ago

Thank you so much for your advice! I rechecked my .txt file, and I did find a 'none' element in feature_infos! There is only one 'none' element in it. What can I do next? Can I remove this 'none' element from feature_infos?

vruusmann commented 1 month ago

> I recheck my txt file, and I really find a 'none' element in feature_infos!

TOLD YOU SO!

> What can I do next, can I remove this 'none' element from feature_infos?

You must not delete it (because that would mess up the indexing of feature_names and feature_infos attributes).

Instead, find out the name of this feature (feature_names and feature_infos are two lists with an equal number of elements), and then choose one action:

vruusmann commented 1 month ago

It is still interesting that a column whose LightGBM feature info is none is referenced by the LightGBM model during tree splitting. This should never happen.

If anyone can reproduce this issue (i.e. an NPE) and share a model, I'd be very much interested!
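When the model file cannot be shared, the conflict described above can still be looked for locally. The sketch below scans a model text for trees whose split_feature line refers to a feature whose feature_infos entry is none. It rests on two assumptions that should be verified against a real file: that each tree block starts with a Tree=N line, and that split_feature holds space-separated indices into the same list that feature_infos describes. The function name and the toy model are hypothetical.

```python
def splits_on_unused_features(model_text):
    """Return (tree_index, feature_index) pairs where a tree splits on a 'none' feature."""
    infos = None
    tree_index = None
    conflicts = []
    for line in model_text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "feature_infos":
            infos = value.split(" ")
        elif key == "Tree":
            tree_index = int(value)
        elif key == "split_feature" and value:
            if infos is None:
                raise ValueError("split_feature found before feature_infos")
            for token in value.split(" "):
                feature_index = int(token)
                # Assumption: split_feature indexes into the feature_infos list.
                if feature_index < len(infos) and infos[feature_index] == "none":
                    conflicts.append((tree_index, feature_index))
    return conflicts


# Toy model for illustration only; real tree blocks carry many more attributes.
EXAMPLE_MODEL = """tree
version=v3
feature_names=age unused_col income
feature_infos=[18:90] none [0:250000]

Tree=0
num_leaves=3
split_feature=0 2

Tree=1
num_leaves=3
split_feature=1 0
"""

print(splits_on_unused_features(EXAMPLE_MODEL))
```

An empty result would suggest that feature_infos and the trees agree, in which case the misalignment is more likely introduced during conversion than present in the file itself.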

vruusmann commented 1 month ago

Decided to reopen this issue, because it is very unprofessional to have my library raise an NPE.

I will replace the null element with an org.jpmml.lightgbm.NullFeature object, which will then raise a proper, meaningful error when there is an attempt to use it for tree splitting.

vruusmann commented 1 month ago

For some reason, this issue reminds me of https://github.com/jpmml/jpmml-lightgbm/issues/63

A null element should never be hit during tree splitting. There is something extra happening, which causes a "misalignment" of feature accesses.

Perhaps the none feature info represents a categorical feature? If so, the value of the pandasCategoryIndex variable could be off by one in JPMML-LightGBM versions that are older than 1.5.4.