jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML
GNU Affero General Public License v3.0

Support for `cross_entropy` objective function in regression context? #56

Open · phaidara opened this issue 2 years ago

phaidara commented 2 years ago

Hello,

I am currently working on a project where I want to fit a model on probabilities and save it to PMML for later use in a Java program.

I am training an LGBMRegressor with the cross_entropy objective function. Training works well: I am able to fit a PMMLPipeline on my data and use it to predict probabilities as expected.

But saving the pipeline to PMML fails with the following exception:

SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException: Expected a regression-type objective function, got 'cross_entropy'
    at lightgbm.sklearn.LGBMRegressor.checkLabel(LGBMRegressor.java:47)
    at sklearn.Estimator.encode(Estimator.java:100)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:233)
    at org.jpmml.sklearn.Main.run(Main.java:217)
    at org.jpmml.sklearn.Main.main(Main.java:143)

Exception in thread "main" java.lang.IllegalArgumentException: Expected a regression-type objective function, got 'cross_entropy'
    at lightgbm.sklearn.LGBMRegressor.checkLabel(LGBMRegressor.java:47)
    at sklearn.Estimator.encode(Estimator.java:100)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:233)
    at org.jpmml.sklearn.Main.run(Main.java:217)
    at org.jpmml.sklearn.Main.main(Main.java:143)

It seems that the cross_entropy objective function is not supported for LGBMRegressor in the JPMML-LightGBM Java API. I tested cross_entropy with an LGBMClassifier and binary targets (0, 1) instead of probabilities, and that works fine.

Would it be possible to fix this behavior? Thanks!

Reproducible example:

# Library imports
import lightgbm as lgb
import sklearn2pmml
from sklearn.datasets import make_classification
from numpy.random import default_rng

# Random classification data
seed = 1234
x, y_cls = make_classification(random_state=seed)

# Fitting classifier on binary target
classifier = lgb.LGBMClassifier(objective="cross_entropy")
clf_pipeline = sklearn2pmml.PMMLPipeline([("classifier", classifier)])
clf_pipeline.fit(x, y_cls)
# Saving the classifier works fine
sklearn2pmml.sklearn2pmml(clf_pipeline, "working_cross_entropy_classifier.pmml")

# Generating random probability target.
rng = default_rng(seed)
y_reg = rng.uniform(low=0, high=1, size=y_cls.shape)

# Fitting regressor on probability target
regressor = lgb.LGBMRegressor(objective="cross_entropy")
reg_pipeline = sklearn2pmml.PMMLPipeline([("regressor", regressor)])
reg_pipeline.fit(x, y_reg)

# Predictions are probability scores, as expected
reg_pipeline.predict(x)

# But saving the pipeline fails with the above exception:
sklearn2pmml.sklearn2pmml(reg_pipeline, "non_working_cross_entropy_regressor.pmml")

vruusmann commented 2 years ago

The JPMML-LightGBM library treats cross_entropy as a classification-type objective function: https://github.com/jpmml/jpmml-lightgbm/blob/1.4.2/pmml-lightgbm/src/main/java/org/jpmml/lightgbm/GBDT.java#L549-L553
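
For illustration, here is a minimal sketch using the plain lightgbm Python API (not part of the converter itself): the booster's text model dump records only the objective name, and that name appears to be the same cross_entropy string whether the booster came from LGBMClassifier or LGBMRegressor, so the converter has to pick one interpretation for it.

# Sketch: compare the objective line in the LightGBM text model dumps
import lightgbm as lgb
from sklearn.datasets import make_classification

X, y = make_classification(random_state=1234)

clf = lgb.LGBMClassifier(objective="cross_entropy").fit(X, y)
reg = lgb.LGBMRegressor(objective="cross_entropy").fit(X, y.astype(float))

for name, booster in [("classifier", clf.booster_), ("regressor", reg.booster_)]:
    # Booster.model_to_string() returns the LightGBM text model representation
    objective = [line for line in booster.model_to_string().splitlines()
                 if line.startswith("objective=")]
    print(name, objective)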

It's currently unclear to me if it can and should be usable in regression contexts. Will explore.

LightGBM parameters documentation: https://lightgbm.readthedocs.io/en/latest/Parameters.html