jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML
GNU Affero General Public License v3.0

possible to add some settings? #16

Closed leemoor closed 5 years ago

leemoor commented 5 years ago

1. Is it possible to set a default value for missing or invalid values in a categorical feature? The system may raise an error when it meets a value such as test1=4 (not in (0, 1, 2, 3)).

<DataField name="test1" optype="categorical" dataType="integer">
    <Value value="0"/>
    <Value value="1"/>
    <Value value="2"/>
    <Value value="3"/>
</DataField>

2. As shown below, is it possible to turn off the margin setting? The system may raise an error when it meets a value such as 200 (> 100).

<DataField name="test2" optype="continuous" dataType="double">
    <Interval closure="closedClosed" leftMargin="0.0" rightMargin="100.0" />
</DataField>
vruusmann commented 5 years ago

How do you train and export LightGBM models - in standalone mode, or using some abstraction layer (Scikit-Learn, Apache Spark ML)?

Your requirements would be easy to meet when using an abstraction layer. For example, in Scikit-Learn, it's possible to customize the definition of a feature column using the sklearn2pmml.decoration.ContinuousDomain and sklearn2pmml.decoration.CategoricalDomain pseudo-transformation classes.

A very close topic was discussed earlier today on the JPMML mailing list: https://groups.google.com/d/msg/jpmml/10uOILNhXY8/Kro0aW4lEwAJ
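
To make the two failure cases concrete, here is a minimal sketch of how a PMML consumer might treat test1=4 and test2=200 against the DataField declarations from the question. It is a simplified stand-in written with the Python standard library only, not JPMML code: `is_valid` and `prepare` are hypothetical helpers, only the `closedClosed` interval closure is modelled, and the treatment names follow the PMML `invalidValueTreatment` attribute (`returnInvalid` is the default in the PMML specification). The domain decorators mentioned above are assumed to control this same setting.

```python
import xml.etree.ElementTree as ET

# DataField declarations copied from the question
DATA_FIELDS = """<DataDictionary>
  <DataField name="test1" optype="categorical" dataType="integer">
    <Value value="0"/><Value value="1"/><Value value="2"/><Value value="3"/>
  </DataField>
  <DataField name="test2" optype="continuous" dataType="double">
    <Interval closure="closedClosed" leftMargin="0.0" rightMargin="100.0"/>
  </DataField>
</DataDictionary>"""


def is_valid(field, value):
    """Hypothetical helper: is the value inside the declared value space?"""
    listed = [v.get("value") for v in field.findall("Value")]
    if listed:
        return str(value) in listed
    intervals = field.findall("Interval")
    if not intervals:
        return True  # nothing declared, so every value is accepted
    # Simplification: every interval is treated as closedClosed
    return any(
        float(i.get("leftMargin", "-inf")) <= float(value) <= float(i.get("rightMargin", "inf"))
        for i in intervals
    )


def prepare(field, value, treatment="returnInvalid"):
    """Hypothetical helper: apply an invalid value treatment to one input value."""
    if is_valid(field, value) or treatment == "asIs":
        return value  # valid values, and invalid ones under asIs, pass through
    if treatment == "asMissing":
        return None  # the invalid value is handled like a missing one
    raise ValueError(f"invalid value {value!r} for field {field.get('name')}")


fields = {f.get("name"): f for f in ET.fromstring(DATA_FIELDS)}
```

With the default treatment, `prepare(fields["test2"], 200)` raises, which matches the error described in the question. Passing `"asIs"` or `"asMissing"` lets the same input through.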

leemoor commented 5 years ago

I am training in standalone mode:

import lightgbm as lgb

# Scikit-Learn style wrapper around the LightGBM booster
clf = lgb.LGBMClassifier(
    num_leaves=60,
    max_depth=6,
    learning_rate=0.1,
    min_data_in_leaf=100,
    n_estimators=500,
    n_jobs=20,
    bagging_fraction=0.9,
)
clf.fit(
    train[features], train[target],
    eval_set=[(test[features], test[target])],
    eval_metric='auc',
    feature_name=features,
    categorical_feature=cata_feature,
    early_stopping_rounds=500,
)

# Save the underlying booster as a LightGBM text model
model_path = '/Users/model.txt'
clf.booster_.save_model(model_path)

Then I use the JAR as below:

java -jar target/jpmml-lightgbm-executable-1.2-SNAPSHOT.jar --lgbm-input model.txt --pmml-output model.pmml

vruusmann commented 5 years ago

training as in standalone mode

By "standalone mode" I meant that perhaps you're using command-line lgbm.exe or something.

However, you appear to be using Scikit-Learn as an abstraction layer. Simply wrap your LGBMClassifier into a sklearn2pmml.pipeline.PMMLPipeline, and apply sklearn2pmml.decoration.ContinuousDomain transformers to problematic columns.

See my latest JPMML mailing list post for a complete example.
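
If retraining through a pipeline is not an option, one possible workaround is to patch the converted PMML file afterwards. This is not the approach recommended in this thread, only a sketch under stated assumptions: `relax_pmml` is a hypothetical helper, the sample document is a trimmed fragment rather than a complete model, and the consumer is assumed to honour the PMML `invalidValueTreatment` attribute on MiningField elements.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def relax_pmml(pmml_text, treatment="asIs"):
    """Hypothetical helper: accept out-of-domain inputs in a PMML document."""
    root = ET.fromstring(pmml_text)
    # Snapshot the elements first, because the tree is modified below
    for element in list(root.iter()):
        name = local(element.tag)
        if name == "MiningField" and element.get("usageType") != "target":
            # Override the spec default, which is "returnInvalid"
            element.set("invalidValueTreatment", treatment)
        elif name == "DataField":
            # Drop the declared margins so that e.g. 200 is no longer out of range
            for child in list(element):
                if local(child.tag) == "Interval":
                    element.remove(child)
    return ET.tostring(root, encoding="unicode")


# Trimmed fragment for illustration only, not a complete PMML model
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary>
    <DataField name="test2" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="0.0" rightMargin="100.0"/>
    </DataField>
  </DataDictionary>
  <MiningModel functionName="classification">
    <MiningSchema>
      <MiningField name="test2"/>
    </MiningSchema>
  </MiningModel>
</PMML>"""

patched = relax_pmml(SAMPLE)
```

Whether passing unseen values straight to the trees gives sensible predictions depends on the model, so `"asMissing"` may be the safer choice for categorical columns such as test1.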