Closed — makquel closed this issue 2 years ago
Hi @makquel, thank you for your interest in LightGBM. The features are saved in the same order they were used for training, i.e.:
import lightgbm as lgb
import numpy as np

X = np.random.rand(100, 3)
y = np.random.rand(100)

# names are applied positionally: 'x2' labels column 0, 'x0' column 1, 'x1' column 2
ds = lgb.Dataset(X, y, feature_name=['x2', 'x0', 'x1'])
bst = lgb.train({'num_leaves': 3, 'verbose': -1}, ds, num_boost_round=1)
print(bst.dump_model()['feature_names'])
# ['x2', 'x0', 'x1']
Are you able to provide a minimal reproducible example where this isn't the case?
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Summary
When dumping a native LightGBM model (command below), the resulting model object labels the columns with generic ordinal names instead of the feature names used for training.
The JSON produced by dumping the model looks like this:
It would be very useful to have a native method to override the feature names so that they match exactly the ones used to train the model.
Motivation
Exporting the native model in a PMML-like format could facilitate integration with other platforms, and would also help those who implement a custom
.predict
function.