jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML
GNU Affero General Public License v3.0

Why is the conversion process so slow and resource-intensive? #2

Closed billbargens closed 5 years ago

billbargens commented 7 years ago

I used the jpmml-lightgbm tool to convert a LightGBM model to a PMML model. The model is simple, with num_trees=2 and num_leaves=3, but the conversion process is very slow and resource-intensive. Why? The original LightGBM model file is as follows: lightgbm_model.txt

vruusmann commented 7 years ago

Thanks for the sample model file - I was able to run it through the JPMML-LightGBM command-line conversion application and look at the profiling results.

In brief, your model contains 28'000 features, and it takes time to perform their usage analysis; the code hotspot is actually located inside the JPMML-Model library, in the Visitor method org.jpmml.model.visitors.DeepFieldResolver#applyTo(Visitable). This is a known (but unreported) issue with all PMML conversion applications (e.g. R, Scikit-Learn, Apache Spark ML) when the number of features is "high".

As a temporary workaround, you should try to reduce the number of input features from 28'000 to something more manageable. You seem to be dealing with a categorical feature (all 28'000 features seem to represent binary 0/1 features, which is typical of one-hot encoding). Have you tried setting the categorical_feature option? LightGBM has built-in support for categorical features, and by setting this option it would be possible to "collapse" those 28'000 features into a single feature, which would dramatically speed up the conversion process.
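
To illustrate the "collapse" step, here is a minimal Python sketch, assuming the 0/1 columns really are a one-hot encoding of a single categorical variable (that is a guess from the model file, not something confirmed in this thread). The helper name `collapse_one_hot` and the toy data are hypothetical; the function only turns each one-hot row into one integer category code.

```python
from typing import List, Sequence


def collapse_one_hot(rows: Sequence[Sequence[int]]) -> List[int]:
    """Map each one-hot row to the index of its single active column.

    Hypothetical helper for illustration: raises ValueError when a row
    does not have exactly one column set to 1, because such a row is
    not a valid one-hot encoding of a single categorical value.
    """
    codes = []
    for row_number, row in enumerate(rows):
        # Collect the positions of all columns that are set to 1.
        active = [index for index, value in enumerate(row) if value == 1]
        if len(active) != 1:
            raise ValueError(
                f"row {row_number} has {len(active)} active columns, expected 1"
            )
        codes.append(active[0])
    return codes


# Toy data: four samples, three binary columns standing in for 28'000.
one_hot = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]
codes = collapse_one_hot(one_hot)
print(codes)  # [0, 2, 1, 2]
```

The resulting integer column would then replace the binary columns in the training data and be declared as categorical when retraining, for example through the `categorical_feature` parameter of the LightGBM Python package. Whether the retrained model matches the original in quality would need to be checked separately.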