billbargens closed this issue 5 years ago.
Thanks for the sample model file. I was able to run it through the JPMML-LightGBM command-line conversion application and examine the profiling results.
In brief, your model contains 28'000 features, and analyzing how each of them is used takes time. The code hotspot is actually located inside the JPMML-Model library, in the Visitor method org.jpmml.model.visitors.DeepFieldResolver#applyTo(Visitable). This is a known (but unreported) issue that affects all PMML conversion applications (e.g. R, Scikit-Learn, Apache Spark ML) when the number of features is "high".
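Because the cost comes from the number of declared features, tree and leaf counts matter much less. A tree with 3 leaves has only 2 split nodes, so 2 such trees can reference at most 4 distinct features, while the model still declares all 28'000. The sketch below compares the two numbers for a LightGBM text model. It assumes the text format's header holds `key=value` lines (including `max_feature_idx` and a space-separated `feature_names`) and that each tree section starts with `Tree=` and lists its splits in a `split_feature=` line. It is an illustrative helper, not part of JPMML-LightGBM or LightGBM.

```python
from typing import Set


def count_model_features(model_text: str) -> int:
    """Count the input features declared in a LightGBM text model header."""
    header = {}
    for line in model_text.splitlines():
        if line.startswith("Tree="):
            break  # the header ends where the first tree section begins
        key, sep, value = line.partition("=")
        if sep:
            header[key.strip()] = value.strip()
    if "feature_names" in header:
        return len(header["feature_names"].split())
    # Fall back to the zero-based maximum feature index
    return int(header["max_feature_idx"]) + 1


def used_split_features(model_text: str) -> Set[int]:
    """Collect the indices of features that appear in at least one split."""
    used: Set[int] = set()
    for line in model_text.splitlines():
        if line.startswith("split_feature="):
            used.update(int(v) for v in line.partition("=")[2].split())
    return used
```

For example, `count_model_features(open("lightgbm_model.txt").read())` should report the declared feature count. `used_split_features` on the same text shows how few of those features the trees actually use.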
As a temporary workaround, you should try to reduce the number of input features from 28'000 to something more manageable. You seem to be dealing with a single categorical feature: all 28'000 features appear to be binary 0/1 indicators, which is typical of one-hot encoding. Have you tried setting the categorical_feature option? LightGBM has built-in support for categorical features, and setting this option would "collapse" those 28'000 features into a single feature, which would dramatically speed up the conversion process.
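Before the categorical_feature option can be used, the one-hot columns have to be turned back into a single integer-coded column. The sketch below does this with the standard library only. It assumes each row has at most one column set to 1, and it treats an all-zero row as a missing category (`None`). The function name is hypothetical. The resulting column could then be declared categorical, for example via the `categorical_feature` parameter of `lightgbm.Dataset` in LightGBM's Python package, which expects categorical values encoded as non-negative integers.

```python
from typing import List, Optional, Sequence


def collapse_one_hot(rows: Sequence[Sequence[int]]) -> List[Optional[int]]:
    """Turn each one-hot row (0/1 indicators) into a single category code.

    Returns the index of the column set to 1, or None when no column is set.
    Raises ValueError if a row has several 1s, because the columns would then
    not be a one-hot encoding of one categorical feature.
    """
    codes: List[Optional[int]] = []
    for i, row in enumerate(rows):
        hot = [j for j, v in enumerate(row) if v == 1]
        if len(hot) > 1:
            raise ValueError(f"row {i} has {len(hot)} active columns")
        codes.append(hot[0] if hot else None)
    return codes
```

With 28'000 indicator columns, the model would then see one feature with up to 28'000 category levels instead of 28'000 separate features.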
I use the jpmml-lightgbm tool to convert a LightGBM model to a PMML model. The model is simple, with num_trees=2 and num_leaves=3, but the conversion process is very slow and resource-intensive. Why? The original LightGBM model file is as follows: lightgbm_model.txt