jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

Version v4 is not supported #135

Closed zwag20 closed 1 year ago

zwag20 commented 1 year ago

I am getting the following error when I try to export a PMML file from a LightGBM model, and I am not sure how to troubleshoot it. I am using Databricks (Spark 3.2.1) with the following packages installed:

com.microsoft.azure:synapseml_2.12:0.11.1
org.jpmml:pmml-sparkml-lightgbm:2.2.2
pyspark2pmml-0.5.1-py3-none-any.whl
py4j-0.10.9.7-py2.py3-none-any.whl

Here is my code for reference

from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from synapse.ml.lightgbm import LightGBMClassifier

# categoricalColumns and numericColsFeatures are lists of column names that
# must be defined before this snippet runs; they are not shown in the report.
stages = []
for categoricalCol in categoricalColumns:
    # Index each categorical column; "keep" puts unseen values into an extra bucket.
    indexer = StringIndexer(inputCol=categoricalCol, outputCol=categoricalCol + "_Index").setHandleInvalid("keep")
    stages += [indexer]
# Combine the indexed categorical columns and the numeric columns into one feature vector.
assemblerInputs = [c + "_Index" for c in categoricalColumns] + numericColsFeatures
assembler = VectorAssembler(inputCols=assemblerInputs, outputCol="features")
stages += [assembler]
lgbm = LightGBMClassifier(objective="binary", featuresCol="features", labelCol="label")
stages += [lgbm]
pipeline = Pipeline(stages=stages)
print('Running model')
pipelineModel = pipeline.fit(df_final_under_9_1)

from pyspark2pmml import PMMLBuilder
# sc is the SparkContext provided by the Databricks notebook.
pmmlBuilder = PMMLBuilder(sc, df_final_under_9_1, pipelineModel)
pmmlBuilder.buildFile("/dbfs/FileStore/pmmlModel_test.pmml")

Error

IllegalArgumentException: Version v4 is not supported
---------------------------------------------------------------------------
IllegalArgumentException                  Traceback (most recent call last)
<command-1148977676037695> in <module>
      2 
      3 pmmlBuilder = PMMLBuilder(sc,df_final_under_9_1,pipelineModel)
----> 4 pmmlBuilder.buildFile("/dbfs/FileStore/pmmlModel_test.pmml")

/databricks/python/lib/python3.8/site-packages/pyspark2pmml/__init__.py in buildFile(self, path)
     25         def buildFile(self, path):
     26                 javaFile = self.sc._jvm.java.io.File(path)
---> 27                 javaFile = self.javaPmmlBuilder.buildFile(javaFile)
     28                 return javaFile.getAbsolutePath()
     29 

/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302 
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

vruusmann commented 1 year ago

IllegalArgumentException: Version v4 is not supported

Closing as a duplicate of https://github.com/jpmml/jpmml-lightgbm/issues/58

Your LightGBM model string indicates that it has been generated by LightGBM 4.X.

AFAIK, the official LightGBM 4.0.0 release hasn't happened yet, so it cannot be supported (i.e. I would want to know how it differs from LightGBM 3.3.5 first).

Downgrade your SynapseML dependency to a LightGBM 3.X-based version, and the conversion will succeed.
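
To confirm which format version a model string declares before attempting the conversion, something like the following minimal sketch may help. It relies on the LightGBM text format starting with a short header that contains a `version=` line. The helper name `lightgbm_format_version` and the sample header are made up for illustration, and the commented-out call assumes that the last pipeline stage is the fitted LightGBM model and that SynapseML exposes `getNativeModel()` on it.

```python
from typing import Optional


def lightgbm_format_version(model_string: str) -> Optional[str]:
    """Return the value of the `version=` header line (e.g. "v3"), or None."""
    for line in model_string.splitlines():
        line = line.strip()
        if line.startswith("version="):
            return line.split("=", 1)[1]
        # The header ends where the first tree definition begins.
        if line.startswith("Tree="):
            break
    return None


# Abbreviated, made-up header for illustration only.
sample_header = "tree\nversion=v4\nnum_class=1\n"
detected = lightgbm_format_version(sample_header)
print(detected)

# With a fitted pipeline, the model text could be obtained along these lines
# (assumption: the last stage is the LightGBM model and has getNativeModel()):
# detected = lightgbm_format_version(pipelineModel.stages[-1].getNativeModel())
```

If the detected value is `v4`, the model was written by a LightGBM 4.X build and will trigger the error above.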

vruusmann commented 1 year ago

For example, SynapseML 0.10.2 should be OK.
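
One way to pin the suggested version, sketched under assumptions: the coordinate below reuses the Scala 2.12 artifact name from the original report with only the version changed to 0.10.2. On Databricks these coordinates would normally be installed as cluster libraries; the `spark.jars.packages` setting shown in the comment only applies when you create the Spark session yourself.

```python
# Maven coordinates: SynapseML downgraded as suggested, JPMML converter unchanged.
SYNAPSEML_COORDINATE = "com.microsoft.azure:synapseml_2.12:0.10.2"
JPMML_COORDINATE = "org.jpmml:pmml-sparkml-lightgbm:2.2.2"

# Spark expects a comma-separated list of coordinates.
packages = ",".join([SYNAPSEML_COORDINATE, JPMML_COORDINATE])
print(packages)

# Outside Databricks, the list could be passed when building the session
# (assumption: pyspark is installed and the artifacts are resolvable):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.config("spark.jars.packages", packages).getOrCreate()
```

After changing the dependency, the pipeline has to be refitted so that the model string is produced by the LightGBM 3.X build.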