jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

I have a very strange problem: when I save the result DataFrame as a CSV file using Spark's API, I get "gbtValue is not defined" #90

Closed zdkzdk closed 4 years ago

zdkzdk commented 4 years ago

I use the Spark Scala API's 'csv' method to save the transformed DataFrame as a CSV file, and it fails with 'gbtValue is not defined'. Oddly, this happens only when the input DataFrame passed to the transform was retrieved from a Hive table through an SQL query. If I create the input DataFrame directly from an HDFS file, the save succeeds.

Saving a DataFrame as a CSV file should have nothing to do with pmml-spark, so why did JPMML report gbtValue not defined?

resultDF.coalesce(1).write.mode(saveMode = SaveMode.Overwrite).csv("path")

spark2.2.0-CDH, jpmml-evaluator-spark2.2.0

vruusmann commented 4 years ago

Why did JPMML report gbtValue not defined?

GBDT models are encoded as a two-segment model chain. The first segment computes a gbtValue (by evaluating a list of boosters), and the second segment applies a link function to it in order to obtain a probability density.
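To make the two-segment structure concrete, here is a minimal plain-Scala sketch. It is not JPMML code: the names GbtChainSketch, Booster, stump and the use of a plain logistic function as the link are all assumptions made for illustration. It only shows that when the first segment yields no value, the second segment has nothing to apply the link function to.

```scala
// Hypothetical, simplified stand-in for a two-segment GBT model chain.
// None of these names come from JPMML; they exist only for illustration.
object GbtChainSketch {
  // An input record: field name -> value, where None models a missing value.
  type Row = Map[String, Option[Double]]
  // A booster maps a record to a partial score, or to None if it cannot score.
  type Booster = Row => Option[Double]

  // Segment 1: sum the booster outputs. The result is missing (None)
  // as soon as any single booster result is missing.
  def gbtValue(boosters: Seq[Booster], row: Row): Option[Double] =
    boosters.foldLeft(Option(0.0)) { (acc, booster) =>
      for (sum <- acc; v <- booster(row)) yield sum + v
    }

  // Segment 2: apply a link function to gbtValue.
  // A plain logistic function is assumed here; the real link may differ.
  def probability(gbt: Option[Double]): Either[String, Double] =
    gbt match {
      case Some(v) => Right(1.0 / (1.0 + math.exp(-v)))
      case None    => Left("gbtValue is not defined")
    }

  // Hypothetical booster: a single split on the field "x".
  val stump: Booster = row =>
    row.get("x").flatten.map(x => if (x <= 0.5) -1.0 else 1.0)
}
```

With this sketch, a record such as Map("x" -> Some(1.0)) produces a Right with a probability, while Map("x" -> None) produces Left("gbtValue is not defined").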

In your case, the first segment has returned a missing value, because one of the required input field values was missing.
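One way to check for this is to count the null values per model input column in the Hive-derived DataFrame before transforming it. The sketch below uses only standard Spark SQL functions; the helper name nullCounts and the column names in the usage note are hypothetical, and it assumes a non-empty list of plain column names (no dots or special characters).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, when}

// Hypothetical helper: number of null values in each of the given columns.
// Assumes inputCols is non-empty and contains plain column names.
def nullCounts(df: DataFrame, inputCols: Seq[String]): Map[String, Long] = {
  require(inputCols.nonEmpty, "inputCols must not be empty")
  // count() skips nulls, so count(when(isNull, 1)) counts exactly the null rows.
  val aggs = inputCols.map(c => count(when(col(c).isNull, 1)).alias(c))
  val row = df.agg(aggs.head, aggs.tail: _*).first()
  inputCols.map(c => c -> row.getAs[Long](c)).toMap
}
```

For example, nullCounts(inputDF, Seq("f1", "f2")) (with inputDF, "f1" and "f2" standing in for your own DataFrame and model input fields) returns a map from column name to null count. Any non-zero count points at a field that can leave gbtValue undefined for some rows.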

This is a model evaluation exception (JPMML-Evaluator-Spark library), not a model conversion exception (JPMML-SparkML library). Next time, please take a few seconds to read your exception message properly, and open the issue with the correct project.