Hey there! I am using the latest version (1.3.0) of jpmml-evaluator-spark, but after upgrading to the latest Spark version (3.5.0) I am getting this error:
untyped Scala UDF
```
ERROR org.apache.spark.ml.util.Instrumentation - org.apache.spark.sql.AnalysisException: [UNTYPED_SCALA_UDF] You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. To get rid of this error, you could:
1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`.
2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType)`, if input types are all non primitive.
3. set "spark.sql.legacy.allowUntypedScalaUDF" to "true" and use this API with caution.
at org.apache.spark.sql.errors.QueryCompilationErrors$.usingUntypedScalaUDFError(QueryCompilationErrors.scala:3157)
at org.apache.spark.sql.functions$.udf(functions.scala:8299)
at org.jpmml.evaluator.spark.PMMLTransformer.transform(PMMLTransformer.scala:99)
at org.apache.spark.ml.PipelineModel.$anonfun$transform$4(Pipeline.scala:311)
at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:146)
at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:139)
at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:42)
at org.apache.spark.ml.PipelineModel.$anonfun$transform$3(Pipeline.scala:311)
at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
at org.apache.spark.ml.PipelineModel.$anonfun$transform$2(Pipeline.scala:310)
at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:146)
at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:139)
at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:42)
at org.apache.spark.ml.PipelineModel.$anonfun$transform$1(Pipeline.scala:308)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:307)
```
After setting "spark.sql.legacy.allowUntypedScalaUDF" to "true", it works fine.
Will there be an update on your side to solve this?
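For anyone hitting the same thing, here is a minimal sketch of the workaround I used (my own code, not part of jpmml-evaluator-spark; the builder settings around it are placeholders):

```scala
import org.apache.spark.sql.SparkSession

// Re-enables untyped Scala UDFs via the legacy flag. Use with caution:
// as the error message warns, a null passed to a primitive-typed closure
// argument becomes the Java default value (e.g. 0 for Int).
val spark = SparkSession.builder()
  .config("spark.sql.legacy.allowUntypedScalaUDF", "true")
  .getOrCreate()
```

The same flag can also be passed at launch time with `--conf spark.sql.legacy.allowUntypedScalaUDF=true`.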
I found this related closed issue for Spark version 3.1.1: https://github.com/jpmml/jpmml-evaluator-spark/issues/43
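For reference, the library-side fix suggested by option 1 of the error message would be switching from the untyped `udf(f, dataType)` overload to the typed one. A minimal sketch of the difference (my own illustration, not the actual `PMMLTransformer` code):

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Untyped API: the return type is passed explicitly and input types are
// unknown, so Spark 3.x rejects it unless the legacy flag is set.
val untyped = udf((x: Int) => x + 1, IntegerType)

// Typed API: Spark derives input and return types from the closure,
// so nulls for primitive arguments are handled safely.
val typed = udf((x: Int) => x + 1)
```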