jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

How can I transform a string to a list using PMML? #97

Closed lumingfeihs closed 4 years ago

lumingfeihs commented 4 years ago

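How can I transform a string to a list using PMML? I want to implement the following function:

override def transform(df: Dataset[_]): DataFrame = {
  import org.apache.spark.sql.functions.{col, udf}

  // Parse a comma-separated string such as "1.0,2.5,3" into a dense ML vector
  val string2vector = (x: String) => {
    val a: Array[Double] = x.split(",").map(_.toDouble)
    org.apache.spark.ml.linalg.Vectors.dense(a)
  }

  // Wrap the parser as a UDF and write its result to the output column
  val str2vec = udf(string2vector)
  df.withColumn($(outputCol), str2vec(col($(inputCol))))
}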

vruusmann commented 4 years ago

PMML mostly deals with scalar values. There's no built-in function for this, but it could be implemented as a custom Java-backed function.
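As a rough illustration of the logic such a Java-backed function would need, here is a minimal sketch in plain Java. The class name `CsvDoubleParser` and the choice to treat blank input as an empty list are assumptions made for this example. It deliberately leaves out the wiring into the PMML evaluator, since how a custom function is registered depends on the evaluator library and version in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JPMML: parses "1.0,2.5,3" into a list of doubles.
final class CsvDoubleParser {

    private CsvDoubleParser() {
    }

    static List<Double> parse(String value) {
        List<Double> result = new ArrayList<>();
        // Assumption: a missing or blank input maps to an empty list
        if (value == null || value.trim().isEmpty()) {
            return result;
        }
        // Limit -1 keeps trailing empty tokens, so "1,2," is rejected rather than silently truncated
        for (String token : value.split(",", -1)) {
            // Double.parseDouble throws NumberFormatException on a malformed token
            result.add(Double.parseDouble(token.trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("1.0, 2.5,3"));
    }
}
```

Note that the Scala snippet in the question behaves slightly differently at the edges: `split(",")` without a limit drops trailing empty tokens, and an empty string fails in `toDouble` instead of producing an empty result.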

Anyway, once you have a list of doubles, what are you going to do with it next? I'm not aware of any common transformation or model type that would accept such an argument.

If another transformation follows, then it is better to implement everything as one big Java-backed custom function.
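To make the "one big function" idea concrete, the sketch below fuses the string parsing with a follow-up step, so the function takes a scalar string and returns a scalar double and no list ever crosses the PMML boundary. The follow-up step (a dot product with fixed weights), the class name `CsvWeightedSum`, and the weight values are all made up for illustration. The thread does not say what the real downstream transformation is, and the evaluator-specific registration code is again left out.

```java
// Hypothetical example: parsing fused with a follow-up step (a dot product), returning one scalar.
final class CsvWeightedSum {

    private CsvWeightedSum() {
    }

    static double evaluate(String value, double[] weights) {
        // Limit -1 keeps trailing empty tokens, so a length mismatch is detected
        String[] tokens = value.split(",", -1);
        if (tokens.length != weights.length) {
            throw new IllegalArgumentException(
                "Expected " + weights.length + " values, got " + tokens.length);
        }
        double sum = 0d;
        for (int i = 0; i < tokens.length; i++) {
            // Each token is parsed and consumed immediately; no intermediate list is built
            sum += weights[i] * Double.parseDouble(tokens[i].trim());
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] weights = {0.5, 2.0, 1.0}; // made-up coefficients
        System.out.println(evaluate("1.0,2.0,3.0", weights));
    }
}
```

Keeping both the input and the output scalar is the point of the design: it fits PMML's scalar-oriented data model, at the cost of hiding the intermediate vector inside opaque Java code.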