sramirez / spark-MDLP-discretization

Spark implementation of Fayyad's discretizer based on Minimum Description Length Principle (MDLP)
Apache License 2.0

Something wrong with vector? #33

Closed hbghhy closed 7 years ago

hbghhy commented 7 years ago

I just tested a toy example in Spark 2.1.1, and it reports:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(vFeatures)' due to data type mismatch: argument 1 requires vector type, however, 'vFeatures' is of vector type.;; 'Project [id#9, features#10, vFeatures#11, clicked#12, UDF(vFeatures#11) AS buckedFeatures#87] +- Project [_1#0 AS id#9, _2#1 AS features#10, _3#2 AS vFeatures#11, _4#3 AS clicked#12] +- LocalRelation [_1#0, _2#1, _3#2, _4#3]

I have looked at the source code; is it because transform in the ml version of MDLP calls this:

val discModel = new feature.mdlp_discretization.DiscretizerModel(splits)
val discOp = udf { discModel.transform _ }
dataset.withColumn($(outputCol), discOp(col($(inputCol))).as($(outputCol), metadata))

And in Spark 2 a vector column in the ml API is org.apache.spark.ml.linalg.Vector, but discModel.transform needs org.apache.spark.mllib.linalg.Vector. Could that mismatch be what causes the above error?
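For illustration, here is a minimal sketch of the kind of bridging the udf would need (just my guess at a fix, not necessarily what the pull request below does), using the built-in converters Vectors.fromML and Vector.asML:

val discOp = udf { v: org.apache.spark.ml.linalg.Vector =>
  // convert the incoming ml vector to the mllib vector the old model expects,
  // then convert the result back to an ml vector for the output column
  discModel.transform(org.apache.spark.mllib.linalg.Vectors.fromML(v)).asML
}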

Here is the toy code:

import org.apache.spark.ml.feature.MDLPDiscretizer
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder().master("local[3]")
      .appName("GroupStringIndexer module test")
      .getOrCreate()

    val data = Seq(
      (7, 1, Vectors.dense(0.0, 0.0, 18.0, 1.0), 1.0),
      (8, 1, Vectors.dense(0.0, 1.0, 12.0, 0.0), 0.0),
      (9, 0, Vectors.dense(1.0, 0.0, 15.0, 0.0), 0.0)
    )

    val df = spark.createDataFrame(data).toDF("id", "features", "vFeatures", "clicked")

    df.show()

    val discretizer = new MDLPDiscretizer()
      .setMaxBins(10)
      .setMaxByPart(10000)
      .setInputCol("vFeatures")
      .setLabelCol("clicked")
      .setOutputCol("buckedFeatures")

    val result = discretizer.fit(df).transform(df)

    result.show()
   }
}
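As a side note, both the ml and mllib VectorUDT seem to print as "vector", which would explain why the error message reads "requires vector type, however, ... is of vector type". Assuming the df from the code above, one quick way to check which vector type the column actually carries:

println(df.schema("vFeatures").dataType.getClass.getName)
// prints org.apache.spark.ml.linalg.VectorUDT for a column built with ml Vectors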

I have tried to fix this in https://github.com/sramirez/spark-MDLP-discretization/pull/34. @sramirez Thanks!

sramirez commented 7 years ago

Yes, that's correct. We focused on split generation but forgot the transformation part. Fixed.