saurfang / spark-knn

k-Nearest Neighbors algorithm on Spark
Apache License 2.0

Problem with MNIST.scala example #26

Closed: kaushikacharya closed this issue 7 years ago

kaushikacharya commented 7 years ago

I am facing an issue while running spark-knn-examples/src/main/scala/com/github/saurfang/spark/ml/knn/examples/MNIST.scala. I ran the commands in spark-shell.

Environment:
spark: 2.1.0
scala: 2.11.8

scala> val pipeline = new Pipeline().setStages(Array(knn_KA)).fit(train)
java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.
  at scala.Predef$.require(Predef.scala:224)
  at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
  at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:51)
  at org.apache.spark.ml.classification.Classifier.org$apache$spark$ml$classification$ClassifierParams$$super$validateAndTransformSchema(Classifier.scala:58)
  at org.apache.spark.ml.classification.ClassifierParams$class.validateAndTransformSchema(Classifier.scala:42)
  at org.apache.spark.ml.classification.ProbabilisticClassifier.org$apache$spark$ml$classification$ProbabilisticClassifierParams$$super$validateAndTransformSchema(ProbabilisticClassifier.scala:53)
  at org.apache.spark.ml.classification.ProbabilisticClassifierParams$class.validateAndTransformSchema(ProbabilisticClassifier.scala:37)
  at org.apache.spark.ml.classification.ProbabilisticClassifier.validateAndTransformSchema(ProbabilisticClassifier.scala:53)
  at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:122)
  at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:184)
  at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:184)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
  at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
  at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:184)
  at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
  at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:136)
  ... 52 elided

Looking at the dataset schema:

scala> dataset.schema
res14: org.apache.spark.sql.types.StructType = StructType(StructField(label,DoubleType,true), StructField(features,org.apache.spark.mllib.linalg.VectorUDT@f71b0bce,true))

It shows that the features column is of type org.apache.spark.mllib.linalg.VectorUDT. Could this be the cause of the error? Am I the only one getting this error when running the example?

Note: data/mnist/mnist.bz2 is empty, so I downloaded mnist.bz2 from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html
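In Spark 2.x the `spark.ml` estimators check for the `org.apache.spark.ml.linalg` vector type, while rows built through the older RDD-based `spark.mllib` API (for example `LabeledPoint`) carry `org.apache.spark.mllib.linalg.VectorUDT`, which is exactly the mismatch reported in the error above. A minimal sketch, not taken from the spark-knn repository, that converts such a column with `MLUtils.convertVectorColumnsToML` (available since Spark 2.0). The object name `ConvertFeatures`, the helper `toMLFeatures` and the two-row toy data are hypothetical stand-ins for the MNIST `train` DataFrame:

```scala
import org.apache.spark.ml.linalg.SQLDataTypes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.{DataFrame, SparkSession}

object ConvertFeatures {
  // Hypothetical helper: rewrites the "features" column from the old
  // spark.mllib vector type to the spark.ml vector type; other columns are untouched.
  def toMLFeatures(df: DataFrame): DataFrame =
    MLUtils.convertVectorColumnsToML(df, "features")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("convert-features").getOrCreate()
    import spark.implicits._

    // Toy stand-in for the MNIST data: spark.mllib LabeledPoints, so the
    // "features" column has type org.apache.spark.mllib.linalg.VectorUDT.
    val oldStyle: DataFrame = Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 0.0))
    ).toDF()
    oldStyle.printSchema()

    val newStyle = toMLFeatures(oldStyle)
    newStyle.printSchema()

    // SQLDataTypes.VectorType is the public handle for the spark.ml vector type
    // that the Spark 2.x estimators check against.
    println(newStyle.schema("features").dataType == SQLDataTypes.VectorType)

    spark.stop()
  }
}
```

Alternatively, in Spark 2.x the built-in `libsvm` data source (`spark.read.format("libsvm").load(path)`) should already produce a `features` column of the `spark.ml` vector type, so no conversion step would be needed if the data is loaded that way.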

wadekun commented 7 years ago

I have the same error.

kaushikacharya commented 7 years ago

Hi, after failing to run it with Spark 2.1.0, I did the following:

1. Downloaded the v0.1.1 source code from https://github.com/saurfang/spark-knn/releases/tag/v0.1.1
2. Built it against Spark 1.6.1 (in project/Common.scala and project/Dependencies.scala I set the Spark version to 1.6.1).

Using version 0.1.1, I am able to run the MNIST example.

Regards, Kaushik

On Tue, Jun 20, 2017 at 12:31 PM, Jack Liang notifications@github.com wrote:

i have same error...


yami6001Kogentix commented 6 years ago

I have the same issue when trying to build OneVsRest into the Pipeline; it failed with the error below:

18/01/17 15:23:02 ERROR logisticregression.LogisticRegressionAlgorithm: In LogisticRegressionTrainAlgorithm.run error occurred
java.lang.IllegalArgumentException: Field "features" does not exist.
  at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:264)
  at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:264)
  at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
  at scala.collection.AbstractMap.getOrElse(Map.scala:59)
  at org.apache.spark.sql.types.StructType.apply(StructType.scala:263)
  at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
  at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:51)
  at org.apache.spark.ml.classification.OneVsRest.validateAndTransformSchema(OneVsRest.scala:277)
  at org.apache.spark.ml.classification.OneVsRest.transformSchema(OneVsRest.scala:304)
  at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:184)
  at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:184)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
  at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
  at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:184)
  at org.apache.spark.ml.tuning.ValidatorParams$class.transformSchemaImpl(ValidatorParams.scala:77)
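`Field "features" does not exist` is thrown by `StructType.apply` when the schema has no column with that name, so when `OneVsRest` validated its input, the DataFrame reaching it had no `features` column at all (a different failure from the vector-type mismatch above). A minimal sketch, assuming hypothetical numeric input columns `x1` and `x2` and a `LogisticRegression` base classifier, that creates the vector column with `VectorAssembler` as an earlier pipeline stage so it exists by the time `OneVsRest` checks the schema. `OneVsRestPipeline` and `fitPipeline` are illustrative names, not part of spark-knn or the commenter's code:

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

object OneVsRestPipeline {
  def fitPipeline(train: DataFrame): PipelineModel = {
    // Builds a spark.ml vector column named "features" from the
    // hypothetical raw columns x1 and x2.
    val assembler = new VectorAssembler()
      .setInputCols(Array("x1", "x2"))
      .setOutputCol("features")

    // OneVsRest reads the column named by featuresCol; it must match the assembler output.
    val ovr = new OneVsRest()
      .setClassifier(new LogisticRegression().setMaxIter(10))
      .setFeaturesCol("features")
      .setLabelCol("label")

    // Pipeline.transformSchema folds over the stages in order, so the
    // assembler's output column is in the schema before OneVsRest checks it.
    new Pipeline().setStages(Array(assembler, ovr)).fit(train)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("ovr-pipeline").getOrCreate()
    import spark.implicits._

    // Toy three-class data: (label, x1, x2).
    val train = Seq(
      (0.0, 0.0, 0.1), (0.0, 0.1, 0.0),
      (1.0, 1.0, 0.9), (1.0, 0.9, 1.0),
      (2.0, 2.0, 2.1), (2.0, 2.1, 2.0)
    ).toDF("label", "x1", "x2")

    val model = fitPipeline(train)
    model.transform(train).select("label", "prediction").show()
    spark.stop()
  }
}
```

If the vectors already exist under a different column name, calling `setFeaturesCol` on `OneVsRest` with that name is the other way to satisfy the same schema check.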