yu-iskw / spark-kuromoji-tokenizer

Kuromoji Tokenizer for Spark DataFrames
https://spark-packages.org/package/yu-iskw/spark-kuromoji-tokenizer
Apache License 2.0
6 stars, 2 forks

Supports Spark 2.2 #2

Open e-hu opened 6 years ago

e-hu commented 6 years ago

Spark 2.2 is not supported. Creating the Tokenizer object fails:

scala> val kuromoji = new org.apache.spark.ml.feature.KuromojiTokenizer().setInputCol("text").setOutputCol("tokens")
java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
  at java.lang.Class.getDeclaredMethods0(Native Method)
  at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
  at java.lang.Class.privateGetPublicMethods(Class.java:2902)
  at java.lang.Class.getMethods(Class.java:1615)
  at org.apache.spark.ml.param.Params$class.params(params.scala:547)
  at org.apache.spark.ml.PipelineStage.params$lzycompute(Pipeline.scala:42)
  at org.apache.spark.ml.PipelineStage.params(Pipeline.scala:42)
  at org.apache.spark.ml.param.Params$class.hasParam(params.scala:595)
  at org.apache.spark.ml.PipelineStage.hasParam(Pipeline.scala:42)
  at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:747)
  at org.apache.spark.ml.param.Params$class.set(params.scala:623)
  at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:42)
  at org.apache.spark.ml.param.Params$class.set(params.scala:609)
  at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:42)
  at org.apache.spark.ml.feature.KuromojiTokenizer.setInputCol(KuromojiTokenizer.scala:52)
  ... 58 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.DataFrame
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 73 more

yu-iskw commented 6 years ago

@e-hu Thank you for the feedback. Unfortunately, I don't have much time to fix the issue. Would you be willing to send a PR?

ashosaho commented 5 years ago

This should work:

package org.apache.spark.ml.feature

import scala.collection.JavaConverters._
import org.atilika.kuromoji.{Token => KToken, Tokenizer => KTokenizer}
import org.apache.spark.annotation.Experimental
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
import org.apache.spark.ml.param.{Param, ParamMap, Params}
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
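
For context on the error above: in Spark 2.x, `DataFrame` is a type alias for `Dataset[Row]` rather than a class of its own, so a jar compiled against Spark 1.x cannot find `org.apache.spark.sql.DataFrame` at runtime. The following is only a minimal sketch of a tokenizer compiled against Spark 2.x. It is not the code from this package. The names `SimpleKuromojiTokenizer` and `KuromojiHolder` are hypothetical, and extending `UnaryTransformer` (instead of `Transformer` with `HasInputCol`/`HasOutputCol`) is an assumption made to keep the example short.

```scala
import scala.collection.JavaConverters._

import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}
import org.atilika.kuromoji.{Tokenizer => KTokenizer}

// Hypothetical helper: holds one Kuromoji tokenizer per JVM, built lazily,
// so the tokenizer itself never has to be shipped inside a Spark closure.
object KuromojiHolder {
  lazy val tokenizer: KTokenizer = KTokenizer.builder().build()
}

// Hypothetical transformer: splits a string column into an array of
// Kuromoji surface forms. UnaryTransformer supplies setInputCol,
// setOutputCol, transformSchema, and a transform method that takes a
// Dataset[_] and returns a DataFrame.
class SimpleKuromojiTokenizer(override val uid: String)
  extends UnaryTransformer[String, Seq[String], SimpleKuromojiTokenizer] {

  def this() = this(Identifiable.randomUID("simpleKuromojiTok"))

  // The function applied to every value of the input column.
  override protected def createTransformFunc: String => Seq[String] = { text =>
    if (text == null) Seq.empty[String]
    else KuromojiHolder.tokenizer.tokenize(text).asScala.map(_.getSurfaceForm).toList
  }

  // Reject input columns that are not strings.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == StringType, s"Input type must be StringType but got $inputType.")

  // The output column is an array of non-null strings.
  override protected def outputDataType: DataType =
    ArrayType(StringType, containsNull = false)
}
```

With this on the classpath, `new SimpleKuromojiTokenizer().setInputCol("text").setOutputCol("tokens")` should construct without the `NoClassDefFoundError`, provided the jar is built against the same Spark 2.x version as the shell.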
