viirya closed this 7 years ago.
Ping @vruusmann: can you take a look at this and see whether it is good to merge? Thanks.
I don't want to jump to Spark 2.0.X until there's an official release for Spark 1.5.X/1.6.X available. By "official release" I mean something that has a stable API (in terms of `TransformerBuilder` functionality) and has been pushed to the Maven Central repository. The goal is to minimize the difference between the Spark 1.5.X/1.6.X and 2.0.X codebases, so that it would be easier to keep them in sync for extended periods of time.
Some things that need more attention/work:

1. The `TransformerBuilder` class is too complex. After the JPMML-Evaluator dependency has been upgraded to 1.3.3 (which relaxes the visibility of `org.jpmml.evaluator.ModelField` subclasses), it will be possible to collapse many of the `org.jpmml.spark.*ColumnProducer` classes.
2. The `Transformer` instance should implement some kind of `HasModelFields` interface, which would let you easily query the names/types of input and result columns, for example via `HasModelFields#getInputCols()` and `HasModelFields#getResultCols()`. You can use this information to check whether the `Transformer` object is "logically compatible" with the argument `DataFrame` object.
3. `PMMLTransformer` and `ColumnPruner` should be translated from Java to Scala (IIRC, the latest versions of Apache Spark should already have a reusable `ColumnPruner` transformation built in). This involves tweaking the Apache Maven build (e.g. to invoke the `scalac` compiler).

I can easily do the first two items. However, I will have difficulties with the third item, because my working experience with Scala is very minimal. If you want to help keep things moving, you could submit another PR in that area.
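The `HasModelFields` idea from the second item could look roughly like the minimal Java sketch below. Everything in it is hypothetical: the interface, the `IrisTransformer` class, its column names and the `CompatibilityCheck` helpers are illustrative only, not existing JPMML-Spark API. To keep it self-contained, a plain column-name-to-type map stands in for a Spark `DataFrame` schema, and only column names (not types) are compared. Treating a result column that already exists in the input as a conflict is one assumed definition of "logically compatible".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface from the discussion; not an existing JPMML-Spark API.
interface HasModelFields {
    // Names of the columns the model reads from the input data.
    List<String> getInputCols();

    // Names of the columns the model appends to the output data.
    List<String> getResultCols();
}

// Hypothetical transformer with fixed model fields, standing in for a
// Transformer produced by TransformerBuilder.
class IrisTransformer implements HasModelFields {
    @Override
    public List<String> getInputCols() {
        return Arrays.asList("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width");
    }

    @Override
    public List<String> getResultCols() {
        return Arrays.asList("Species");
    }
}

public class CompatibilityCheck {

    // Returns the input columns that are absent from the schema, in the
    // order the transformer declares them. The schema is a simplified
    // column-name -> type-name map, not a Spark StructType.
    static List<String> missingInputCols(HasModelFields transformer, Map<String, String> schema) {
        List<String> missing = new ArrayList<>();
        for (String col : transformer.getInputCols()) {
            if (!schema.containsKey(col)) {
                missing.add(col);
            }
        }
        return missing;
    }

    // Assumed definition: compatible if every input column is present and
    // no result column would overwrite an existing column.
    static boolean isCompatible(HasModelFields transformer, Map<String, String> schema) {
        if (!missingInputCols(transformer, schema).isEmpty()) {
            return false;
        }
        for (String col : transformer.getResultCols()) {
            if (schema.containsKey(col)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("Sepal_Length", "double");
        schema.put("Sepal_Width", "double");
        schema.put("Petal_Length", "double");

        HasModelFields transformer = new IrisTransformer();
        System.out.println("missing: " + missingInputCols(transformer, schema)); // missing: [Petal_Width]

        schema.put("Petal_Width", "double");
        System.out.println("compatible: " + isCompatible(transformer, schema)); // compatible: true
    }
}
```

In a real Spark integration, the same check would more naturally run against the `DataFrame` schema before `transform` is called, so that an incompatible input fails fast instead of part-way through evaluation.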
OK, got it. I'll close this now.
This PR adds support for Spark 2.0. I have tested it with the Spark 2.0.2 release as follows:
bin/spark-submit --master local --class org.jpmml.spark.EvaluationExample pmml-spark-example/target/example-1.0-SNAPSHOT.jar DecisionTreeIris.pmml Iris.csv /tmp/DecisionTreeIris
And it works.