jpmml / jpmml-evaluator-spark

PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)
GNU Affero General Public License v3.0

Provide PySpark and SparkR wrapper packages #21

Open naidubharadwaj9 opened 5 years ago

naidubharadwaj9 commented 5 years ago

Is there a way I can use this library for importing a PMML model into Spark using PySpark, Scala, or SparkR?

vruusmann commented 5 years ago

importing a PMML model into Spark using PySpark, Scala, or SparkR?

JPMML-Evaluator-Spark is a Java library, which means that you can call it directly from Scala code.

If you want to call it from PySpark or SparkR/sparklyr, then you should take the time to develop simple Python/R wrappers for it. Otherwise, use the Java gateway that each of these environments provides for communicating with the JVM (e.g. in PySpark, the JVM is accessible via sc._jvm).
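A minimal sketch of the PySpark route, assuming the JPMML-Evaluator-Spark JAR is already on the Spark driver classpath. The JVM-side names EvaluatorUtil.createEvaluator and TransformerBuilder (with withTargetCols/withOutputCols) are assumptions based on older releases of the library and may differ in the version you use; the Python names load_pmml_transformer and PMMLTransformer are hypothetical. The only parts relied on as given are Py4J's access to Java classes by fully qualified name through sc._jvm and the Spark ML Transformer.transform(Dataset) contract.

```python
# Sketch of a thin PySpark-side wrapper around the Java API, reached via Py4J.
# ASSUMPTION: the JPMML-Evaluator-Spark JAR is on the driver classpath and exposes
# org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(InputStream) and
# org.jpmml.evaluator.spark.TransformerBuilder; names may differ between releases.


def load_pmml_transformer(jvm, pmml_path):
    """Build a JVM-side Spark ML Transformer from a PMML file (hypothetical helper).

    `jvm` is the Py4J view of the driver JVM, e.g. `sc._jvm` in PySpark.
    """
    spark_pkg = jvm.org.jpmml.evaluator.spark  # assumed package layout
    stream = jvm.java.io.FileInputStream(pmml_path)
    try:
        # Parse the PMML document on the JVM side.
        evaluator = spark_pkg.EvaluatorUtil.createEvaluator(stream)
    finally:
        stream.close()
    # Configure which result columns the transformer appends, then build it.
    builder = spark_pkg.TransformerBuilder(evaluator)
    return builder.withTargetCols().withOutputCols().build()


class PMMLTransformer:
    """Hypothetical Python facade: all scoring happens on the JVM."""

    def __init__(self, java_transformer):
        self._java_transformer = java_transformer

    def transform(self, df):
        # Hand the DataFrame's underlying Java Dataset to the JVM transformer,
        # then re-wrap the returned Java Dataset as a Python DataFrame.
        # df.sql_ctx matches PySpark 2.x/3.x; newer releases may prefer df.sparkSession.
        java_result = self._java_transformer.transform(df._jdf)
        return type(df)(java_result, df.sql_ctx)
```

Hypothetical usage: scored = PMMLTransformer(load_pmml_transformer(sc._jvm, "/path/to/model.pmml")).transform(df). Keeping the Python side this thin means the model is evaluated entirely on the JVM; the wrapper only translates between Python and Java DataFrame handles.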

SachinTen11 commented 5 years ago

So, we cannot use this package in Zeppelin or Jupyter notebooks? Is that because JPMML is a Java package and cannot be used from an interpreted language? I am new to Spark, so please correct me if my understanding is incorrect.

vruusmann commented 5 years ago

So, we cannot use this package in Zeppelin or Jupyter notebooks?

You can use this package in whatever Apache Spark ML environment you like. However, if you are using a non-Java environment such as Python or R, then it is recommended to create an appropriate Python or R wrapper API on top of the existing Java API to improve the end-user experience.

I haven't had time to build those Python or R wrapper APIs yet. But it's a fairly short task, maybe one or two days of development work.