jq / pyspark_xgboost

BSD 3-Clause "New" or "Revised" License

add pyspark xgboost #1

Open jq opened 6 years ago

jq commented 6 years ago

Follow the pattern of the pyspark mllib algorithms (e.g. https://github.com/apache/spark/blob/master/python/pyspark/mllib/regression.py#L189) and wrap xgboost spark so that it can be called from Python. In general, make https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-example/src/main/scala/ml/dmlc/xgboost4j/scala/example/spark/SparkModelTuningTool.scala#L160 work from Python.

pyspark uses Py4J (https://www.py4j.org/getting_started.html) and already wraps it, so just follow how pyspark accesses the Java/Scala code, and add the jar to the spark-submit command.
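A minimal sketch of the "add jar in the spark-submit" step, assuming the xgboost jars are passed through spark-submit's `--jars` option. The jar file names, the script name `train_xgboost.py`, and the helper `build_spark_submit` are hypothetical placeholders, not names from this repo.

```python
import shlex


def build_spark_submit(script, jars, master="local[*]"):
    """Return the spark-submit argument list for a Python script plus extra jars.

    `--jars` takes a comma-separated list of jar paths that are shipped to
    the driver and the executors.
    """
    return ["spark-submit", "--master", master, "--jars", ",".join(jars), script]


# Hypothetical jar and script names, for illustration only.
cmd = build_spark_submit(
    "train_xgboost.py",
    ["xgboost4j.jar", "xgboost4j-spark.jar"],
)

# Print a shell-safe version of the command that can be pasted into a terminal.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can also be handed directly to `subprocess.run(cmd)` if the job should be launched from Python.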

jq commented 6 years ago

Felix:

from pyspark.ml.wrapper import JavaWrapper

java_obj = JavaWrapper._new_java_obj("ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator", uid, xgboostParams)
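Felix's call could be wrapped in a small helper roughly like the sketch below. The helper name `new_xgboost_estimator`, the generated uid format, and the injectable `new_java_obj` argument are assumptions made for illustration. Whether Py4J converts the Python parameters into the type the Scala constructor expects has not been verified here, and the real call needs an active SparkContext with the xgboost jars on the classpath.

```python
import uuid

# Fully qualified Scala class name, taken from the snippet above.
XGBOOST_ESTIMATOR_CLASS = "ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator"


def new_xgboost_estimator(xgboost_params, uid=None, new_java_obj=None):
    """Create the JVM-side estimator and return (uid, java_obj).

    `new_java_obj` defaults to pyspark's JavaWrapper._new_java_obj; passing a
    stand-in makes the helper usable without a running JVM (e.g. in tests).
    """
    if new_java_obj is None:
        # Imported lazily so this module loads even without pyspark installed.
        from pyspark.ml.wrapper import JavaWrapper

        new_java_obj = JavaWrapper._new_java_obj
    if uid is None:
        # Hypothetical uid scheme: class name plus a short random suffix.
        uid = "XGBoostEstimator_" + uuid.uuid4().hex[:12]
    java_obj = new_java_obj(XGBOOST_ESTIMATOR_CLASS, uid, xgboost_params)
    return uid, java_obj
```

With Spark running, `new_xgboost_estimator({"eta": 0.1})` would go through Py4J; the parameter name `eta` is only an example value here.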