rstudio / sparkxgb

R interface for XGBoost on Spark
https://spark.posit.co/packages/sparkxgb/

upgrade xgboost4j-spark version to 1.0.0 #28

Closed yitao-li closed 4 years ago

yitao-li commented 4 years ago

Steps for upgrading to xgboost4j-spark 1.0.0:

#!/bin/bash

set -euf -o pipefail # if any of the steps has non-zero exit status then bail 

git clone git@github.com:yl790/sparkxgb.git  # my fork of sparkxgb

# I think sparklyr currently requires the `jar` binary either to be in a standard location
# such as /usr/bin or to be found at "${JAVA_HOME}/bin/jar".
# So for the subsequent steps to succeed, at least one of those must be true.
which jar || ( [[ -v JAVA_HOME ]] && test -f "${JAVA_HOME}/bin/jar" )

cd sparkxgb

mkdir -p internal/xgboost4j-spark
cd internal/xgboost4j-spark
wget https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark_2.12/1.0.0/xgboost4j-spark_2.12-1.0.0.jar

cd - # cd back to the top level of this repository
Rscript configure.R  # rebuild sparkxgb-*.jar

pip3 install xgboost # need xgboost to run tests

cd tests

NOT_CRAN='true' Rscript testthat.R

# ══ testthat results  ═══════════════════════════════════════════════════════════
# [ OK: 59 | SKIPPED: 0 | WARNINGS: 1 | FAILED: 0 ]

yitao-li commented 4 years ago

cc @javierluraschi @kevinykuo

@mzorko I guess you probably tried to go through some of the steps above and almost succeeded. The only thing missing was making sure sparklyr could find the jar binary on your machine (see the comment in the script above).
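
For reference, the lookup described in that script comment (PATH first, then "${JAVA_HOME}/bin/jar") can be sketched as a small shell function. This is only an illustration: the name find_jar is hypothetical and is not part of sparklyr or sparkxgb, and sparklyr's own lookup may differ in detail.

```shell
#!/bin/bash
# Hypothetical helper (not part of sparklyr or sparkxgb): print the path of the
# `jar` binary, looking on PATH first and then under ${JAVA_HOME}/bin.
find_jar() {
  if command -v jar >/dev/null 2>&1; then
    # Found on PATH; print the resolved location.
    command -v jar
  elif [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/jar" ]; then
    # Not on PATH, but present in the JDK pointed to by JAVA_HOME.
    printf '%s\n' "${JAVA_HOME}/bin/jar"
  else
    echo "jar not found: install a JDK or set JAVA_HOME" >&2
    return 1
  fi
}
```

Running `find_jar || exit 1` before `Rscript configure.R` would make the script stop early with a readable message instead of failing partway through the build.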

yitao-li commented 4 years ago

@mzorko Looks like all tests are passing and there is no objection so far. So I'll merge this PR to master now. In case there is any issue, feel free to revert it.

Data-drone commented 3 years ago

@yitao-li I was trying to follow this and update to xgboost 1.3.1, but when I run Rscript configure.R to rebuild the sparkxgb jar I get:

=> '/home/rstudio/scala/scala-2.12.10/bin/scalac' -opt:l:default -deprecation '/home/rstudio/r_projects/sparkxgb/java/main.scala'
/home/rstudio/r_projects/sparkxgb/java/main.scala:3: error: not found: object ml
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
       ^
/home/rstudio/r_projects/sparkxgb/java/main.scala:6: error: not found: type XGBoostClassifier
  def setMissingParam(xgb: XGBoostClassifier, missing: Double) : XGBoostClassifier = {
                                                                 ^
/home/rstudio/r_projects/sparkxgb/java/main.scala:6: error: not found: type XGBoostClassifier
  def setMissingParam(xgb: XGBoostClassifier, missing: Double) : XGBoostClassifier = {
                           ^
three errors found
Error in spark_compile(jar_name = jar_name, spark_home = spark_home, filter = filter,  : 
  ==> failed to compile Scala source files
Calls: <Anonymous> -> spark_compile
In addition: Warning message:
In file.copy(file.path(scala_path, src), "sparklyr") :
  problem copying /home/rstudio/r_projects/sparkxgb/java/embedded_sources.R to sparklyr/embedded_sources.R: No such file or directory
Execution halted

Any ideas?

yitao-li commented 3 years ago

@Data-drone OK I'll take a look.
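
In the meantime, one thing that may be worth checking: `not found: object ml` means scalac could not see the xgboost4j-spark classes on its classpath. Assuming configure.R picks up jars from internal/xgboost4j-spark as in the steps at the top of this thread (an assumption, not verified against configure.R), a quick sanity check is to confirm that the downloaded jar exists there and actually contains XGBoostClassifier. The function name jar_has_class below is hypothetical; `unzip -l` is used only because a jar is a zip archive.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the jar file exists and lists the given entry.
# Usage: jar_has_class <path-to-jar> <class-entry-inside-jar>
jar_has_class() {
  jar_file="$1"
  class_entry="$2"
  if [ ! -f "$jar_file" ]; then
    echo "missing jar: $jar_file" >&2
    return 1
  fi
  # A jar is a zip archive, so `unzip -l` lists its entries.
  # grep reads the whole listing (no -q) so the pipe is never closed early.
  unzip -l "$jar_file" | grep -F "$class_entry" >/dev/null
}
```

For the import in the log above, the entry to look for would be ml/dmlc/xgboost4j/scala/spark/XGBoostClassifier.class, and the first argument would be whichever jar file was downloaded into internal/xgboost4j-spark.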