rstudio / sparkxgb

R interface for XGBoost on Spark
https://spark.posit.co/packages/sparkxgb/

File Not Found on spark_connect #15

Open MisterRuss opened 5 years ago

MisterRuss commented 5 years ago

I am working with sparkxgb on an Amazon EMR 5.16 Spark cluster with RStudio 1.1.447 and R 3.4.1. I install the libraries with:

    devtools::install_github("rstudio/sparklyr")
    devtools::install_github("rstudio/sparkxgb")

When I call spark_connect with this configuration:

    Sys.setenv("SPARK_HOME" = "/usr/lib/spark")
    config <- spark_config()
    config$spark.dynamicAllocation.enabled <- "false"
    config$spark.executor.memory <- "8G"
    config$spark.executor.cores <- 4
    config$spark.executor.instances <- 2
    config$`sparklyr.shell.driver-memory` <- "4G"
    config$`sparklyr.shell.executor-memory` <- "4G"
    config$spark.yarn.executor.memoryOverhead <- "512"

    config$xgboost.spark.ignoreSsl <- "true"

    sc <- spark_connect(master = "yarn-client", config = config)

I get:

    Error in force(code) :
      Failed during initialize_connection: java.io.FileNotFoundException: File file:/mnt/N0131005/.ivy2/jars/com.esotericsoftware.reflectasm_reflectasm-1.07.jar does not exist

If I do not load sparkxgb, spark_connect succeeds (just without XGBoost).

danielhstahl commented 5 years ago

I have the same problem. Interestingly, com.esotericsoftware.reflectasm_reflectasm-1.07-shaded.jar does exist in .ivy2/jars. If I rename that file to com.esotericsoftware.reflectasm_reflectasm-1.07.jar, Spark connects, though XGBoost doesn't actually work.

machielg commented 5 years ago

I fixed this problem by adding com.esotericsoftware.reflectasm:reflectasm:1.08 to spark.jars.packages:

    .config("spark.jars.packages", "com.esotericsoftware.reflectasm:reflectasm:1.08,ml.dmlc:xgboost4j-spark:0.90")
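The `.config()` call is SparkSession builder syntax rather than sparklyr. A minimal sketch of the same workaround from R follows, assuming that sparklyr forwards the entries of `sparklyr.defaultPackages` to `spark-submit --packages` together with the packages that sparkxgb registers when it is loaded (so xgboost4j-spark should not need to be listed by hand). Nobody in this thread has confirmed that reflectasm 1.08 also clears the error on the EMR 5.16 setup from the original report.

```r
library(sparklyr)
library(sparkxgb)  # loading the extension registers its xgboost4j-spark dependency

Sys.setenv(SPARK_HOME = "/usr/lib/spark")

config <- spark_config()

# Assumption: sparklyr merges sparklyr.defaultPackages with the packages
# requested by extensions and passes them all to spark-submit --packages.
# Appending reflectasm 1.08 mirrors adding it to spark.jars.packages.
config$sparklyr.defaultPackages <- c(
  config$sparklyr.defaultPackages,
  "com.esotericsoftware.reflectasm:reflectasm:1.08"
)

sc <- spark_connect(master = "yarn-client", config = config)
```

Appending to the existing value, rather than overwriting it, keeps whatever default packages `spark_config()` already lists.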

danielhstahl commented 5 years ago

I've also created a PR that should resolve this issue in future releases (after 0.90):

https://github.com/dmlc/xgboost/pull/4575