rstudio / sparkxgb

R interface for XGBoost on Spark
https://spark.posit.co/packages/sparkxgb/

New version of xgboost4j #27

Closed mzorko closed 4 years ago

mzorko commented 4 years ago

Hello everyone,

Is it possible to get a new version of xgboost4j running here? I have tried to implement it myself, but I don't understand how it works exactly.

It would also be very nice if it could work through a Livy connection. I would also be interested in any help or manual on how to implement it.

Many thanks, Mislav

nredell commented 4 years ago

Possibly related: I couldn't get the examples from the README to run with fresh installs of everything from Spark through xgboost. I got this error:

Error: ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed
    at ml.dmlc.xgboost4j.scala.spark.XGBoost$.ml$dmlc$xgboost4j$scala$spark$XGBoost$$postTrackerReturnProcessing(XGBoost.scala:364)

mzorko commented 4 years ago

Hmm, I'm not sure either. But all in all, I think it is time to upgrade this little gem here so that:

I would really like to help here, but I am stuck on the configure.R file that creates the spark-scala .jar file. The sparklyr::compile_package_jars(spec = spec) call is not working. It fails with "Error: program 'jar' is required but not available on the path", which I take to mean that $jar is missing from the spec list, and I have no idea what to put there. Any ideas are welcome.
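For anyone hitting the same error: as far as I can tell, sparklyr looks for the JDK's jar tool on the PATH unless the compilation spec supplies a jar_path. Below is a minimal sketch, assuming a full JDK (not just a JRE) is installed and JAVA_HOME points at it. find_jar is a hypothetical helper, not part of sparklyr, and the Spark version and jar name in the commented-out spec are placeholders rather than the values sparkxgb actually uses.

```r
# Hypothetical helper (not part of sparklyr): locate the JDK's `jar` tool.
# It checks the PATH first and then falls back to <JAVA_HOME>/bin.
find_jar <- function(on_path = unname(Sys.which("jar")),
                     java_home = Sys.getenv("JAVA_HOME")) {
  if (nzchar(on_path)) {
    return(on_path)
  }
  if (nzchar(java_home)) {
    exe <- if (.Platform$OS.type == "windows") "jar.exe" else "jar"
    candidate <- file.path(java_home, "bin", exe)
    if (file.exists(candidate)) {
      return(candidate)
    }
  }
  NA_character_
}

jar_path <- find_jar()
if (is.na(jar_path)) {
  message("No 'jar' found: install a JDK or set JAVA_HOME.")
} else {
  message("Using jar at: ", jar_path)
}

# With sparklyr installed, the path can then be handed to the spec.
# The version and jar name below are placeholders, and a matching
# scalac still has to be available for the compilation itself:
# spec <- sparklyr::spark_compilation_spec(
#   spark_version = "2.4.0",
#   jar_name      = "sparkxgb-2.4-2.11.jar",
#   jar_path      = jar_path
# )
# sparklyr::compile_package_jars(spec = spec)
```

Alternatively, adding the JDK's bin directory to the PATH before starting R should have the same effect without touching the spec.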

yitao-li commented 4 years ago

@mzorko I think https://github.com/rstudio/sparkxgb/pull/28 should work -- will wait for it to be reviewed and tested and then write a wiki page on how to rebuild the jar files

Also, we can go for the latest version of xgboost4j, 1.0, instead of 0.9.

mzorko commented 4 years ago

Thanks @yl790 for looking into this. Yeah, the first step would be to rebuild the jar files. This is the first time I am playing with sparklyr extensions, so I will try to learn as much as possible.

I think there will be some work on parameter mapping (from sparkxgb to xgboost4j 1.0), because some basic parameters have changed (names, values, ...) in xgboost4j 1.0 and we will need to map them back carefully. I will try to find some time tomorrow to take a look at that.
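One way such a mapping layer could look is a small rename table applied to the user's parameter list before it is handed to the backend. This is only a sketch of the idea: the parameter names in the table are made-up placeholders, not actual xgboost4j renames, and map_params is a hypothetical helper rather than anything in sparkxgb. It only covers renamed parameters; changed values would need a separate conversion step.

```r
# Hypothetical rename table: these names are placeholders for illustration,
# NOT real xgboost4j parameters. Names are old spellings, values are new ones.
param_renames <- c(old_param_a = "newParamA", old_param_b = "newParamB")

# Rename any parameters found in the table and leave the rest untouched.
map_params <- function(params, renames = param_renames) {
  if (length(params) == 0) {
    return(params)
  }
  stopifnot(is.list(params), !is.null(names(params)))
  hits <- names(params) %in% names(renames)
  names(params)[hits] <- unname(renames[names(params)[hits]])
  # Guard against a user passing both the old and the new spelling.
  dupes <- unique(names(params)[duplicated(names(params))])
  if (length(dupes) > 0) {
    stop("Duplicate parameters after mapping: ", paste(dupes, collapse = ", "))
  }
  params
}

mapped <- map_params(list(old_param_a = 0.1, max_depth = 6L))
print(names(mapped))
```

Keeping the table as plain data means a future xgboost4j upgrade would mostly involve editing the table rather than the calling code.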