rapidsai / spark-examples

[ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples
https://github.com/NVIDIA/spark-xgboost-examples
Apache License 2.0

eval_sets did not work #78

Closed. HarborZeng closed this issue 4 years ago.

HarborZeng commented 4 years ago

Neither

import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

val clsifier = new XGBoostClassifier()
  .setFeaturesCols(...)
  .setLabelCol(...)
  .setObjective("multi:softprob")
  .setNumClass(16)
  .setMissing(0)
  .setEvalSets(Map("eval_sets" -> evalSet))
  .setNumRound(1000)
  .setNumEarlyStoppingRounds(8)
  .setMaximizeEvaluationMetrics(false)

nor

  .setTrainTestRatio(0.8)

works.
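For reference, the rule that setNumEarlyStoppingRounds(8) combined with setMaximizeEvaluationMetrics(false) is meant to apply can be sketched in plain Scala: training stops once the eval metric has failed to decrease for that many consecutive rounds. This is a generic sketch, not XGBoost's implementation; EarlyStoppingSketch and run are hypothetical names.

```scala
// Generic early-stopping sketch (hypothetical helper, not the xgboost4j-spark code).
// It approximates what setNumEarlyStoppingRounds + setMaximizeEvaluationMetrics ask for.
object EarlyStoppingSketch {
  /** Returns (bestRound, roundsTrained) given the eval metric recorded after each round. */
  def run(metric: Seq[Double], patience: Int, maximize: Boolean): (Int, Int) = {
    require(metric.nonEmpty && patience > 0, "need at least one round and a positive patience")
    def better(a: Double, b: Double): Boolean = if (maximize) a > b else a < b
    var best = 0        // index of the best round seen so far
    var round = 1       // next round to inspect
    var stopped = false
    while (round < metric.length && !stopped) {
      if (better(metric(round), metric(best))) best = round
      // Stop once `patience` rounds have passed without beating the best value.
      if (round - best >= patience) stopped = true
      round += 1
    }
    (best, round)
  }
}
```

The rule only has something to act on when an eval metric is actually reported each round.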

The logs always look like this:

2020-04-11 15:08:38 INFO  RabitTracker$TrackerProcessLogger:58 - 2020-04-11 15:08:38,438 INFO [11]  train-merror:0.200699
2020-04-11 15:08:38 INFO  RabitTracker$TrackerProcessLogger:58 - 2020-04-11 15:08:38,665 INFO [12]  train-merror:0.197640
2020-04-11 15:08:38 INFO  RabitTracker$TrackerProcessLogger:58 - 2020-04-11 15:08:38,895 INFO [13]  train-merror:0.194580
2020-04-11 15:08:39 INFO  RabitTracker$TrackerProcessLogger:58 - 2020-04-11 15:08:39,128 INFO [14]  train-merror:0.191678
2020-04-11 15:08:39 INFO  RabitTracker$TrackerProcessLogger:58 - 2020-04-11 15:08:39,349 INFO [15]  train-merror:0.190037
2020-04-11 15:08:39 INFO  RabitTracker$TrackerProcessLogger:58 - 2020-04-11 15:08:39,575 INFO [16]  train-merror:0.187184
2020-04-11 15:08:39 INFO  RabitTracker$TrackerProcessLogger:58 - 2020-04-11 15:08:39,806 INFO [17]  train-merror:0.184980

No eval or test merror is ever printed.
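One quick way to check which watch sets are reporting is to pull the name-metric:value tokens out of log lines like these. The sketch below assumes the token format visible in the logs above (train-merror:0.200699) and assumes an eval set would show up under the key passed to setEvalSets (here eval_sets); EvalLogSketch, parse, and watchNames are hypothetical helpers.

```scala
import scala.util.matching.Regex

object EvalLogSketch {
  // Hypothetical pattern for "<watchName>-<metric>:<value>" tokens, e.g. "train-merror:0.200699".
  private val MetricToken: Regex = """(\w+)-(\w+):([-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)""".r
  // Hypothetical pattern for the boosting-round marker, e.g. "[11]".
  private val RoundMarker: Regex = """\[(\d+)\]""".r

  /** Parses one log line into (round, (watchName, metric) -> value), if it has a round marker. */
  def parse(line: String): Option[(Int, Map[(String, String), Double])] =
    RoundMarker.findFirstMatchIn(line).map { m =>
      // Only scan the text after the round marker, so timestamps are never mistaken for metrics.
      val metrics = MetricToken
        .findAllMatchIn(line.substring(m.end))
        .map(t => (t.group(1), t.group(2)) -> t.group(3).toDouble)
        .toMap
      (m.group(1).toInt, metrics)
    }

  /** Every watch name that reported a metric, e.g. Set("train") when no eval set is active. */
  def watchNames(lines: Seq[String]): Set[String] =
    lines
      .flatMap(line => parse(line).toList)
      .flatMap { case (_, metrics) => metrics.keys.map(_._1) }
      .toSet
}
```

Run against the logs above, watchNames returns only "train", which is exactly the symptom reported here.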

My code looks roughly like this:

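import ml.dmlc.xgboost4j.scala.spark.{XGBoostClassificationModel, XGBoostClassifier}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val trainSet = dataReader.parquet("news_feature_label.parquet")
val evalSet = dataReader.parquet("news_feature_label_eval.parquet")
val clsifier = new XGBoostClassifier()
  .setxxx()
  ...
val paramGrid = new ParamGridBuilder()
  .addGrid(clsifier.alpha, Array(0.0, 0.5))
  .addGrid(clsifier.gamma, Array(0.0, 0.5))
  .build() // setEstimatorParamMaps expects an Array[ParamMap]
val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("...")
  .setPredictionCol("...")
  .setMetricName("f1")
val cv = new CrossValidator()
  .setEstimator(clsifier)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(4)
// cv.fit returns a CrossValidatorModel; the tuned XGBoost model is its bestModel
val bestModel = cv.fit(trainSet).bestModel.asInstanceOf[XGBoostClassificationModel]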
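To see how much training this CrossValidator setup triggers, the grid can be expanded in plain Scala. This sketch assumes ParamGridBuilder.build() returns every combination of the addGrid values and that CrossValidator refits the best combination once on the whole training set; GridSketch, expand, and totalFits are hypothetical names.

```scala
// Plain-Scala sketch of grid expansion and fit counting (hypothetical helper, not Spark's code).
object GridSketch {
  /** Expands named value lists into every combination, one Map per candidate model. */
  def expand(grid: Seq[(String, Seq[Double])]): Seq[Map[String, Double]] =
    grid.foldLeft(Seq(Map.empty[String, Double])) { case (acc, (name, values)) =>
      for (m <- acc; v <- values) yield m + (name -> v)
    }

  /** One fit per (param map, fold), plus the final refit of the winning param map. */
  def totalFits(numParamMaps: Int, numFolds: Int): Int = numParamMaps * numFolds + 1
}
```

With the 2-by-2 grid over alpha and gamma and setNumFolds(4), that comes to 4 param maps and 4 * 4 + 1 = 17 XGBoost trainings.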

What should I do to make it work? Or does this GPU version of xgboost-spark even support evaluation during training with CrossValidator?

HarborZeng commented 4 years ago

Spark version: 2.4.5
Mode: local[*]
OS: Deepin 15.11
GPU: NVIDIA 2070 Super
Package: ai.rapids:xgboost4j-spark_2.x:1.0.0-Beta5

firestarman commented 4 years ago

Do you get the correct eval output when running the classifier alone, without CrossValidator? And could you share the code showing how the dataReader is created?

wbo4958 commented 4 years ago

@HarborZeng, GPU-accelerated XGBoost-Spark does not support setTrainTestRatio(0.8) for now, so the train-merror should be the same either way.
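For context, a train/test ratio split can be sketched in plain Scala. This assumes, without confirmation from this thread, that a ratio of 0.8 sends each row to training with probability 0.8 and holds out the remainder as an extra "test" watch set; RatioSplitSketch and split are hypothetical names and not part of xgboost4j-spark.

```scala
import scala.util.Random

// Sketch of a ratio-based split (hypothetical helper). Assumption: each row goes to the
// training side with probability `trainRatio`; everything else becomes the held-out side.
object RatioSplitSketch {
  def split[T](rows: Seq[T], trainRatio: Double, seed: Long): (Seq[T], Seq[T]) = {
    require(trainRatio > 0.0 && trainRatio <= 1.0, "trainRatio must be in (0, 1]")
    val rng = new Random(seed) // seeded so the split is reproducible
    rows.partition(_ => rng.nextDouble() < trainRatio)
  }
}
```

Because the split is random per row, the held-out side is only approximately 20% of the data for a ratio of 0.8.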

HarborZeng commented 4 years ago

Yes, eval worked fine without CrossValidator. Sorry, it was a project from a long time ago, and I can't remember the details now.

I believe I created dataReader following one of the examples in this repo, maybe the taxi one?

firestarman commented 4 years ago

I mean you need to create the GPU version of the dataReader, such as: val dataReader = new GpuDataReader(spark)

HarborZeng commented 4 years ago

Yes, of course. I'm sure I used the GPU version of the dataReader.

HarborZeng commented 4 years ago

@wbo4958 @firestarman Hey guys, I finally figured it out: CrossValidator does not natively support setting EvalSets, and its k folds effectively play the same role instead.
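The idea that the k folds stand in for an eval set can be sketched in plain Scala: with setNumFolds(4), every row is held out exactly once, and each held-out slice scores the model trained on the other three. This sketch assumes a seeded shuffle followed by a modulo assignment, which differs from the random sampling Spark's CrossValidator uses to build its folds; KFoldSketch and folds are hypothetical names.

```scala
import scala.util.Random

// k-fold sketch (hypothetical helper): returns one (train, validation) pair per fold.
object KFoldSketch {
  def folds[T](rows: Seq[T], k: Int, seed: Long): Seq[(Seq[T], Seq[T])] = {
    require(k >= 2 && rows.size >= k, "need at least 2 folds and one row per fold")
    val shuffled = new Random(seed).shuffle(rows)
    (0 until k).map { i =>
      // Rows whose shuffled position falls in slice i are held out for validation.
      val (valid, train) = shuffled.zipWithIndex.partition { case (_, idx) => idx % k == i }
      (train.map(_._1), valid.map(_._1))
    }
  }
}
```

Since each validation slice plays the part an explicit eval set would, the evaluator (f1 here) already measures held-out performance for every candidate param map.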