ray-project / xgboost_ray

Distributed XGBoost on Ray
Apache License 2.0

README.md Example doesn't work #75

Closed: Jeffwan closed this issue 3 years ago

Jeffwan commented 3 years ago

The XGBoost example from README.md doesn't work:

➜  xgboost_ray git:(master) ✗ python3 xgboost-ray.py
File descriptor limit 256 is too low for production servers and may result in connection errors. At least 8192 is recommended. --- Fix with 'ulimit -n 8192'
2021-04-06 22:24:33,719 INFO services.py:1174 -- View the Ray dashboard at http://127.0.0.1:8265
Traceback (most recent call last):
  File "xgboost-ray.py", line 18, in <module>
    cpus_per_actor=1))
  File "/private/tmp/ray/xgboost_ray/xgboost_ray/main.py", line 1134, in train
    "Training data has no label set. Please make sure to set "
ValueError: Training data has no label set. Please make sure to set the `label` argument when initializing `RayDMatrix()` for data you would like to train on.

The code snippet is from README.md:

from xgboost_ray import RayDMatrix, RayParams, train

train_x, train_y = None, None  # Load data here
train_set = RayDMatrix(train_x, train_y)

evals_result = {}
bst = train(
    {
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
    },
    train_set,
    evals_result=evals_result,
    evals=[(train_set, "train")],
    verbose_eval=False,
    ray_params=RayParams(
        num_actors=2,
        cpus_per_actor=1))

bst.save_model("model.xgb")
print("Final training error: {:.4f}".format(
    evals_result["train"]["error"][-1]))

krfricke commented 3 years ago

Hi @Jeffwan, there is this line:

train_x, train_y = None, None  # Load data here

which you have to replace with the data you want to train on. Something like this will work:

from xgboost_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer

train_x, train_y = load_breast_cancer(return_X_y=True)  # Load data here
train_set = RayDMatrix(train_x, train_y)

evals_result = {}
bst = train(
    {
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
    },
    train_set,
    evals_result=evals_result,
    evals=[(train_set, "train")],
    verbose_eval=False,
    ray_params=RayParams(
        num_actors=2,
        cpus_per_actor=1))

bst.save_model("model.xgb")
print("Final training error: {:.4f}".format(
    evals_result["train"]["error"][-1]))

But yeah, we should probably update the README to use an example with actual data. I'll file a PR for that.
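
Until the README ships with real data, a small guard can catch the unreplaced placeholders before they reach `RayDMatrix`. The following is only a sketch: `require_training_data` is a hypothetical helper, not part of xgboost_ray, and it assumes the features and labels support `len()`.

```python
from typing import Any, Tuple


def require_training_data(train_x: Any, train_y: Any) -> Tuple[Any, Any]:
    """Hypothetical helper (not part of xgboost_ray).

    Fails fast if the README placeholders were never replaced with data,
    or if features and labels have different numbers of rows.
    """
    if train_x is None:
        raise ValueError("train_x is None; load your feature data first.")
    if train_y is None:
        raise ValueError("train_y is None; load your labels first.")
    if len(train_x) != len(train_y):
        raise ValueError(
            f"Row count mismatch: {len(train_x)} feature rows "
            f"vs {len(train_y)} labels."
        )
    return train_x, train_y


if __name__ == "__main__":
    # Toy data standing in for a real dataset.
    features = [[0.1, 0.2], [0.3, 0.4]]
    labels = [0, 1]
    x, y = require_training_data(features, labels)
    print(f"{len(x)} rows ready to pass to RayDMatrix")

    # The README placeholders trigger the guard instead of a later error.
    try:
        require_training_data(None, None)
    except ValueError as err:
        print(f"Caught: {err}")
```

Calling the helper right before `RayDMatrix(train_x, train_y)` would surface the problem at the line where the data is supposed to be loaded, rather than inside `train()`.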

Jeffwan commented 3 years ago

Ah, thanks. I didn't notice that the snippet needed to be adjusted. A full example would be great!

Jeffwan commented 3 years ago

I tried the example you attached in the PR. It works well. Thanks for the improvement.