ray-project / xgboost_ray

Distributed XGBoost on Ray
Apache License 2.0
139 stars 34 forks

predict function doesn't return the correct predictions with num_actors >1 #231

Open faaany opened 2 years ago

faaany commented 2 years ago

Hi, when I train XGBoost with the code snippet below, the predict function returns different results depending on the number of actors I set. In my case, I have to set the number of actors to 1 in predict to get the correct predictions.

    # Imports are assumed; they were not shown in the original report.
    import ray
    from xgboost_ray import RayDMatrix, RayFileType, RayParams, predict, train

    # train_path, valid_path, name, feature_list, numlabel, xgb_parms,
    # model_save_path and oof are defined elsewhere in the script.
    ray.init()
    cpus_per_actor = 15
    num_actors = 10
    ray_params = RayParams(
        num_actors=num_actors,
        cpus_per_actor=cpus_per_actor,
        elastic_training=True,
        max_failed_actors=1,
        max_actor_restarts=1)

    dtrain = RayDMatrix(
        train_path,
        label=name,
        columns=feature_list[numlabel],
        filetype=RayFileType.PARQUET)
    dvalid = RayDMatrix(
        valid_path,
        label=name,
        columns=feature_list[numlabel],
        filetype=RayFileType.PARQUET)

    print("Training.....")
    model = train(
        xgb_parms,
        dtrain,
        evals=[(dtrain, "train"), (dvalid, "valid")],
        num_boost_round=250,
        early_stopping_rounds=25,
        verbose_eval=25,
        ray_params=ray_params)

    model.save_model(f"{model_save_path}/xgboost_{name}_stage1.model")

    print("Predicting...")
    # Rebuild the validation matrix for prediction.
    dvalid = RayDMatrix(
        valid_path,
        label=name,
        columns=feature_list[numlabel],
        filetype=RayFileType.PARQUET)

    # Results differ depending on num_actors here.
    oof[:, numlabel] = predict(
        model,
        dvalid,
        ray_params=RayParams(num_actors=num_actors, cpus_per_actor=1))

The returned predictions for num_actors=1:

[0.00197015 0.00656855 0.00210109 ... 0.00132486 0.00912175 0.03348438]

The returned predictions for num_actors=10:

[0.00253869 0.02829305 0.0060115 ... 0.00152305 0.01026866 0.03538961]

Is this a bug or am I setting the number of actors wrong? Thanks for your review!
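
One way to narrow down a discrepancy like this is to check whether the two outputs hold the same values in a different row order, or genuinely different values. The sketch below is a hypothetical diagnostic, not part of xgboost_ray: the helper names `first_mismatch` and `same_values_any_order` and the sample lists are made up for illustration, and it does not claim to identify the cause of this issue.

```python
import math


def first_mismatch(a, b, tol=1e-6):
    """Return the first index where a and b differ, or None if the
    compared rows all match (only the common prefix is checked)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if not math.isclose(x, y, abs_tol=tol):
            return i
    return None


def same_values_any_order(a, b, tol=1e-6):
    """Return True if b looks like a reordering of a, up to a tolerance."""
    if len(a) != len(b):
        return False
    # After sorting, equal collections of values line up element by element.
    return all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(sorted(a), sorted(b))
    )


# Made-up predictions: the second list is the first one with two rows swapped.
single_actor = [0.10, 0.20, 0.30, 0.40]
multi_actor = [0.10, 0.30, 0.20, 0.40]

print(first_mismatch(single_actor, multi_actor))         # 1
print(same_values_any_order(single_actor, multi_actor))  # True
```

If the second check passes while the first reports a mismatch, the values themselves agree and only the row order differs; if both fail, the predictions differ in value.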

Yard1 commented 2 years ago

I tried running the following code locally:

from sklearn import datasets
from sklearn.model_selection import train_test_split

import numpy as np

from xgboost_ray import RayDMatrix, RayParams
from xgboost import XGBClassifier

from xgboost_ray.main import predict

# Load dataset
data, labels = datasets.load_breast_cancer(return_X_y=True)
# Split into train and test set
train_x, test_x, train_y, test_y = train_test_split(
    data, labels, test_size=0.25)

xgb = XGBClassifier()
xgb.fit(train_x, train_y)
pred = xgb.predict_proba(test_x)[:, 1]
print(pred)

pred_1 = predict(xgb.get_booster(), RayDMatrix(test_x), ray_params=RayParams(num_actors=1))
print(pred_1)

pred_8 = predict(xgb.get_booster(), RayDMatrix(test_x), ray_params=RayParams(num_actors=8))
print(pred_8)

assert np.allclose(pred, pred_1)
assert np.allclose(pred, pred_8)

and got the same results. I will try in a distributed setting, and with the HIGGS dataset.

Yard1 commented 2 years ago

I can reproduce this

Yard1 commented 2 years ago

As a workaround, you can either use the new Ray AIR API or pass sharding=RayShardingMode.BATCH to the RayDMatrix used for prediction.
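
To illustrate what the two sharding modes mean for row order, here is a minimal pure-Python sketch. It assumes that interleaved sharding hands row i to actor i % num_actors and that batch sharding hands each actor one contiguous block. The helpers `shard_interleaved`, `shard_batch`, and `concat` are hypothetical; they do not reproduce xgboost_ray's internals or the root cause of this bug.

```python
def shard_interleaved(rows, num_actors):
    """Round-robin: actor k gets rows k, k + num_actors, k + 2 * num_actors, ..."""
    return [rows[k::num_actors] for k in range(num_actors)]


def shard_batch(rows, num_actors):
    """Contiguous blocks: actor k gets one consecutive slice of the rows."""
    size, extra = divmod(len(rows), num_actors)
    shards, start = [], 0
    for k in range(num_actors):
        # The first `extra` actors take one additional row each.
        end = start + size + (1 if k < extra else 0)
        shards.append(rows[start:end])
        start = end
    return shards


def concat(shards):
    """Naively join per-actor results in actor order."""
    return [row for shard in shards for row in shard]


rows = list(range(7))  # stand-in for the row ids of a validation set

print(concat(shard_batch(rows, 3)))        # [0, 1, 2, 3, 4, 5, 6]
print(concat(shard_interleaved(rows, 3)))  # [0, 3, 6, 1, 4, 2, 5]
```

In this toy model, plain concatenation preserves the input order only for contiguous blocks; round-robin shards have to be re-interleaved before they line up with the input rows again.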

faaany commented 2 years ago

It works after adding sharding=RayShardingMode.BATCH to the prediction RayDMatrix. Closing this issue.

faaany commented 2 years ago

Thanks!

Yard1 commented 2 years ago

Let's keep this open as this is still a bug :)