Open faaany opened 2 years ago
I tried running the following code locally:
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np
from xgboost_ray import RayDMatrix, RayParams
from xgboost import XGBClassifier
from xgboost_ray.main import predict
# Load dataset
data, labels = datasets.load_breast_cancer(return_X_y=True)
# Split into train and test set
train_x, test_x, train_y, test_y = train_test_split(
    data, labels, test_size=0.25)
xgb = XGBClassifier()
xgb.fit(train_x, train_y)
pred = xgb.predict_proba(test_x)[:, 1]
print(pred)
pred_1 = predict(xgb.get_booster(), RayDMatrix(test_x), ray_params=RayParams(num_actors=1))
print(pred_1)
pred_8 = predict(xgb.get_booster(), RayDMatrix(test_x), ray_params=RayParams(num_actors=8))
print(pred_8)
assert np.allclose(pred, pred_1)
assert np.allclose(pred, pred_8)
and got the same results. I'll try it in a distributed setting and with the HIGGS dataset next.
I can reproduce this
As a workaround, you can either use the new Ray AIR API or switch to sharding=RayShardingMode.BATCH in the prediction RayDMatrix.
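For anyone wondering why the sharding mode would matter here, below is a minimal sketch of one plausible explanation. It assumes the prediction RayDMatrix shards rows in an interleaved pattern by default and that per-actor results end up joined shard by shard. The helpers interleaved_shards and batch_shards are hypothetical names written for this illustration; this is not xgboost_ray's actual code.

```python
import numpy as np


def interleaved_shards(rows, num_actors):
    # Hypothetical helper: actor i receives rows i, i + num_actors, ...
    return [rows[i::num_actors] for i in range(num_actors)]


def batch_shards(rows, num_actors):
    # Hypothetical helper: each actor receives one contiguous block of rows.
    return np.array_split(rows, num_actors)


# Row indices stand in for per-row predictions, listed in input order.
rows = np.arange(10)

# Joining the per-actor results shard by shard, without reordering:
naive_interleaved = np.concatenate(interleaved_shards(rows, 3))
naive_batch = np.concatenate(batch_shards(rows, 3))

print(naive_interleaved)  # [0 3 6 9 1 4 7 2 5 8]
print(naive_batch)        # [0 1 2 3 4 5 6 7 8 9]
```

Under these assumptions, contiguous shards come back in input order, and a single actor is unaffected either way because its one shard already holds every row in order.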
It works after adding sharding=RayShardingMode.BATCH to the prediction RayDMatrix. Closing this issue.
Thanks!
Let's keep this open as this is still a bug :)
Hi, when using the following code snippet for XGBoost training, I noticed that the results returned by the predict function differ depending on the number of actors. In my case, I need to set the number of actors to 1 in the predict function to get the correct predictions.
The returned predictions for num_actors=1:
The returned predictions for num_actors=10:
Is this a bug or am I setting the number of actors wrong? Thanks for your review!
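A quick way to narrow this down is to check whether the two prediction arrays contain the same values in a different row order, or genuinely different values. The sketch below is only a diagnostic aid: differs_only_in_order is a hypothetical helper, it assumes 1-D prediction arrays, and the sample numbers are made up rather than taken from this issue.

```python
import numpy as np


def differs_only_in_order(expected, actual):
    """Return True if both 1-D arrays hold the same values in a different order."""
    expected = np.asarray(expected)
    actual = np.asarray(actual)
    if expected.shape != actual.shape:
        return False
    same_order = np.allclose(expected, actual)
    same_values = np.allclose(np.sort(expected), np.sort(actual))
    return same_values and not same_order


# Made-up values for illustration only, not output from the issue.
single_actor = np.array([0.9, 0.1, 0.8, 0.2])
multi_actor = np.array([0.9, 0.8, 0.1, 0.2])

print(differs_only_in_order(single_actor, multi_actor))   # True
print(differs_only_in_order(single_actor, single_actor))  # False
```

If this returns True for the num_actors=1 and num_actors=10 outputs, the model is scoring rows the same way and only the order of the returned rows differs.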