I found an issue here on GitHub regarding this. I checked my whole DataFrame for null values using df.isnull().values.any(), but it returns False.
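For what it's worth, isnull() only flags NaN, not infinities, so both checks are worth running. A minimal sketch, with a toy frame standing in for the real data:

import numpy as np
import pandas as pd

# Toy DataFrame standing in for the real data; isnull() flags NaN but not Inf.
df = pd.DataFrame({'a': [1.0, np.nan], 'b': [np.inf, 2.0]})
print(df.isnull().values.any())       # True: NaN found
print(np.isinf(df.to_numpy()).any())  # True: Inf found, missed by isnull()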
Can you provide the XGBoost training parameters or reproducer code? Problems with missing values are known; one possible cause is the 'gpu_hist' training method, which outputs an XGBoost booster with active missing-value indicators even if the training data has none.
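One way to see what a trained booster does with missing values is to dump its trees and look at the default branch of each split. A minimal sketch on made-up data (trees_to_dataframe() is standard XGBoost API; the toy dataset is an assumption):

import xgboost as xgb
from sklearn.datasets import make_classification

# Toy data with no missing values, trained with gpu_hist
x, y = make_classification(n_samples=1000, n_features=8)
clf = xgb.XGBClassifier(tree_method='gpu_hist').fit(x, y)

# For each split node, the 'Missing' column names the child that rows with
# a missing feature value would follow; compare it against 'Yes'/'No'.
tree_df = clf.get_booster().trees_to_dataframe()
splits = tree_df[tree_df['Feature'] != 'Leaf']
print(splits[['Tree', 'Node', 'Feature', 'Yes', 'No', 'Missing']].head())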
Oh yes, I am using 'gpu_hist' as the tree method. So is there no way to train on GPU and get a model that is guaranteed to work with daal4py?
For now, XGBoost models trained with the gpu_hist method do not work with daal4py. This behavior is not expected and is probably a bug, but its source (daal4py or XGBoost) is not obvious, since hist works. I will investigate it.
@Alexsandruss - Is there any workaround?
It looks like the problem has been solved on the XGBoost side: boosters created by the gpu_hist tree method now translate cleanly to DAAL models as long as no missing values are present (DAAL inference does not support them).
SW used to check this case: Python 3.9.13, XGBoost 1.6.1 pip package, daal4py/oneDAL 2021.5.0 conda-forge packages. Driver Version: 515.65.01, CUDA Version: 11.7.
HW: Tesla T4 GPU for gpu_hist training.
Testing script:
import xgboost as xgb
import daal4py as d4p
import numpy as np
from sklearn.datasets import make_classification

# Synthetic binary classification data with no missing values
x, y = make_classification(n_samples=10000, n_features=16, n_classes=2)

# Train on GPU with the gpu_hist tree method
xgb_clsf = xgb.XGBClassifier(tree_method='gpu_hist')
xgb_clsf.fit(x, y)
booster = xgb_clsf.get_booster()
xgb_prediction = xgb_clsf.predict(x)
xgb_errors_count = np.count_nonzero(xgb_prediction - y)

# Convert the booster to a DAAL model and run DAAL inference
daal_model = d4p.get_gbt_model_from_xgboost(booster)
daal_predict_algo = d4p.gbt_classification_prediction(
    nClasses=2,
    resultsToEvaluate="computeClassLabels",
    fptype='float'
)
daal_prediction = daal_predict_algo.compute(x, daal_model).prediction.astype('int').ravel()
daal_errors_count = np.count_nonzero(daal_prediction - y)

# Error counts must match between native XGBoost and DAAL predictions
assert np.absolute(xgb_errors_count - daal_errors_count) == 0
Renamed issue to be less confusing
I am also running XGBoost 1.6.1 and the problem still exists. I am creating a regressor, not a classifier. Probably more to the point: on a data set of about 75,000 records it worked fine; my full data set is over 5,000,000 records, and only then did the problem appear.
I did find a workaround: after tuning my hyperparameters with 'gpu_hist', I retrain the model one last time with the best hyperparameters and 'hist', and then create the daal model (see the sketch below). In any case, I believe the bug still exists.
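A minimal sketch of that workaround on synthetic data (the best_params dict is a placeholder standing in for the result of the actual gpu_hist hyperparameter search):

import numpy as np
import xgboost as xgb
import daal4py as d4p
from sklearn.datasets import make_regression

x, y = make_regression(n_samples=10000, n_features=16)

# Placeholder for the best hyperparameters found with tree_method='gpu_hist'
best_params = {'n_estimators': 100, 'max_depth': 6}

# Final refit on CPU with 'hist' so the booster converts cleanly to DAAL
final_model = xgb.XGBRegressor(**best_params, tree_method='hist')
final_model.fit(x, y)

daal_model = d4p.get_gbt_model_from_xgboost(final_model.get_booster())
daal_prediction = d4p.gbt_regression_prediction(fptype='float').compute(
    x, daal_model
).prediction.ravel()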
I ran a regressor on synthetic data of shape 7.5M x 64 and got no error. @dklein0, can you share the origin of your data? Does it have NaN/Inf values?
My data is financial data that I have pre-processed with a C# program to create my feature set.
The feature set has NaN values, but I load it into a pandas DataFrame, df, and call df.dropna(inplace=True) before using it. I also checked that numpy.isinf(df).values.sum() returns zero.
This would be fixed by adding missing values support; the PR for this on the oneDAL side: https://github.com/oneapi-src/oneDAL/pull/2345
Can somebody tell me how I can resolve this issue? I have tried looking at my data to find nulls, but I can't resolve it. This is just a help post; any guidance toward the correct approach is appreciated.