Open prutskov opened 3 years ago
@Yard1 or @krfricke can you take a look at this?
I believe that you need to ensure that not all CPUs are occupied by xgboost ray actors so that modin actors can do their job. In the examples we do it through RayParams. I'll check if this is what needs to be done here.
@prutskov The issue is that Modin also uses Ray underneath - which means it uses the same actor pool. By default, XGBoost-Ray will schedule as many actors as you have cores available, which means no cores will be left for Modin. The fix is very simple:
bst = xgb.train(
    params,
    dmatrix,
    num_boost_round=50,
    verbose_eval=10,
    evals=[(dmatrix, "train")],
    evals_result=evals_result,
    ray_params=xgb.RayParams(num_actors=7),  # change to the number of cores you have minus 1
)
By explicitly telling XGBoost-Ray to spawn one fewer actor than you have CPU cores, you leave a free core that Modin can use for its operations. Hope that helps!
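If you would rather not hard-code the actor count, the "cores minus one" rule can be computed at runtime. The following is only a minimal sketch: the helper `actors_leaving_headroom` and its parameters are hypothetical names made up for illustration, not part of XGBoost-Ray or Modin, and it assumes a single machine where `os.cpu_count()` reflects the CPUs Ray sees.

```python
import os


def actors_leaving_headroom(total_cpus, reserved_cpus=1, cpus_per_actor=1):
    """Return how many training actors fit while keeping `reserved_cpus` free.

    Hypothetical helper for illustration; not part of any library.
    """
    if total_cpus < 1 or cpus_per_actor < 1 or reserved_cpus < 0:
        raise ValueError("CPU counts must be positive and the reserve non-negative")
    usable_cpus = total_cpus - reserved_cpus
    # Never return fewer than one actor. On a single-core machine this
    # means no CPU is actually left free for Modin.
    return max(1, usable_cpus // cpus_per_actor)


# Assumption: single-node setup. On a multi-node cluster the total would
# more likely come from Ray itself, e.g. ray.cluster_resources()["CPU"].
total_cpus = os.cpu_count() or 1
num_actors = actors_leaving_headroom(total_cpus)
print(f"{total_cpus} CPUs -> {num_actors} training actors")
```

The result would then be passed in place of the literal value, i.e. `ray_params=xgb.RayParams(num_actors=num_actors)`. Raising `reserved_cpus` leaves more room for Modin at the cost of training parallelism.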
The actors claim all cluster resources. As a result, a warning is raised when Modin is used:
Warning message, after which execution hangs:
Script to reproduce:
It uses the HIGGS dataset.
Package versions: Ray == 1.4, Modin == master, xgboost_ray == master
cc @krfricke