ray-project / xgboost_ray

Distributed XGBoost on Ray

Xgboost GPU ignores USE_SPREAD_STRATEGY=0 #202

[Closed] ZhijieWang closed this issue 2 years ago

ZhijieWang commented 2 years ago

In a cluster environment, I set USE_SPREAD_STRATEGY=0 and ran XGBoost training with GPUs. I still see the output below, which indicates to me that a placement group with the SPREAD strategy is being created.

Demands:
 {'GPU': 1.0, 'CPU': 2.0} * 4 (SPREAD): 1+ pending placement groups
2022-03-02 13:56:50,578 WARNING resource_demand_scheduler.py:746 -- The autoscaler could not find a node type to satisfy the request: [{'CPU': 2.0, 'GPU': 1.0}, {'GPU': 1.0, 'CPU': 2.0}, {'GPU': 1.0, 'CPU': 2.0}]. If this request is related to placement groups the resource request will resolve itself, otherwise please specify a node type with the necessary resource https://docs.ray.io/en/master/cluster/autoscaling.html#multiple-node-type-autoscaling.
cadedaniel commented 2 years ago

cc @xwjiang2010

Yard1 commented 2 years ago

@ZhijieWang it's RXGB_USE_SPREAD_STRATEGY; the XGBoost section in the Ray docs is a bit outdated and is in the process of being updated.
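
For reference, here is a minimal sketch of how the correctly prefixed variable might be applied. It assumes the variable should be set before xgboost_ray is imported so it is picked up when the module's configuration is loaded; the dataset, training parameters, and boost rounds are placeholders, and the actor counts simply mirror the resource shape in the logs above:

```python
import os

# Note the RXGB_ prefix; set before importing xgboost_ray so the
# setting is read when the library loads its configuration.
os.environ["RXGB_USE_SPREAD_STRATEGY"] = "0"

from sklearn.datasets import load_breast_cancer
from xgboost_ray import RayDMatrix, RayParams, train

# Placeholder data; substitute your own distributed dataset.
data, labels = load_breast_cancer(return_X_y=True)
dtrain = RayDMatrix(data, labels)

bst = train(
    {"objective": "binary:logistic", "tree_method": "gpu_hist"},
    dtrain,
    num_boost_round=10,
    # Matches the demand shape from the logs: 4 actors x {1 GPU, 2 CPUs}.
    ray_params=RayParams(num_actors=4, gpus_per_actor=1, cpus_per_actor=2),
)
```

With the spread strategy disabled this way, the autoscaler demand should no longer report a (SPREAD) placement group.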

Yard1 commented 2 years ago

Closed by https://github.com/ray-project/ray/pull/22783