ajayrgb opened this issue 4 months ago.
@ajayrgb, thanks for posting this solution. I had been facing the same issue for the last few days but couldn't figure it out. This finally worked.
Thanks, rolling back xgboost to 2.0.3 worked for me.
Same issue here. I have to roll back xgboost to 2.0.3.
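While rolling back is the workaround, a startup guard can flag an incompatible install before training starts. This is a minimal sketch, not part of xgboost_ray: the helpers `parse_release`, `needs_rollback`, and `check_installed_xgboost` are hypothetical, and it assumes (based on this issue) that every release from 2.1.0 onward uses the new parameter order.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver):
    """Parse the leading 'major.minor[.patch]' of a version string into a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    # A missing patch component is treated as 0, e.g. "2.1" -> (2, 1, 0)
    return tuple(int(part or 0) for part in match.groups())


def needs_rollback(ver):
    # Assumption (from this issue): 2.1.0 and later use the new
    # RabitTracker parameter order that breaks the positional call.
    return parse_release(ver) >= (2, 1, 0)


def check_installed_xgboost():
    """Hypothetical startup guard: raise if the installed xgboost is too new."""
    try:
        installed = version("xgboost")
    except PackageNotFoundError:
        return  # xgboost is not installed; nothing to check
    if needs_rollback(installed):
        raise RuntimeError(
            f"xgboost {installed} changed the RabitTracker signature; "
            "roll back with: pip install xgboost==2.0.3"
        )
```

Calling `check_installed_xgboost()` at the top of a training script turns the late `TypeError` into an explicit message naming the version to pin.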
Problem
xgboost changed the order of the `RabitTracker` constructor parameters in 2.1.0: in 2.0.3, `host_ip` comes first; in 2.1.0, it comes second, after `n_workers`.
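This breaks the positional call `_RabitTracker(host, num_workers)` in `_start_rabit_tracker` (`xgboost_ray/main.py`), which under 2.1.0 passes `num_workers` as `host_ip`.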
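To illustrate how the positional call misbinds, here is a minimal sketch with two hypothetical stand-in classes, `Tracker203` and `Tracker210`, that mirror only the parameter order described above. They are not the real `xgboost.tracker.RabitTracker`; they just show that the same positional call puts the integer into `host_ip` under the 2.1.0 order, while keyword arguments bind correctly under both.

```python
# Hypothetical stand-ins that mimic only the constructor parameter order;
# they are NOT the real xgboost.tracker.RabitTracker classes.
class Tracker203:
    def __init__(self, host_ip, n_workers, port=0):
        # 2.0.3 order: host_ip first
        self.host_ip = host_ip
        self.n_workers = n_workers


class Tracker210:
    def __init__(self, n_workers, host_ip, port=0):
        # 2.1.0 order: host_ip second
        self.host_ip = host_ip
        self.n_workers = n_workers


host, num_workers = "127.0.0.1", 2

# Positional call, as in xgboost_ray's _start_rabit_tracker
old = Tracker203(host, num_workers)
new = Tracker210(host, num_workers)
print(old.host_ip, new.host_ip)  # under the 2.1.0 order, host_ip receives the int 2

# Keyword call binds each value to the right parameter regardless of order
fixed = Tracker210(host_ip=host, n_workers=num_workers)
print(fixed.host_ip, fixed.n_workers)
```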
Steps to reproduce
1. Create a new venv (tested with Python 3.10.13).
2. Install packages.
3. Run the example:
```python
from sklearn.datasets import load_breast_cancer
from xgboost_ray import RayDMatrix, RayParams, train

train_x, train_y = load_breast_cancer(return_X_y=True)
train_set = RayDMatrix(train_x, train_y)

evals_result = {}
bst = train(
    {
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
    },
    train_set,
    evals_result=evals_result,
    evals=[(train_set, "train")],
    verbose_eval=False,
    ray_params=RayParams(
        num_actors=2,  # Number of remote actors
        cpus_per_actor=1))

bst.save_model("model.xgb")
print("Final training error: {:.4f}".format(
    evals_result["train"]["error"][-1]))
```
```
2024-07-09 15:35:11,003 INFO main.py:1191 -- [RayXGBoost] Starting XGBoost training.
Traceback (most recent call last):
  File "/home/jovyan/run.py", line 10, in <module>
    bst = train(
  File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost_ray/main.py", line 1612, in train
    bst, train_evals_result, train_additional_results = _train(
  File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost_ray/main.py", line 1194, in _train
    rabit_process, rabit_args = _start_rabit_tracker(alive_actors)
  File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost_ray/main.py", line 261, in _start_rabit_tracker
    rabit_tracker = _RabitTracker(host, num_workers)
  File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost/tracker.py", line 64, in __init__
    get_family(host_ip)  # use python socket to stop early for invalid address
  File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost/tracker.py", line 14, in get_family
    return socket.getaddrinfo(addr, None)[0][0]
  File "/opt/conda/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
TypeError: getaddrinfo() argument 1 must be string or None
```
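The final `TypeError` can be reproduced in isolation. This is a minimal sketch that uses only the standard library; `getaddrinfo_error` is a hypothetical helper. Under the 2.1.0 parameter order, the integer `num_workers` reaches `get_family` as `host_ip`, and `socket.getaddrinfo` rejects any host that is not a string, bytes, or `None` before attempting a lookup.

```python
import socket


def getaddrinfo_error(host):
    """Return the TypeError message socket.getaddrinfo raises for host, or None."""
    try:
        # Port None mirrors xgboost's get_family(addr) call in the traceback
        socket.getaddrinfo(host, None)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return None


num_workers = 2  # the int that lands in host_ip after the argument swap
print(getaddrinfo_error(num_workers))
```

Only the non-string case is exercised here, so the sketch never performs a real DNS lookup.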
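Passing the arguments by keyword removes the dependence on parameter order, since both versions name them `host_ip` and `n_workers`:

```python
rabit_tracker = _RabitTracker(host_ip=host, n_workers=num_workers)
```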