ray-project / xgboost_ray

Distributed XGBoost on Ray
Apache License 2.0

TypeError: getaddrinfo() argument 1 must be string or None #312

Open ajayrgb opened 4 months ago

ajayrgb commented 4 months ago

## Problem

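xgboost changed the order of the `RabitTracker` constructor parameters in 2.1.0.

In 2.0.3, `host_ip` comes first. In 2.1.0, `host_ip` is second, after `n_workers`.

This breaks the positional call `_RabitTracker(host, num_workers)` in `xgboost_ray/main.py` (line 261 in v0.1.19): `num_workers`, an integer, is now bound to `host_ip`, and `socket.getaddrinfo()` rejects it.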
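
The sketch below illustrates why swapping the parameter order turns the same positional call into this `TypeError`. The two `tracker_*` functions are hypothetical stand-ins that only mirror the parameter orders described above; they are not the real `RabitTracker`. `get_family` copies the address check shown in the traceback.

```python
import socket


def get_family(addr):
    # Same check as in the traceback: resolve the host with the standard
    # library and return its address family.
    return socket.getaddrinfo(addr, None)[0][0]


def tracker_2_0_3(host_ip, n_workers):
    # Hypothetical stand-in: host_ip first, as in xgboost 2.0.3.
    return get_family(host_ip), n_workers


def tracker_2_1_0(n_workers, host_ip):
    # Hypothetical stand-in: host_ip second, as in xgboost 2.1.0.
    return get_family(host_ip), n_workers


host, num_workers = "127.0.0.1", 2

# Old order: the positional call binds host to host_ip and succeeds.
family, workers = tracker_2_0_3(host, num_workers)

# New order: the same positional call binds num_workers (an int) to host_ip,
# so getaddrinfo() receives an int instead of a string.
try:
    tracker_2_1_0(host, num_workers)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

Nothing in the caller changes between the two versions; only the binding of the positional arguments does.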

## Steps to reproduce

  1. Create a new venv. Tested with python 3.10.13

  2. Install packages

    pip install xgboost_ray==0.1.19 xgboost==2.1.0 scikit-learn ray[train]
  3. Run example:

    
    from xgboost_ray import RayDMatrix, RayParams, train
    from sklearn.datasets import load_breast_cancer

    train_x, train_y = load_breast_cancer(return_X_y=True)
    train_set = RayDMatrix(train_x, train_y)

    evals_result = {}
    bst = train(
        {
            "objective": "binary:logistic",
            "eval_metric": ["logloss", "error"],
        },
        train_set,
        evals_result=evals_result,
        evals=[(train_set, "train")],
        verbose_eval=False,
        ray_params=RayParams(
            num_actors=2,  # Number of remote actors
            cpus_per_actor=1))

    bst.save_model("model.xgb")
    print("Final training error: {:.4f}".format(
        evals_result["train"]["error"][-1]))


## Error

    2024-07-09 15:35:11,003 INFO main.py:1191 -- [RayXGBoost] Starting XGBoost training.
    Traceback (most recent call last):
      File "/home/jovyan/run.py", line 10, in <module>
        bst = train(
      File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost_ray/main.py", line 1612, in train
        bst, train_evals_result, train_additional_results = _train(
      File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost_ray/main.py", line 1194, in _train
        rabit_process, rabit_args = _start_rabit_tracker(alive_actors)
      File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost_ray/main.py", line 261, in _start_rabit_tracker
        rabit_tracker = _RabitTracker(host, num_workers)
      File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost/tracker.py", line 64, in __init__
        get_family(host_ip)  # use python socket to stop early for invalid address
      File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost/tracker.py", line 14, in get_family
        return socket.getaddrinfo(addr, None)[0][0]
      File "/opt/conda/lib/python3.10/socket.py", line 955, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    TypeError: getaddrinfo() argument 1 must be string or None


## Proposed solution

Pin the xgboost dependency to `<2.1.0`

OR

Change [this](https://github.com/ray-project/xgboost_ray/blob/v0.1.19/xgboost_ray/main.py#L261) line to pass both arguments by keyword:

    rabit_tracker = _RabitTracker(host_ip=host, n_workers=num_workers)
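
Keyword arguments do not depend on parameter order, so the call above should work with either signature. A minimal sketch, using two hypothetical stand-in classes (not the real `RabitTracker`) that only mirror the two parameter orders; `start_tracker` is likewise an illustrative helper, not part of xgboost_ray:

```python
class TrackerOld:
    # Hypothetical stand-in for the 2.0.3 order: host_ip, then n_workers.
    def __init__(self, host_ip, n_workers):
        self.host_ip = host_ip
        self.n_workers = n_workers


class TrackerNew:
    # Hypothetical stand-in for the 2.1.0 order: n_workers, then host_ip.
    def __init__(self, n_workers, host_ip):
        self.host_ip = host_ip
        self.n_workers = n_workers


def start_tracker(tracker_cls, host, num_workers):
    # Passing by keyword binds each value to the intended parameter,
    # whatever position that parameter has in the signature.
    return tracker_cls(host_ip=host, n_workers=num_workers)


old = start_tracker(TrackerOld, "127.0.0.1", 2)
new = start_tracker(TrackerNew, "127.0.0.1", 2)

print(old.host_ip, old.n_workers)
print(new.host_ip, new.n_workers)
```

This only holds as long as the parameter names `host_ip` and `n_workers` stay the same across versions, which is the case for the two versions discussed here.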

thedatamonk commented 4 months ago

@ajayrgb - thanks for posting this solution. I had been facing the same issue for the last few days but couldn't figure it out. Finally, this worked.

balladbang commented 4 months ago

Thanks, rolling back xgboost to 2.0.3 worked for me.

alexanderhanboli commented 3 weeks ago

Same issue here. I have to roll back xgboost to 2.0.3.