ray-project / xgboost_ray

Distributed XGBoost on Ray
Apache License 2.0

Fix detecting Dask DataFrame for distributed data load #196

Closed rokrokss closed 2 years ago

rokrokss commented 2 years ago

closes https://github.com/ray-project/xgboost_ray/issues/195

rokrokss commented 2 years ago

@Yard1 could you rerun the test? I had a typo in the commit, so I rebased it.

Yard1 commented 2 years ago

Ok @rokrokss this looks good, thanks! I'll just test this out locally and we should be good to merge.

rokrokss commented 2 years ago

@Yard1 Yup. I'm not sure that load_data() in the distributed loader works right now, so I will check that too.

rokrokss commented 2 years ago

@Yard1 When using the train function, I'm seeing load_data() hang forever with this setting. Sorry, it's hard for me to fix that right now, but for the time being I'll use RayDMatrix(ray.data.from_dask(df_train), label="label") instead.
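
For reference, the workaround above can be wrapped in a small helper. This is only a sketch: the function name to_ray_dmatrix is hypothetical and not part of xgboost_ray, and it assumes that ray (with Dask support) and xgboost_ray are installed and that the Dask DataFrame has a column named "label", as in the comment above.

```python
def to_ray_dmatrix(dask_df, label="label"):
    """Build a RayDMatrix from a Dask DataFrame by going through a Ray Dataset.

    Hypothetical helper that illustrates the workaround from this thread;
    it is not part of the xgboost_ray API.
    """
    # Local imports so that defining the helper does not require Ray.
    import ray.data
    from xgboost_ray import RayDMatrix

    # Convert the Dask DataFrame to a Ray Dataset first, instead of
    # handing the Dask DataFrame to RayDMatrix directly.
    ray_dataset = ray.data.from_dask(dask_df)

    # `label` names the target column, as in the comment above.
    return RayDMatrix(ray_dataset, label=label)
```

The result of to_ray_dmatrix(df_train) would then be passed to xgboost_ray's train function in place of a RayDMatrix built directly from the Dask DataFrame.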

Yard1 commented 2 years ago

Ok, that's what I am seeing too. Let's just close it for now and put it in the backlog. Thanks for taking the initiative!