ray-project / xgboost_ray

Distributed XGBoost on Ray
Apache License 2.0

Add zero-copy DMatrix creation with Arrow #224

Open Yard1 opened 2 years ago

Yard1 commented 2 years ago

We currently convert the input data to pandas before initialising the DMatrix. We should consider passing Arrow data instead to avoid unnecessary copies. XGBoost has Arrow support: https://github.com/dmlc/xgboost/pull/7512

natmod commented 2 years ago

Thanks for adding this! It looks like the changes were split over two PRs. FYI, here is the second one: https://github.com/dmlc/xgboost/pull/7283

tonyabracadabra commented 2 years ago

What about also supporting Polars DataFrames for creating a DMatrix in Python?