ottobricks opened this issue 2 years ago

Hi there,

I have a large dataset (100+ GB) that I have been trying to make FLAML (AutoML) work with, so far without success. Since FLAML uses Ray, shouldn't it take advantage of Ray's object store (object spilling to disk)? If not, any suggestions on how we should go about out-of-memory compute with FLAML?

When I try to pass a Ray `ObjectRef` to AutoML's `fit`, I get an error that either a NumPy array, pandas DataFrame, or SciPy sparse matrix is expected.
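For reference, a minimal sketch of the pattern that triggers the error, plus the in-memory workaround; the data, task, and budget are made up for illustration:

```python
# Hypothetical reproduction of the error described above: AutoML.fit
# validates its inputs and rejects a Ray ObjectRef.
import numpy as np
import ray
from flaml import AutoML

ray.init()
X_train, y_train = np.random.rand(1000, 10), np.random.rand(1000)
X_ref = ray.put(X_train)  # lives in Ray's object store

automl = AutoML()
# automl.fit(X_train=X_ref, y_train=y_train, task="regression", time_budget=10)
# -> error: a numpy array, pandas DataFrame, or scipy sparse matrix is expected

# ray.get materializes the reference, but that pulls the whole dataset back
# into local memory -- exactly the limitation this issue is about.
automl.fit(X_train=ray.get(X_ref), y_train=y_train, task="regression", time_budget=10)
```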
@ottok92 How do you perform training currently? If you have a working training function already, you can use `flaml.tune` to perform hyperparameter tuning.
Thank you for the quick reply, @sonichi. Currently I do everything in Spark with Scala. I'm interested in using FLAML both because of the impressive CFO algorithm and also to make it easier for my colleagues to collaborate (everybody knows Python). I'll go through the doc you suggested to see if this will enable us to run FLAML on our large datasets. Thanks for the support!
It looks very promising to integrate with Ray's object store. Thank you for the suggestion. I will run some experiments and post feedback in this thread for future reference.
@ottok92 That's great. I'm very interested in how it works with your use case.
Another question: which learner do you use, for example, lightgbm? flaml has a built-in search space for each built-in learner, which might be useful. For example, here is an example of tuning lgbm:
https://github.com/microsoft/FLAML/blob/main/test/tune_example.py
To make it work for your dataset, you can modify the `train_lgbm` function, `metric`, `mode`, and `time_budget_s`, and set `use_ray=True` if you would like to do parallel tuning; a sketch is shown below.
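A minimal sketch of what that adaptation can look like, loosely following tune_example.py; the dataset, search-space bounds, and the one-minute budget are placeholders:

```python
import lightgbm as lgb
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from flaml import tune

X, y = fetch_california_housing(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

def train_lgbm(config: dict) -> dict:
    # Train one model with the sampled hyperparameters and report the metric.
    model = lgb.LGBMRegressor(**config)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    return {"mse": mse}

analysis = tune.run(
    train_lgbm,
    config={
        "n_estimators": tune.lograndint(4, 1000),
        "num_leaves": tune.lograndint(4, 1000),
        "learning_rate": tune.loguniform(1e-3, 1.0),
    },
    metric="mse",        # key reported by train_lgbm
    mode="min",
    time_budget_s=60,    # total tuning time in seconds
    num_samples=-1,      # no trial-count limit within the time budget
    use_ray=True,        # distribute trials over a Ray cluster
)
print(analysis.best_config)
```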
Perfect! I'm working with XGBoost, which is also built-in. Once I finish playing with this, I will share my `train_xgboost` function. Maybe we can create a section in the Docs for "handling large datasets with Ray".
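In the meantime, here is a rough sketch of the direction I have in mind: a `train_xgboost` that reads its data from Ray's object store rather than capturing a large array in its closure. The `data_ref` argument and the tuple layout are my own conventions, not FLAML API, and the toy data stands in for the real dataset:

```python
import functools

import numpy as np
import ray
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from flaml import tune

def train_xgboost(config: dict, data_ref=None) -> dict:
    # Materialize the dataset from the object store inside the trial;
    # Ray resolves the reference without copying the data per trial.
    X_train, y_train, X_val, y_val = ray.get(data_ref)
    # tree_method="hist" so that max_leaves is honored.
    model = xgb.XGBRegressor(tree_method="hist", **config)
    model.fit(X_train, y_train)
    return {"mse": mean_squared_error(y_val, model.predict(X_val))}

ray.init()
# Toy data standing in for the real 100+ GB dataset.
X, y = np.random.rand(2000, 10), np.random.rand(2000)
data_ref = ray.put((X[:1500], y[:1500], X[1500:], y[1500:]))

analysis = tune.run(
    functools.partial(train_xgboost, data_ref=data_ref),
    config={
        "n_estimators": tune.lograndint(4, 512),
        "max_leaves": tune.lograndint(4, 512),
        "learning_rate": tune.loguniform(1e-3, 1.0),
    },
    metric="mse",
    mode="min",
    time_budget_s=60,
    use_ray=True,
)
print(analysis.best_config)
```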
That'll be super cool. Looking forward to it.
BTW, flaml provides two search spaces for XGBoost: `XGBoostSklearnEstimator` tunes `max_leaves`, and `XGBoostLimitDepthEstimator` tunes `max_depth`.
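For example, a minimal sketch of choosing between them in AutoML; as far as I recall, the corresponding `estimator_list` names are "xgboost" and "xgb_limitdepth", and the data and budget below are placeholders:

```python
from sklearn.datasets import fetch_california_housing
from flaml import AutoML

X, y = fetch_california_housing(return_X_y=True)

automl = AutoML()
automl.fit(
    X_train=X,
    y_train=y,
    task="regression",
    time_budget=60,  # seconds; placeholder budget
    # "xgboost" -> XGBoostSklearnEstimator (tunes max_leaves);
    # "xgb_limitdepth" -> XGBoostLimitDepthEstimator (tunes max_depth).
    estimator_list=["xgboost"],
)
print(automl.best_config)
```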