stevenpawley / h2oparsnip

Model Wrappers for H2O models

AutoML H2o #4

Closed. Shafi2016 closed this issue 3 years ago.

Shafi2016 commented 3 years ago

Thanks for the nice package. Do you have any documentation for this package? Do you have a basic example for running AutoML? I was thinking of combining it with modeltime in R.

stevenpawley commented 3 years ago

Unfortunately I haven't gotten that far yet. However, it is mostly the same as using anything else in tidymodels/parsnip, apart from setting the engine to "h2o". One conceptual difficulty is that H2O works best when the data is kept within an H2OFrame, but that doesn't fit well with other tidymodels features, e.g. recipes, tune, etc., which require the data to be in the R environment. There is a very rough tune_grid_h2o function in the package which keeps the data within the H2O cluster.
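
As a rough illustration, here is a minimal sketch of that workflow, assuming h2oparsnip registers an "h2o" engine for standard parsnip specs such as rand_forest() (check the package for the exact list of supported models):

```r
# Minimal sketch: fit a parsnip model with the "h2o" engine (assumes
# h2oparsnip registers this engine for rand_forest()).
library(tidymodels)
library(h2oparsnip)
library(h2o)

h2o.init()  # start or connect to a local H2O cluster

rf_spec <- rand_forest(mode = "regression", trees = 100) %>%
  set_engine("h2o")

rf_fit <- fit(rf_spec, mpg ~ ., data = mtcars)

predict(rf_fit, new_data = mtcars)
```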

Shafi2016 commented 3 years ago

Perhaps you could collaborate with the H2O team to improve it further.

mdancho84 commented 3 years ago

I agree with @stevenpawley that data conversion needs to be minimized (converting to/from a data frame and an H2OFrame is actually very expensive).

The nice thing about H2O AutoML is that it manages the whole tuning process, so there shouldn't be much hyperparameter tweaking. If there is, the user can use set_engine() to specify the needed arguments, which would be passed straight to the h2o::h2o.automl() function and used in the tuning process within H2O.
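
To sketch the idea (hypothetically; the automl() parsnip spec shown here is illustrative only and not a confirmed h2oparsnip function), engine arguments supplied via set_engine() would simply be forwarded to h2o::h2o.automl():

```r
# Hypothetical AutoML spec: automl() is illustrative, not a confirmed
# h2oparsnip function. max_runtime_secs and max_models are real
# h2o::h2o.automl() arguments that would be forwarded unchanged.
auto_spec <- automl(mode = "regression") %>%
  set_engine("h2o", max_runtime_secs = 120, max_models = 20)

auto_fit <- fit(auto_spec, mpg ~ ., data = mtcars)
```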

With that said, the only challenge I see is that (unlike other H2O algorithms) AutoML returns a Leaderboard. This requires a choice on the user's end. Typically my choices are:

- the "Best" model, i.e. the leaderboard leader, for predictive performance; and
- the best "explainable" model, i.e. the top-ranked model that supports model explanation.

An option during the training process would be to store both of these models. Then, when the user serializes (saves) the fitted model, they get both. Prediction happens with the Best model; explanation happens with the best explainable model.
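
For reference, a minimal sketch (using the plain h2o API rather than h2oparsnip) of how both models could be pulled from the leaderboard; filtering out "StackedEnsemble" models is one possible heuristic for choosing the explainable model:

```r
# Sketch: extract the leader and the best non-ensemble ("explainable") model
# from an H2O AutoML run, using the plain h2o API.
library(h2o)
h2o.init()

train <- as.h2o(mtcars)
aml <- h2o.automl(y = "mpg", training_frame = train,
                  max_models = 10, max_runtime_secs = 60, seed = 1)

lb <- as.data.frame(aml@leaderboard)

best <- aml@leader  # leaderboard leader, often a stacked ensemble

# first leaderboard entry that is not a stacked ensemble
explainable_id <- lb$model_id[!grepl("StackedEnsemble", lb$model_id)][1]
best_explainable <- h2o.getModel(explainable_id)
```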

These are just my thoughts... Would be happy to discuss more as part of #5.