Some API functions return a new data frame but give no way to name it. The frame gets a random-looking name, so you are left guessing which frame is which.
Here is a full use case, at least for as.factor:
I am running 2+ nodes on EC2, with RStudio on each node.
I started a long-running model on node 1, so RStudio there is busy.
I then open RStudio on node 2, wanting to do something else with train/valid/test:
{code}
train = h2o.getFrame("???")
test = h2o.getFrame("???")
valid = h2o.getFrame("???")
{code}
(In this case I could use Flow to work out which frame was train from the number of rows, but test and valid were the same size! I was reduced to guessing, then checking the original CSV files to see whether I had guessed correctly.)
BTW, if the randomly generated name of a copied frame were based on the name of the original frame, I'd have been okay in this case. (importFile() already chooses a name based on the CSV filename.) It'd be nice to have that feature, too.
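Until something like that exists, one possible workaround is to give each derived frame a known key before starting the long job. The sketch below assumes h2o.assign() from the R package, which copies a frame under a key you choose. The file path, column name, and key are hypothetical, and connection details for the EC2 cluster are left out.

```r
library(h2o)
h2o.init()  # assumes this session can reach the running cluster

# On node 1, before starting the long-running model.
train <- h2o.importFile("train.csv")        # hypothetical path
train$label <- as.factor(train$label)       # hypothetical column; the result has a generated id
train <- h2o.assign(train, "train_factor")  # copy it under a known key (hypothetical name)

# On node 2, in a separate R session connected to the same cluster.
train <- h2o.getFrame("train_factor")
```

The same two steps would be repeated for valid and test. Since h2o.assign() makes a copy, this costs extra memory on the cluster, which is part of why a naming argument on the original call would still be preferable.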