H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Currently, data frames can be split only by random splits.
I would like the following types of splits:
Row based data frame split/reweighting
a. by column (i.e. split the data frame where a column matches a criterion: =, >, >=, <, <=)
b. able to set the weights of the split (either set a weight to 0 to filter rows out, or use it as a sampling parameter)
Column based
a. remove unneeded columns from the data frame
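A rough sketch of the row-based behaviour requested above, written in plain Python over a list of dicts rather than against the H2O API. The helper names `split_by_column` and `weight_by_column`, and the sample rows, are hypothetical and only illustrate the intended semantics.

```python
import operator

# Hypothetical helpers for illustration; none of these are part of the H2O API.
_OPS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def split_by_column(rows, column, op, value):
    """Split rows (a list of dicts) into (matching, non_matching)."""
    compare = _OPS[op]
    matching, non_matching = [], []
    for row in rows:
        target = matching if compare(row[column], value) else non_matching
        target.append(row)
    return matching, non_matching


def weight_by_column(rows, column, op, value, match_weight=1.0, other_weight=0.0):
    """Return one weight per row; a weight of 0 effectively filters the row out."""
    compare = _OPS[op]
    return [
        match_weight if compare(row[column], value) else other_weight
        for row in rows
    ]


# Made-up sample data.
rows = [
    {"x": 1, "p1": 0.05},
    {"x": 2, "p1": 0.10},
    {"x": 3, "p1": 0.40},
]
high, low = split_by_column(rows, "p1", ">", 0.10)
weights = weight_by_column(rows, "p1", ">", 0.10)
```

Here `high` holds the rows that satisfy the criterion and `low` holds the rest, while `weights` expresses the same split as per-row weights, so the frame is reweighted without being copied.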
Use cases:
I am using Hadoop and I want to export predictions. I only want rows where p1 > 0.10, and only columns x, y, z plus the prediction data. As I type this, I have been waiting 4 hours to export a large dataset on a 220-node cluster.
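The export use case could look roughly like the following, again in plain Python and not the H2O API. The function name `select_for_export`, the extra column `w`, and the prediction column names `predict` and `p0` are assumptions made for the example; only `p1`, `x`, `y`, `z` and the 0.10 threshold come from the description above.

```python
def select_for_export(rows, keep_columns, score_column="p1", threshold=0.10):
    """Yield rows whose score exceeds the threshold, restricted to keep_columns.

    Hypothetical helper: rows are processed one at a time, so the filtered
    result never has to be materialised before it is written out.
    """
    for row in rows:
        if row[score_column] > threshold:
            yield {name: row[name] for name in keep_columns}


# Made-up scored rows; "w" stands in for a column that should not be exported.
predictions = [
    {"x": 1, "y": 2, "z": 3, "w": 9, "predict": 0, "p0": 0.95, "p1": 0.05},
    {"x": 4, "y": 5, "z": 6, "w": 9, "predict": 1, "p0": 0.30, "p1": 0.70},
]
export_columns = ["x", "y", "z", "predict", "p0", "p1"]
exported = list(select_for_export(predictions, export_columns))
```

Applying the row filter and the column selection before the export means only the rows and columns of interest are written, which is the point of the request.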