H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
List of some stuff that needs to be done.
DEVELOPMENT
YARN Client
[ ] Simple YARN client that can get resource and queue info from the RM and print it out
[ ] Queue math to figure out a command line that might work (a best guess only)
[ ] Add a better error message to the existing h2o-dev YARN client (without any functional changes) to print info about resource acquisition failure
[ ] Job works when Kerberos is in place
[ ] Launch AM
[ ] Handle Ctrl-C shutdown
[ ] Handle H2O clustering and mapper->driver messages
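As a rough illustration of the "queue math" item, the sketch below turns two numbers the client could read from the RM (memory available in the queue and the largest container the scheduler will grant) into a guessed command line. `QueueMath` and its methods are hypothetical names, not existing H2O code. The flags (`-nodes`, `-mapperXmx`, `-extramempercent`, `-output`) are assumed to match the existing h2odriver command line, and the jar name may differ by distribution. The sketch ignores the memory taken by the AM container and any rounding the scheduler applies to requests, so the result is only a starting point.

```java
// Hypothetical helper for illustration only; not part of H2O or Hadoop.
final class QueueMath {

    // Largest JVM heap (MB) that fits in one container once the extra-memory
    // overhead (a percentage of the heap) is added on top of it.
    static long maxMapperXmxMb(long maxContainerMb, int extraMemPercent) {
        // container = heap * (100 + extra) / 100, solved for heap, rounded down
        return maxContainerMb * 100 / (100 + extraMemPercent);
    }

    // Best-guess command line for the given queue headroom and container limit.
    static String suggestCommand(long queueAvailableMb, long maxContainerMb,
                                 int extraMemPercent, String outputDir) {
        // Use whole gigabytes for the heap so the flag value stays simple.
        long xmxGb = maxMapperXmxMb(maxContainerMb, extraMemPercent) / 1024;
        if (xmxGb < 1) {
            throw new IllegalArgumentException("largest container is too small for a 1g heap");
        }
        // Container size needed for that heap plus overhead, rounded up.
        long containerMb = (xmxGb * 1024 * (100 + extraMemPercent) + 99) / 100;
        long nodes = queueAvailableMb / containerMb;
        if (nodes < 1) {
            throw new IllegalArgumentException("queue has no room for even one container");
        }
        return "hadoop jar h2odriver.jar -nodes " + nodes
                + " -mapperXmx " + xmxGb + "g"
                + " -extramempercent " + extraMemPercent
                + " -output " + outputDir;
    }

    public static void main(String[] args) {
        // 64 GB free in the queue, 8 GB maximum container, 10% overhead.
        System.out.println(suggestCommand(65536, 8192, 10, "h2o_out"));
    }
}
```

With the inputs in `main`, the heap comes out at 7g, each container at 7885 MB, and the guess is 8 nodes.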
Application Master
[ ] Figure out how to write the AM log to the proper place
[ ] Check whether YARN properties are set too low
[ ] Make container resource request
[ ] AM web page reachable from the RM web UI (with buttons to look at stdout/stderr logs)
[ ] Launch containers
[ ] Handle launch failure due to lack of resources
[ ] (Steady-state) Handle container failure
[ ] (Steady-state) Handle RM failure (possibly by logging and killing the job)
[ ] (Steady-state) Heartbeat thread
[ ] Handle shutdown
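One possible shape for the "YARN properties too low" check is sketched below. It compares a requested container size against two standard YARN memory settings, `yarn.scheduler.maximum-allocation-mb` and `yarn.nodemanager.resource.memory-mb`, and returns a readable message for each one that is too small. `YarnPropertyCheck` is a hypothetical name, and passing the configuration as a plain map is an assumption made to keep the example self-contained. A real AM would more likely read these values from the Hadoop configuration or from the RM registration response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of H2O or Hadoop.
final class YarnPropertyCheck {

    // YARN memory limits that can make a container request impossible to satisfy.
    private static final String[] MEMORY_LIMIT_KEYS = {
        "yarn.scheduler.maximum-allocation-mb",
        "yarn.nodemanager.resource.memory-mb"
    };

    // Returns one message per property whose value is below the requested size.
    static List<String> problems(Map<String, String> props, long requestedContainerMb) {
        List<String> out = new ArrayList<>();
        for (String key : MEMORY_LIMIT_KEYS) {
            String value = props.get(key);
            if (value == null) {
                continue; // not present in the supplied config; nothing to compare
            }
            long limitMb = Long.parseLong(value.trim());
            if (limitMb < requestedContainerMb) {
                out.add(key + "=" + limitMb + " MB is below the requested container size of "
                        + requestedContainerMb + " MB");
            }
        }
        return out;
    }
}
```

An empty list means neither limit rules out the request. It does not mean the cluster currently has that much memory free.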
Container
[ ] Figure out how to write the container log to the proper place
[ ] Figure out where ice_root should go (container local dir)
[ ] Set up EmbeddedH2O object
[ ] Handle H2O clustering mapper->driver messages
[ ] Start H2O
[ ] HDFS works when Kerberos is in place
TESTING
[ ] Test on CDH5.2
[ ] Test on CDH5.3
[ ] Test on HDP2.1
[ ] Test on HDP2.2
[ ] Test on MapR3
[ ] Test on MapR4