H2O users can train a GBM/DRF model and obtain a dot representation of the trees that make up the DRF/GBM tree ensemble: [https://github.com/h2oai/h2o-3/blob/master/h2o-genmodel/src/main/java/hex/genmodel/tools/PrintMojo.java|https://github.com/h2oai/h2o-3/blob/master/h2o-genmodel/src/main/java/hex/genmodel/tools/PrintMojo.java]
The dot format needs to be converted to an image/PDF using an external tool (graphviz). This is limiting in deployments where installing new software is hard or impossible. We would like to find out whether we can use [https://github.com/nidi3/graphviz-java|https://github.com/nidi3/graphviz-java] as a reliable substitute for native graphviz.
The goal of this task is to find out whether graphviz-java is a good substitute (it is stable and produces results as good as native graphviz) and whether its dependencies are reasonable enough for it to be integrated into H2O.
For this POC:
1) Train a GBM model; quick start: [http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gbm.html#quick-start|http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gbm.html#quick-start]
2) Export the model in MOJO format (use the documentation to find out how)
3) Download the H2O-3 source code from GitHub: [https://github.com/h2oai/h2o-3|https://github.com/h2oai/h2o-3] and make sure you can build the h2o-genmodel module using Gradle
4) Explore the tree visualization options described at [http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html#viewing-a-mojo-model|http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html#viewing-a-mojo-model]
5) Use PrintMojo (see above) to get a dot representation of a GBM tree; convert it to an image using graphviz
6) Modify [https://github.com/h2oai/h2o-3/blob/master/h2o-genmodel/src/main/java/hex/genmodel/tools/PrintMojo.java|https://github.com/h2oai/h2o-3/blob/master/h2o-genmodel/src/main/java/hex/genmodel/tools/PrintMojo.java] (located in the h2o-genmodel module): add a new option that lets users produce an image of the tree directly instead of the dot format
7) Compare the results of native graphviz and graphviz-java
Suggestions:
Instead of generating the graph input for graphviz-java through its Java API, you can take a shortcut: generate the dot file and then parse it using
{code}MutableGraph g = Parser.read(getClass().getResourceAsStream("/tree.dot"));{code}
graphviz-java supports different engines. Please make sure your comparison tests actually use a non-native engine as the backend, not the native graphviz.
Think outside the box: this Jira is formulated for graphviz, but if you find a better solution that accomplishes the primary goal of generating an image of the tree, don’t be afraid to propose it!
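For the native baseline used in the comparison tests, graphviz can be driven from Java by shelling out to the {{dot}} command. The following is only a minimal sketch: the class and method names ({{NativeDotBaseline}}, {{dotCommand}}, {{render}}) are hypothetical, and it assumes the {{dot}} executable is installed and on the PATH.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

public class NativeDotBaseline {

    /** Builds the argument list for native graphviz, e.g. dot -Tpng tree.gv -o tree.png. */
    static List<String> dotCommand(Path dotFile, Path imageFile, String format) {
        return List.of("dot", "-T" + format, dotFile.toString(), "-o", imageFile.toString());
    }

    /** Runs native graphviz and returns its exit code (0 means success). */
    static int render(Path dotFile, Path imageFile, String format)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(dotCommand(dotFile, imageFile, format))
                .inheritIO() // forward graphviz warnings and errors to the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: NativeDotBaseline <input.gv> <output.png>");
            return;
        }
        // Assumption: `dot` is available on the PATH of this machine.
        int exitCode = render(Path.of(args[0]), Path.of(args[1]), "png");
        System.out.println("dot exited with " + exitCode);
    }
}
```

Keeping the command construction separate from the process launch makes it possible to check the arguments on machines where graphviz is not installed.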
Note: graphviz-java is licensed under the Apache License, Version 2.0, which is compatible with H2O
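One possible way to put a number on the comparison between native graphviz and graphviz-java output is a rough pixel-level diff of the two rendered images. This is a sketch under assumptions, not a prescribed method: the class and method names ({{RenderComparison}}, {{diffRatio}}) are hypothetical, and since different engines may lay out or anti-alias the tree differently, a non-zero ratio does not by itself mean the result is worse. Visual inspection is still needed.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class RenderComparison {

    /**
     * Returns the fraction of pixels that differ between two images of equal size,
     * or -1.0 if the dimensions do not match.
     */
    static double diffRatio(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return -1.0;
        }
        long differing = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return (double) differing / ((long) a.getWidth() * a.getHeight());
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: RenderComparison <native.png> <graphviz-java.png>");
            return;
        }
        BufferedImage nativeImage = ImageIO.read(new File(args[0]));
        BufferedImage javaImage = ImageIO.read(new File(args[1]));
        if (nativeImage == null || javaImage == null) {
            System.out.println("could not decode one of the images");
            return;
        }
        System.out.println("differing pixel ratio: " + diffRatio(nativeImage, javaImage));
    }
}
```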