h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

POC graphviz-java for use with Tree visualization in MOJO #8865

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

H2O users can train a GBM/DRF model and obtain a dot representation of the trees that make up the DRF/GBM tree ensemble: https://github.com/h2oai/h2o-3/blob/master/h2o-genmodel/src/main/java/hex/genmodel/tools/PrintMojo.java
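For readers unfamiliar with the format, here is a minimal sketch of what "a dot representation of a tree" means. It uses only the Java standard library. The `TreeNode` and `TreeDot` classes are hypothetical illustrations; they are not H2O's internal tree classes, and the output is not the exact text PrintMojo emits.

```java
/** Hypothetical miniature tree node; NOT H2O's internal tree representation. */
final class TreeNode {
    final String label;   // split condition for inner nodes, prediction for leaves
    final TreeNode left;  // branch taken when the condition holds (null for a leaf)
    final TreeNode right; // branch taken otherwise (null for a leaf)

    TreeNode(String label, TreeNode left, TreeNode right) {
        this.label = label;
        this.left = left;
        this.right = right;
    }

    static TreeNode leaf(String label) {
        return new TreeNode(label, null, null);
    }
}

/** Hypothetical helper that serializes a TreeNode hierarchy as Graphviz DOT text. */
final class TreeDot {
    static String toDot(TreeNode root) {
        StringBuilder sb = new StringBuilder("digraph tree {\n");
        emit(root, new int[] {0}, sb);
        return sb.append("}\n").toString();
    }

    // Writes the node declaration, then its subtrees and the edges to them.
    // Returns the numeric id assigned to this node (pre-order numbering).
    private static int emit(TreeNode node, int[] nextId, StringBuilder sb) {
        int id = nextId[0]++;
        sb.append("  n").append(id)
          .append(" [label=\"").append(escape(node.label)).append("\"];\n");
        if (node.left != null) {
            int child = emit(node.left, nextId, sb);
            sb.append("  n").append(id).append(" -> n").append(child)
              .append(" [label=\"yes\"];\n");
        }
        if (node.right != null) {
            int child = emit(node.right, nextId, sb);
            sb.append("  n").append(id).append(" -> n").append(child)
              .append(" [label=\"no\"];\n");
        }
        return id;
    }

    // Escapes backslashes and double quotes so the label is a valid DOT string.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Calling `TreeDot.toDot(new TreeNode("x < 3.5", TreeNode.leaf("0.12"), TreeNode.leaf("-0.40")))` yields a small `digraph` with one split node and two leaves. The result is plain text, so a renderer is still needed to turn it into an image.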

The dot format needs to be converted to an image/PDF using an external tool (graphviz). This is limiting in deployments where installing new software is hard or impossible. We would like to find out whether https://github.com/nidi3/graphviz-java can serve as a reliable substitute for native graphviz.

The goal of this task is to find out whether graphviz-java is a good substitute (stable, with results as good as those of native graphviz) and whether its dependencies are reasonable enough for it to be integrated into H2O.

For this POC:

Train a GBM model. Quick-start link: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gbm.html#quick-start

Export the model in MOJO format (use the documentation to find out how).

Download the H2O-3 source code from GitHub (https://github.com/h2oai/h2o-3) and make sure you can build the h2o-genmodel module using Gradle.

Explore the tree visualization options described at http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html#viewing-a-mojo-model

Use PrintMojo (linked above) to get a dot representation of a GBM tree, then convert it to an image using graphviz.

Modify https://github.com/h2oai/h2o-3/blob/master/h2o-genmodel/src/main/java/hex/genmodel/tools/PrintMojo.java (located in the h2o-genmodel module):

Suggestions:

Note: graphviz-java is licensed under the Apache License, Version 2.0, which is compatible with H2O.
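For context, the native conversion step mentioned in the POC list (dot file to image via graphviz) can be sketched from Java as below. This shows the external dependency the POC hopes to remove: the code only works if the `dot` executable is installed and on the PATH. The `NativeDot` class and the file names are hypothetical; `dot -Tpng <input> -o <output>` is the standard Graphviz command line.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper around the native Graphviz "dot" executable. */
final class NativeDot {
    // Builds the command line: dot -Tpng <dotFile> -o <pngFile>
    static List<String> command(Path dotFile, Path pngFile) {
        return Arrays.asList("dot", "-Tpng", dotFile.toString(), "-o", pngFile.toString());
    }

    // Runs the native tool and returns its exit code.
    // Throws IOException if "dot" is not installed or cannot be started.
    static int render(Path dotFile, Path pngFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(dotFile, pngFile))
                .inheritIO() // forward the tool's stdout/stderr to this process
                .start();
        return process.waitFor();
    }
}
```

A call such as `NativeDot.render(Paths.get("model.gv"), Paths.get("model.png"))` fails with an `IOException` on machines without graphviz, which is the deployment limitation described above.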

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6768
Assignee: New H2O Bugs
Reporter: Michal Kurka
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/3798