h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Combine two models into one mojo #12630

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

The second model uses the prediction from the first model as a feature. The goal is to get a single MOJO for this pipeline.
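To make the request concrete, here is a minimal sketch of what "the second model uses the prediction from the first model as a feature" means at scoring time. It is a toy illustration only: the class `ChainedScoring`, the `score` method, and the two lambda "models" are hypothetical stand-ins and are not part of H2O.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

/** Toy illustration of a two-stage scoring pipeline; not H2O code. */
final class ChainedScoring {

    /**
     * Scores the row with the first model, appends that prediction as an
     * extra feature, and scores the extended row with the second model.
     */
    static double score(ToDoubleFunction<double[]> first,
                        ToDoubleFunction<double[]> second,
                        double[] row) {
        double firstPrediction = first.applyAsDouble(row);
        // The second model sees the original features plus one extra column.
        double[] extended = Arrays.copyOf(row, row.length + 1);
        extended[row.length] = firstPrediction;
        return second.applyAsDouble(extended);
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for real models.
        ToDoubleFunction<double[]> firstModel = r -> r[0] > 5.0 ? 1.0 : 0.0;
        ToDoubleFunction<double[]> secondModel = r -> 2.0 * r[1] + 10.0 * r[2];
        System.out.println(score(firstModel, secondModel, new double[] {7.0, 1.5}));
    }
}
```

A single MOJO for the pipeline would let a caller pass only the original row and have both steps happen inside one scoring artifact.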

exalate-issue-sync[bot] commented 1 year ago

Ruslan Dautkhanov commented: Would it be possible to use, for example, GLRM output -> GBM in such a single-MOJO scoring pipeline? Thank you.

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: Yes, that will be possible. All algos with numeric inputs and outputs will be supported. The only one left out is going to be word2vec (which might be added in the future).

exalate-issue-sync[bot] commented 1 year ago

Ruslan Dautkhanov commented: Thank you, that's great!

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented:

{code}
Usage: java [...java args...] hex.genmodel.tools.BuildPipeline --mapping ... --output --input ...

 --mapping Mapping of model predictions to main model inputs.
           Example: Specify 'CLUSTER=clustering:0' to use a model defined in a MOJO file 'clustering.zip'
                    and map the predicted cluster (output 0) to input column 'CLUSTER' of the main model.
 --input   List of input MOJO files representing both the main model and the prerequisite models.
 --output  Name of the generated MOJO pipeline file.

 Input mappings are specified in format '<columnName>=<modelAlias>:<predictionIndex>'.

 Model alias is based on the name of the MOJO file.
 For example, a MOJO stored in 'glm_model.zip' will have the alias 'glm_model'.

Note: There is no need to specify which of the MOJO model represents the main model. The tool automatically identifies the main model as the one that doesn't have any output mappings.
{code}
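For readers who want to see how the mapping syntax decomposes, the following is a minimal sketch of parsing `<columnName>=<modelAlias>:<predictionIndex>` and deriving an alias from a file name. It is not the `BuildPipeline` implementation: the class `MappingSpec` and its methods are hypothetical, and the alias rule (drop the directory and the file extension) is an assumption inferred from the `glm_model.zip` example in the usage text.

```java
import java.nio.file.Paths;

/** Illustrative parser for the mapping syntax in the usage text; not H2O code. */
final class MappingSpec {
    final String columnName;
    final String modelAlias;
    final int predictionIndex;

    private MappingSpec(String columnName, String modelAlias, int predictionIndex) {
        this.columnName = columnName;
        this.modelAlias = modelAlias;
        this.predictionIndex = predictionIndex;
    }

    /** Parses a spec of the form '<columnName>=<modelAlias>:<predictionIndex>'. */
    static MappingSpec parse(String spec) {
        int eq = spec.indexOf('=');
        int colon = spec.lastIndexOf(':');
        if (eq <= 0 || colon <= eq + 1 || colon == spec.length() - 1) {
            throw new IllegalArgumentException(
                "Expected <columnName>=<modelAlias>:<predictionIndex>, got: " + spec);
        }
        return new MappingSpec(
            spec.substring(0, eq),
            spec.substring(eq + 1, colon),
            Integer.parseInt(spec.substring(colon + 1)));
    }

    /** Assumed alias rule: file name without directory and without extension. */
    static String aliasOf(String mojoPath) {
        String name = Paths.get(mojoPath).getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    public static void main(String[] args) {
        MappingSpec m = parse("CLUSTER=clustering:0");
        System.out.println(m.columnName + " <- " + m.modelAlias + "[" + m.predictionIndex + "]");
        System.out.println(aliasOf("glm_model.zip"));
    }
}
```

Under these assumptions, `CLUSTER=clustering:0` reads as: column `CLUSTER` of the main model is filled from output 0 of the model with alias `clustering`.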

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: Example:

{code}
java -cp h2o-genmodel.jar hex.genmodel.tools.BuildPipeline --mapping predict=kmeans:0 --output pipe.zip --input kmeans.zip gbm.zip
{code}
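The example command does not say which input is the main model. As a rough sketch of the rule quoted in the usage text (the main model is the one without any output mappings), the code below lists the inputs whose alias is not referenced by any mapping. The class `ExampleCommandBreakdown` and the method `unmappedInputs` are hypothetical, the `.zip`-stripping alias rule is an assumption, and this is not how `BuildPipeline` is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative breakdown of the example command; not H2O code. */
final class ExampleCommandBreakdown {

    /**
     * Returns the input files whose alias is not referenced by any mapping.
     * Assumes well-formed mappings of the form
     * '<columnName>=<modelAlias>:<predictionIndex>'.
     */
    static List<String> unmappedInputs(List<String> mappings, List<String> inputs) {
        Set<String> mappedAliases = new HashSet<>();
        for (String mapping : mappings) {
            String rhs = mapping.substring(mapping.indexOf('=') + 1);
            mappedAliases.add(rhs.substring(0, rhs.lastIndexOf(':')));
        }
        List<String> result = new ArrayList<>();
        for (String input : inputs) {
            // Assumed alias rule: the file name without its '.zip' suffix.
            String alias = input.endsWith(".zip")
                ? input.substring(0, input.length() - 4)
                : input;
            if (!mappedAliases.contains(alias)) {
                result.add(input);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> mappings = Arrays.asList("predict=kmeans:0");
        List<String> inputs = Arrays.asList("kmeans.zip", "gbm.zip");
        // Lists the candidate main model(s) for the example command.
        System.out.println(unmappedInputs(mappings, inputs));
    }
}
```

For the command above, `kmeans.zip` is referenced by the mapping `predict=kmeans:0`, so `gbm.zip` is the remaining candidate: output 0 of the k-means model feeds the `predict` input column of the GBM.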

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-5775
Assignee: Michal Kurka
Reporter: Nidhi Mehta
State: Resolved
Fix Version: 3.22.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/2937