Colleca opened this issue 11 months ago.
That is a great question @Colleca. As of today, you would need to customize both your generated Plumber file and your Dockerfile, because H2O depends on software beyond R and system libraries (it runs on Java).
Your plumber file would look something like this:
```r
library(pins)
library(plumber)
library(rapidoc)
library(vetiver)
library(h2o)

# Start (or connect to) an H2O cluster before reading the model
h2o.init()

b <- board_connect() ## or your board
v <- vetiver_pin_read(b, "model-name")

#* @plumber
function(pr) {
  pr %>% vetiver_api(v)
}
```
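If you start from the file that `vetiver_write_plumber()` generates, one way to do this customization is to patch the generated lines so that H2O is loaded and started before the model is read. The helper below, `add_h2o_setup()`, is a hypothetical sketch (it is not part of vetiver) that inserts `library(h2o)` and `h2o.init()` after the last `library()` call. It assumes the generated file loads its packages with top-level `library()` calls, as in the example above.

```r
# Hypothetical helper (not part of vetiver): add H2O setup lines to the
# contents of a generated plumber file, given as a character vector of lines.
add_h2o_setup <- function(lines) {
  # Leave the file alone if it already starts an H2O cluster
  if (any(grepl("h2o\\.init\\(", lines))) {
    return(lines)
  }
  # Insert after the last top-level library() call, or at the very top
  lib_idx <- grep("^library\\(", lines)
  insert_at <- if (length(lib_idx) > 0) max(lib_idx) else 0
  append(lines, c("library(h2o)", "h2o.init()"), after = insert_at)
}

# Possible usage, assuming plumber.R was written by vetiver_write_plumber():
# plumber_lines <- readLines("plumber.R")
# writeLines(add_h2o_setup(plumber_lines), "plumber.R")
```

Working on lines rather than on the file keeps the helper easy to check before anything is written back to disk.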
We already support H2O in bundle, so we should be able to round-trip the H2O model to and from disk correctly.
I don't think H2O is supported on Posit Connect in a straightforward way right now because of the Java requirement, but you should be able to build a Dockerfile for some deployment targets. I haven't quickly found any good examples, so we may need help from the H2O team.
Here are some docs to look at for H2O on SageMaker (the docs cover only Python, not R).
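For a Docker-based target, the Java requirement means the image needs a Java runtime in addition to R. As a sketch under assumptions, the hypothetical helper below adds an `apt-get` step right after the first `FROM` line of a Dockerfile (for example, one written by `vetiver_write_docker()`). It assumes a Debian/Ubuntu-based base image such as `rocker/r-ver` and uses the `default-jre-headless` package; check which Java versions your H2O release supports before relying on it.

```r
# Hypothetical helper: add a Java runtime install step to Dockerfile lines.
# Assumes a Debian/Ubuntu-based image (e.g. rocker/r-ver), where apt-get works.
add_java_to_dockerfile <- function(lines, java_pkg = "default-jre-headless") {
  from_idx <- grep("^FROM\\s", lines)
  if (length(from_idx) == 0) {
    stop("No FROM instruction found in the Dockerfile lines.")
  }
  java_step <- paste0(
    "RUN apt-get update && apt-get install -y --no-install-recommends ",
    java_pkg,
    " && rm -rf /var/lib/apt/lists/*"
  )
  # Insert right after the first FROM line so later steps can use Java
  append(lines, java_step, after = from_idx[1])
}

# Possible usage, assuming a Dockerfile written by vetiver_write_docker():
# docker_lines <- readLines("Dockerfile")
# writeLines(add_java_to_dockerfile(docker_lines), "Dockerfile")
```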
Thanks for your comment @juliasilge. This seems to be a decent starting point: you bundle the H2O AutoML fit, save it to Posit Connect, then pull it from the server and unbundle it, and both versions make the same predictions:
```r
library(tidymodels)
library(recipes)
library(agua)
library(tidyverse)
library(h2o)
library(bundle)
library(pins)

h2o.init()

data(concrete)
set.seed(4595)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_train <- training(concrete_split)
concrete_test <- testing(concrete_split)

auto_spec <- auto_ml() %>%
  set_engine("h2o", max_runtime_secs = 120, seed = 1) %>%
  set_mode("regression")

normalized_rec <- recipe(compressive_strength ~ ., data = concrete_train) %>%
  step_normalize(all_predictors())

auto_wflow <- workflow() %>%
  add_model(auto_spec) %>%
  add_recipe(normalized_rec)

auto_fit <- fit(auto_wflow, data = concrete_train)

best_model <- bundle(auto_fit)
model_board <- board_connect()
model_board %>% pin_write(best_model, name = "posit/concrete_h2o", type = "rds")

read_in_model <- model_board %>% pin_read("posit/concrete_h2o") %>% unbundle()

model_predictions_local <- predict(auto_fit, concrete_test)
model_predictions_saved <- predict(read_in_model, concrete_test)
identical(model_predictions_local, model_predictions_saved)
```
I should point out that pinning it with vetiver produces an error, so as far as Posit Connect is concerned it is just an `.rds` data object, not a pinned vetiver model object with all the benefits that come with it.
Can you share the error you get when you try to pin with vetiver, i.e. with `vetiver_pin_write()`? I can successfully pin the model to Connect:
```r
library(tidymodels)
library(recipes)
library(agua)
#>
#> Attaching package: 'agua'
#> The following object is masked from 'package:workflowsets':
#>
#> rank_results
library(h2o)
#>
#> ----------------------------------------------------------------------
#>
#> Your next step is to start H2O:
#> > h2o.init()
#>
#> For H2O package documentation, ask for help:
#> > ??h2o
#>
#> After starting H2O, you can use the Web UI at http://localhost:54321
#> For more information visit https://docs.h2o.ai
#>
#> ----------------------------------------------------------------------
#>
#> Attaching package: 'h2o'
#> The following objects are masked from 'package:stats':
#>
#> cor, sd, var
#> The following objects are masked from 'package:base':
#>
#> &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
#> colnames<-, ifelse, is.character, is.factor, is.numeric, log,
#> log10, log1p, log2, round, signif, trunc
library(vetiver)
#>
#> Attaching package: 'vetiver'
#> The following object is masked from 'package:tune':
#>
#> load_pkgs
library(pins)
h2o.init()
#> Connection successful!
#>
#> R is connected to the H2O cluster:
#> H2O cluster uptime: 4 minutes 57 seconds
#> H2O cluster timezone: America/Denver
#> H2O data parsing timezone: UTC
#> H2O cluster version: 3.42.0.2
#> H2O cluster version age: 4 months and 18 days
#> H2O cluster name: H2O_started_from_R_juliasilge_aqp711
#> H2O cluster total nodes: 1
#> H2O cluster total memory: 3.12 GB
#> H2O cluster total cores: 8
#> H2O cluster allowed cores: 8
#> H2O cluster healthy: TRUE
#> H2O Connection ip: localhost
#> H2O Connection port: 54321
#> H2O Connection proxy: NA
#> H2O Internal Security: FALSE
#> R Version: R version 4.3.2 (2023-10-31)
#> Warning in h2o.clusterInfo():
#> Your H2O cluster version is (4 months and 18 days) old. There may be a newer version available.
#> Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html
data(concrete)
set.seed(4595)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_train <- training(concrete_split)
concrete_test <- testing(concrete_split)
auto_spec <-
  auto_ml() |>
  set_engine("h2o", max_runtime_secs = 120, seed = 1) |>
  set_mode("regression")
normalized_rec <-
  recipe(compressive_strength ~ ., data = concrete_train) |>
  step_normalize(all_predictors())
auto_wflow <-
  workflow() |>
  add_model(auto_spec) |>
  add_recipe(normalized_rec)
auto_fit <- fit(auto_wflow, data = concrete_train)
v <- vetiver_model(auto_fit, "julia.silge/concrete_h2o")
v
#>
#> ── julia.silge/concrete_h2o ─ <bundled_workflow> model for deployment
#> A h2o regression modeling workflow using 8 features
model_board <- board_connect()
#> Connecting to Posit Connect 2023.10.0 at <https://colorado.posit.co/rsc>
model_board |> vetiver_pin_write(v)
#> Writing to pin 'julia.silge/concrete_h2o'
#>
#> Create a Model Card for your published model
#> • Model Cards provide a framework for transparent, responsible reporting
#> • Use the vetiver `.Rmd` template as a place to start
#> This message is displayed once per session.
```
Created on 2023-12-13 with reprex v2.0.2
Thanks Julia, I think I resolved the issue by updating my packages. I'm no longer getting an error when pinning with `vetiver_pin_write()`. Now that that's resolved, I'm noticing a new issue: the in-memory model and the pinned model give different predictions. I'm not really sure what's going on.
```r
library(tidymodels)
library(recipes)
library(agua)
library(tidyverse)
library(h2o)
library(vetiver)
library(tictoc)
library(pins)

h2o.init()

data(concrete)
set.seed(4595)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_train <- training(concrete_split)
concrete_test <- testing(concrete_split)

auto_spec <- auto_ml() |>
  set_engine("h2o", max_runtime_secs = 120, seed = 1) |>
  set_mode("regression")

normalized_rec <- recipe(compressive_strength ~ ., data = concrete_train) |>
  step_normalize(all_predictors())

auto_wflow <- workflow() |>
  add_model(auto_spec) |>
  add_recipe(normalized_rec)

auto_fit <- fit(auto_wflow, data = concrete_train)

v <- vetiver_model(auto_fit, "posit/concrete_h2o")
model_board <- board_connect()
model_board |> vetiver_pin_write(v)

read_in_model <- model_board %>% vetiver_pin_read("posit/concrete_h2o")
model_predictions_local <- predict(auto_fit, concrete_test) # make predictions from in-session model
model_predictions_vetiver <- predict(read_in_model, concrete_test) # make predictions from read-in model
identical(model_predictions_local, model_predictions_vetiver) # they should be the same

vetiver_deploy_rsconnect(model_board, "posit/concrete_model", appTitle = "concrete_model")
endpoint <- vetiver_endpoint("theserver/cnct/concrete_model/predict")
apiKey <- "theapikey"

test_ob <- concrete_test[1, ]
tic()
predict(endpoint, test_ob, httr::add_headers(Authorization = paste("Key", apiKey)))
toc()

tic()
predict(auto_fit, test_ob)
toc()
```
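Hard-coding the API key in the script (as `apiKey` above) makes it easy to leak when the code is shared. A small sketch, assuming the key is stored in an environment variable named `CONNECT_API_KEY` (the variable name is an assumption), reads it at call time and builds the same `Authorization` value used in the call above.

```r
# Build the value for the Authorization header from an environment variable.
# CONNECT_API_KEY is an assumed variable name; set it e.g. in .Renviron.
connect_auth_value <- function(var = "CONNECT_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  paste("Key", key)
}

# Possible usage with the endpoint defined above:
# predict(endpoint, test_ob,
#         httr::add_headers(Authorization = connect_auth_value()))
```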
I don't see a difference between predictions from the local and pinned versions of the H2O models:
```r
library(tidymodels)
library(recipes)
library(agua)
#>
#> Attaching package: 'agua'
#> The following object is masked from 'package:workflowsets':
#>
#> rank_results
library(h2o)
#>
#> ----------------------------------------------------------------------
#>
#> Your next step is to start H2O:
#> > h2o.init()
#>
#> For H2O package documentation, ask for help:
#> > ??h2o
#>
#> After starting H2O, you can use the Web UI at http://localhost:54321
#> For more information visit https://docs.h2o.ai
#>
#> ----------------------------------------------------------------------
#>
#> Attaching package: 'h2o'
#> The following objects are masked from 'package:stats':
#>
#> cor, sd, var
#> The following objects are masked from 'package:base':
#>
#> &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
#> colnames<-, ifelse, is.character, is.factor, is.numeric, log,
#> log10, log1p, log2, round, signif, trunc
library(vetiver)
#>
#> Attaching package: 'vetiver'
#> The following object is masked from 'package:tune':
#>
#> load_pkgs
library(pins)
h2o.init()
#> Connection successful!
#>
#> R is connected to the H2O cluster:
#> H2O cluster uptime: 4 minutes 1 seconds
#> H2O cluster timezone: America/Denver
#> H2O data parsing timezone: UTC
#> H2O cluster version: 3.42.0.2
#> H2O cluster version age: 5 months and 10 days
#> H2O cluster name: H2O_started_from_R_juliasilge_eqf117
#> H2O cluster total nodes: 1
#> H2O cluster total memory: 3.41 GB
#> H2O cluster total cores: 8
#> H2O cluster allowed cores: 8
#> H2O cluster healthy: TRUE
#> H2O Connection ip: localhost
#> H2O Connection port: 54321
#> H2O Connection proxy: NA
#> H2O Internal Security: FALSE
#> R Version: R version 4.3.2 (2023-10-31)
#> Warning in h2o.clusterInfo():
#> Your H2O cluster version is (5 months and 10 days) old. There may be a newer version available.
#> Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html
data(concrete)
set.seed(4595)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_train <- training(concrete_split)
concrete_test <- testing(concrete_split)
auto_spec <-
  auto_ml() |>
  set_engine("h2o", max_runtime_secs = 120, seed = 1) |>
  set_mode("regression")
normalized_rec <-
  recipe(compressive_strength ~ ., data = concrete_train) |>
  step_normalize(all_predictors())
auto_wflow <-
  workflow() |>
  add_model(auto_spec) |>
  add_recipe(normalized_rec)
auto_fit <- fit(auto_wflow, data = concrete_train)
v1 <- vetiver_model(auto_fit, "julia.silge/concrete_h2o")
v1
#>
#> ── julia.silge/concrete_h2o ─ <bundled_workflow> model for deployment
#> A h2o regression modeling workflow using 8 features
model_board <- board_connect()
#> Connecting to Posit Connect 2023.10.0 at <https://colorado.posit.co/rsc>
model_board |> vetiver_pin_write(v1)
#> Writing to pin 'julia.silge/concrete_h2o'
#>
#> Create a Model Card for your published model
#> • Model Cards provide a framework for transparent, responsible reporting
#> • Use the vetiver `.Rmd` template as a place to start
#> This message is displayed once per session.
v2 <- model_board |> vetiver_pin_read("julia.silge/concrete_h2o")
v2
#>
#> ── julia.silge/concrete_h2o ─ <bundled_workflow> model for deployment
#> A h2o regression modeling workflow using 8 features
preds1 <- predict(v1, concrete_test)
#> | | | 0% | |======================================================================| 100%
preds2 <- predict(v2, concrete_test)
#> | | | 0% | |======================================================================| 100%
identical(preds1, preds2)
#> [1] TRUE
```
Created on 2024-01-05 with reprex v2.0.2
Could you update your example to use the reprex package? Using reprex makes it easier to see both the input and output, and for us to re-run the code in a local session. Thanks! 🙌
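`identical()` demands exact equality, including attributes and classes, so it can return `FALSE` even when two sets of predictions differ only by floating-point noise. When comparing local and pinned predictions, it may help to also look at a tolerance-based check and at the largest absolute difference. A minimal sketch, assuming regression predictions in a `.pred` column as returned by tidymodels; `compare_predictions()` is a hypothetical helper, not part of vetiver or agua:

```r
# Hypothetical helper: compare two data frames of regression predictions
# (column .pred), reporting exact and tolerance-based agreement.
compare_predictions <- function(preds_a, preds_b, tolerance = 1e-8) {
  stopifnot(nrow(preds_a) == nrow(preds_b))
  diffs <- abs(preds_a$.pred - preds_b$.pred)
  list(
    # exact match, including attributes and classes
    identical = identical(preds_a, preds_b),
    # numeric match within all.equal()'s (mean relative) tolerance
    all_equal = isTRUE(all.equal(preds_a$.pred, preds_b$.pred,
                                 tolerance = tolerance)),
    max_abs_diff = max(diffs)
  )
}

# Possible usage with the objects above:
# compare_predictions(preds1, preds2)
```

If `identical` is `FALSE` but `all_equal` is `TRUE` and `max_abs_diff` is tiny, the two models are very likely predicting the same thing.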
Hello, I'm wondering whether there is any way to deploy the best-performing model from an AutoML search with the agua/h2o packages. For example, a workflow could run H2O AutoML, build the leaderboard, select the best model from that leaderboard, pin it to a board with vetiver, and then make predictions from it.
Please label this as a feature request if it isn't doable yet; I think it would be really cool!