sol-eng / bike_predict

A demo of an end-to-end machine learning pipeline, using Posit Connect

Integrate Vetiver, Quarto, and refactor #15

Closed SamEdwardes closed 2 years ago

SamEdwardes commented 2 years ago

This PR is a major update to the bikeshare example. Closes https://github.com/rstudio/sol-eng/issues/114. Changes include:

Refactoring and reorganizing the code

I have refactored the app to be more organized. Major changes include:

Integration of vetiver

The model is now deployed to Connect using vetiver, and vetiver is also used when making API calls to the deployed model.

SamEdwardes commented 2 years ago

@gsingh91 and @xuf12 this PR is ready for review.

SamEdwardes commented 2 years ago

Updates finished. Ready for the next review, @gsingh91.

SamEdwardes commented 2 years ago

Thanks for the review @akgold, I will work on the updates.

One thing I was thinking: should we move BikeHelpR to its own repo so that it is easier to maintain? Then we could point Colorado RSPM (and public package manager) to that repo instead of a subdirectory in this repo.

It could live here: https://github.com/sol-eng/bikeHelpR

akgold commented 2 years ago

Makes sense to me!

SamEdwardes commented 2 years ago

PR is ready to be merged. The following updates were made:

jthomasmock commented 2 years ago

Howdy @SamEdwardes - while I'm very excited about Quarto + bikeshare, I don't think the current content items came across as Quarto files.

I saw that the files still "looked" like RMarkdown as opposed to the new styling in Quarto, so I pulled the content details for ETL Step 1:

library(httr)

apiKey <- Sys.getenv("CONNECT_API_KEY")

result <- GET(glue::glue("{Sys.getenv('CONNECT_SERVER')}/__api__/v1/experimental/content/e1e47cc0-5c31-41bd-a1f4-5729b2ece52e"),
              add_headers(Authorization = paste("Key", apiKey)))
output <- content(result)

# came across as rmd-static instead of quarto-static
output$title
#> [1] "Bike Predict - ETL Step 1 - Raw Data Refresh"
output$app_mode
#> [1] "rmd-static"
# no quarto used in bundle
output$quarto_version
#> NULL

It should have a quarto_version printed, i.e.:

r_version: "3.5.1",
py_version: "3.8.2",
quarto_version: "0.2.22", # shouldn't be NULL
run_as: "rstudio-connect",
run_as_current_user: false,
owner_guid: "25438b83-ea6d-4839-ae8e-53c52ac5f9ce",
url: "http://rstudio-connect.company.com/content/42",
role: "owner"
},

Potential fix

To resolve, you will probably need to use something like:

# quarto R package has a helper function for quarto_path, but could
# report it manually from terminal w/ `$ which quarto`
rsconnect::writeManifest(appPrimaryDoc = "2022-05-26.qmd", quarto = quarto::quarto_path())

Which should generate a manifest file that contains Quarto:

{
  "version": 1,
  "locale": "en_US",
  "platform": "4.2.0",
  "metadata": {
    "appmode": "quarto-static",
    "primary_rmd": "2022-05-26.qmd",
    "primary_html": null,
    "content_category": null,
    "has_parameters": false
  },
  "quarto": {
    "version": "0.9.387",
    "engines": ["knitr"]
  },
  "packages": {

as opposed to the current manifest file:

{
  "version": 1,
  "locale": "en_US",
  "platform": "4.1.2",
  "metadata": {
    "appmode": "rmd-static",
    "primary_rmd": "document.qmd",
    "primary_html": null,
    "content_category": null,
    "has_parameters": false
  },
  "packages": {

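A quick way to catch this before deploying is to read the generated manifest.json and look at the app mode. This is only a minimal sketch: `check_manifest_mode()` is a hypothetical helper (not part of rsconnect), it assumes the jsonlite package is installed, and the demo manifest is a trimmed stand-in shaped like the rmd-static excerpt above.

```r
# Hypothetical helper (not part of rsconnect): report how a bundle's
# manifest.json describes the content.
check_manifest_mode <- function(path = "manifest.json") {
  manifest <- jsonlite::fromJSON(path)
  list(
    appmode = manifest$metadata$appmode,
    # NULL when the manifest has no "quarto" block
    quarto_version = manifest$quarto$version
  )
}

# Demo with a trimmed manifest shaped like the rmd-static excerpt above
tmp <- tempfile(fileext = ".json")
writeLines(
  '{"version": 1, "metadata": {"appmode": "rmd-static", "primary_rmd": "document.qmd"}, "packages": {}}',
  tmp
)
res <- check_manifest_mode(tmp)
res$appmode                  # "rmd-static" means Quarto was not picked up
is.null(res$quarto_version)  # TRUE when the "quarto" block is missing
```

For a Quarto bundle I would expect `appmode` to be "quarto-static" and `quarto_version` to be non-NULL.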
Quarto + Colorado

All that being said, I still don't think that Colorado is ready for Quarto items as the RSC 2022.05.0 release had a bug fix for Quarto + K8S on RSC. So it may be fine to "leave as is" given that Quarto content won't be publishable until Cole updates the Colorado RSC server to 2022.05.0.
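If it helps to gate the switch on the server release, the comparison itself can be done in base R. This is a rough sketch only: the `server_version` value is illustrative (not Colorado's actual version) and would need to be filled in by hand or from the server's own settings, and the variable names are made up for this example.

```r
# Assumption: illustrative value, not the real Colorado version
server_version <- "2022.04.2"
# Release that carries the Quarto + K8S fix, per the note above
min_quarto_k8s_version <- "2022.05.0"

# utils::compareVersion() returns -1, 0, or 1
quarto_ready <- utils::compareVersion(server_version, min_quarto_k8s_version) >= 0
quarto_ready
```

Once `quarto_ready` is TRUE, it should be reasonable to regenerate the manifests with the `quarto` argument and try again.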

I'm still getting errors like the ones below, whether deploying from rsconnect or push-button. Note the K8S-specific errors that were fixed in 2022.05.0:

rsconnect::deployDoc("2022-05-26.qmd", quarto = quarto::quarto_path(), server = "colorado.rstudio.com")
Discovering document dependencies... OK
Preparing to deploy document...DONE
Uploading bundle for document: 11476...DONE
Deploying bundle: 56960 for document: 11476 ...
[Connect] Building Quarto document...
[Connect] Bundle created with R version 4.2.0 and Quarto version 0.9.387 is compatible with environment Kubernetes::ghcr.io/rstudio/content-pro:r4.1.3-py3.10.4-bionic with R version 4.1.3 from /opt/R/4.1.3/bin/R and Quarto version 0.9.344 from /opt/quarto/bin/quarto 
[Connect] Bundle requested R version 4.2.0; using /opt/R/4.1.3/bin/R from Kubernetes::ghcr.io/rstudio/content-pro:r4.1.3-py3.10.4-bionic which has version 4.1.3
[Connect] Determining session server location ...
[Connect] Connecting to session server http://172.19.16.23:30285 ...
[Connect] Connected to session server http://172.19.16.23:30285
[Connect] 2022/05/27 03:02:42.967646992 whoami: cannot find name for user ID 999
[Connect] 2022/05/27 03:02:42.969423329 Warning message:
[Connect] 2022/05/27 03:02:42.969431955 In system("whoami", intern = TRUE) : running command 'whoami' had status 1
.... MORE STUFF

.... THEN
[Connect] Using environment Kubernetes::ghcr.io/rstudio/content-pro:r4.1.3-py3.10.4-bionic
[Connect] Using /opt/quarto/bin/quarto with version 0.9.344 from Kubernetes::ghcr.io/rstudio/content-pro:r4.1.3-py3.10.4-bionic
[Connect] Using /opt/R/4.1.3/bin/R with version 4.1.3 from Kubernetes::ghcr.io/rstudio/content-pro:r4.1.3-py3.10.4-bionic
[Connect] Determining session server location ...
[Connect] Connecting to session server http://172.19.16.23:32751 ...
[Connect] 2022/05/27 03:05:54.849767132 [rsc-quarto] Error: cannot lookup current user: user: unknown userid 999
[Connect] Unable to render the deployed content: Rendering exited abnormally: nonzero exit status: 1
[Connect] Stopped connection attempts to session server http://172.19.16.23:32751
Document deployment failed with error: Rendering exited abnormally: nonzero exit status: 1

SamEdwardes commented 2 years ago

@jthomasmock thanks for the very helpful write-up here! Indeed, your initial guess was right: we are not publishing Quarto. I incorrectly assumed that since it was a .qmd file, writeManifest was correctly determining it to be a Quarto doc. I did not inspect manifest.json to see that it was actually static rmd!

{
  "version": 1,
  "locale": "en_US",
  "platform": "4.1.2",
  "metadata": {
    "appmode": "rmd-static",
    "primary_rmd": "document.qmd",
    "primary_html": null,
    "content_category": null,
    "has_parameters": false
  },
  "packages": {

Colorado

I double-checked, and I came across the same errors as you when attempting a git-backed deploy to Colorado.

2022/05/27 13:27:27.247818736 [rsc-quarto] Error: cannot lookup current user: user: unknown userid 999

So I agree with your suggestion: we should leave it as is until we get the 2022.05.0 update.