rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com

Best practices on updating and extracting pins #769

Closed EKtheSage closed 1 year ago

EKtheSage commented 1 year ago

Hello,

I currently run a daily job that writes new data to a pin, so each day a new version of the model_prediction_output pin appears on the board. How do I get the data for the past 7 days?

board_data %>%
  pin_write(result, 'model_prediction_output',
            type = 'parquet',
            description = 'model prediction',
            tags = c('model'))

I see the pin_versions output, but that doesn't look like an easy way to grab multiple versions of a pin.

board_data |> pin_versions('model_prediction_output')

  version                created             hash 
  <chr>                  <dttm>              <chr>
1 20230823T225549Z-385c7 2023-08-23 22:55:49 385c7

Do you have a recommended way to read in multiple versions of a pin and union them together? For example, I could filter the pin_versions result on created and use the matching version values to read in the desired pins. But that seems hacky if the pin was updated multiple times in a day.

Or is there a way to upsert a pin, so that it inserts new records, updates existing records, and produces a new version of the pin?
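
The upsert idea can be sketched as read, merge, write back. This is only a minimal illustration, not a pins feature: it assumes the data has a unique key column (here a hypothetical `id`), uses `dplyr::rows_upsert()` for the merge, and runs against a throwaway `board_temp()` rather than a real board.

```r
library(pins)
library(dplyr)

# Throwaway local board, for illustration only
board <- board_temp(versioned = TRUE)
pin_name <- "model_prediction_output"

# First version of the pin; `id` is a hypothetical key column
pin_write(board, tibble(id = 1:3, pred = c(0.1, 0.2, 0.3)), pin_name, type = "rds")

# New batch: id 3 updates an existing record, id 4 is a new record
new_rows <- tibble(id = 3:4, pred = c(0.35, 0.4))

# Read the current version, then update matching keys and append the rest
upserted <- pin_read(board, pin_name) %>%
  rows_upsert(new_rows, by = "id")

# Writing the merged data creates a new version of the pin
pin_write(board, upserted, pin_name, type = "rds")
```

With this pattern the latest version always holds the full merged table, so reading the past 7 days becomes a filter on a date column in the data rather than a union of versions.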

iandarbeynhiu commented 1 year ago

A very quick solution for the last 7 versions:

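library(pins); library(dplyr); library(purrr)
# Sort by created so this does not rely on the row order pin_versions() returns
last_7 <- slice_max(pin_versions(YOUR_BOARD, "YOUR_PIN"), created, n = 7, with_ties = FALSE)
map_df(last_7$version, function(x){
  pin_read(YOUR_BOARD, "YOUR_PIN", version = x)
})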

This gives you the last 7 versions as a single data frame.

For the last 7 days, regardless of the number of versions:

last_7_days <- filter(pin_versions(YOUR_BOARD, "YOUR_PIN"), as.Date(created) >= Sys.Date() - 7)

map_df(last_7_days$version, function(x){
  pin_read(YOUR_BOARD, "YOUR_PIN", version = x)
})

That said, I would suggest not hard-coding the board and pin inside the function, to keep it more general. But this would work.
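
One way to avoid the hard-coding is to wrap the steps in a function that takes the board and pin name as arguments. The helper name `read_recent_versions` is hypothetical (it is not part of pins), and the usage part runs on a throwaway `board_temp()` so it can be tried without a real board.

```r
library(pins)
library(dplyr)
library(purrr)

# Hypothetical helper: read every version of `name` created in the
# last `days` days and row-bind the results into one data frame.
read_recent_versions <- function(board, name, days = 7) {
  versions <- pin_versions(board, name) %>%
    filter(as.Date(created) >= Sys.Date() - days)

  versions$version %>%
    map(function(v) pin_read(board, name, version = v)) %>%
    bind_rows()
}

# Usage on a temporary local board with two versions of the same pin
board <- board_temp(versioned = TRUE)
pin_write(board, data.frame(day = 1, pred = 0.2), "model_prediction_output", type = "rds")
pin_write(board, data.frame(day = 2, pred = 0.4), "model_prediction_output", type = "rds")

recent <- read_recent_versions(board, "model_prediction_output")
```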

For the most recent version on each day, limited to the last 7 days:

last_7_days <- pin_versions(YOUR_BOARD, "YOUR_PIN") %>%
  mutate(Date = as.Date(created)) %>%
  filter(Date >= Sys.Date() - 7) %>%
  group_by(Date) %>%
  slice_max(created, n = 1, with_ties = FALSE) %>%  # latest version per day
  ungroup()

map_df(last_7_days$version, function(x){
  pin_read(YOUR_BOARD, "YOUR_PIN", version = x)
})

juliasilge commented 1 year ago

You can also take an approach similar to what is outlined in #758, something like this:

library(tidyverse)
library(pins)
b <- board_connect()
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
pin_name <- "julia.silge/traffic-crash-model-metrics"

last_seven <- b |> 
  pin_versions(pin_name) |> 
  slice_head(n = 7)

last_seven |> 
  mutate(pin_contents = map(version, ~ pin_read(b, pin_name, version = .))) |> 
  unnest(pin_contents)
#> # A tibble: 4,808 × 9
#>    version created             active  size .index        .n .metric  .estimator
#>    <chr>   <dttm>              <lgl>  <dbl> <date>     <int> <chr>    <chr>     
#>  1 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-22  1119 accuracy binary    
#>  2 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-22  1119 kap      binary    
#>  3 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-22  1119 mn_log_… binary    
#>  4 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-22  1119 roc_auc  binary    
#>  5 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-29  1481 accuracy binary    
#>  6 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-29  1481 kap      binary    
#>  7 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-29  1481 mn_log_… binary    
#>  8 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-29  1481 roc_auc  binary    
#>  9 78815   2023-08-19 19:08:00 TRUE   28815 2020-12-06  1695 accuracy binary    
#> 10 78815   2023-08-19 19:08:00 TRUE   28815 2020-12-06  1695 kap      binary    
#> # ℹ 4,798 more rows
#> # ℹ 1 more variable: .estimate <dbl>

Created on 2023-08-24 with reprex v2.0.2

In these results, the columns version through size are from the version metadata and the columns .index through .estimate are from the pin contents.
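
The same pattern can be tried without access to Connect. The sketch below is an illustration only: it uses a temporary local board and a hypothetical pin name, `demo-metrics`, with made-up toy values, and shows the version metadata columns sitting alongside the pin contents after `unnest()`.

```r
library(pins)
library(dplyr)
library(purrr)
library(tidyr)

# Temporary local board and hypothetical pin, for illustration only
b <- board_temp(versioned = TRUE)
pin_name <- "demo-metrics"

# Two versions of the same pin with toy values
pin_write(b, tibble(.metric = "accuracy", .estimate = 0.80), pin_name, type = "rds")
pin_write(b, tibble(.metric = "accuracy", .estimate = 0.85), pin_name, type = "rds")

# Read each version into a list-column, then expand it into rows
combined <- b |>
  pin_versions(pin_name) |>
  mutate(pin_contents = map(version, function(v) pin_read(b, pin_name, version = v))) |>
  unnest(pin_contents)

names(combined)
```

The metadata columns differ by board type, so the leading columns on a local board will not match the Connect output shown above exactly.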

EKtheSage commented 1 year ago

Thanks! This is super helpful! I think this issue is similar enough to #758 so I'll just close this one.
