rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com
Other
312 stars 63 forks source link

board_azure cannot read old pins #527

Closed raisin-toast closed 2 years ago

raisin-toast commented 2 years ago

We store our pins on azure and using the new API it seems as though we cannot read the pins that were created in the old version. We can access the pins container without a problem (using AzureStor::blob_container . We can also write and read in a newly created pin.

However, when we try to read an old pin using board_azure(), we get an error saying Error: Can't find version 'data.txt'. Supplying the pins version to the function, using the version argument seems to make it work. However, we can't access the versions of the pins using pin_versions - it returns the following tibble:

# A tibble: 4 x 3
  version   created hash 
  <chr>     <dttm>  <chr>
1 _versions NA      NA   
2 data.csv  NA      NA   
3 data.rds  NA      NA   
4 data.txt  NA      NA 

To add to the complication, using the legacy api function board_register_azure() now returns a 403 error. We use an access key to authenticate.

Any advice on what to do will be greatly appreciated.

hongooi73 commented 2 years ago

Is your old board versioned?

hongooi73 commented 2 years ago

Also, is there any additional info printed about the error you're getting? There is a full list of error codes here:

https://docs.microsoft.com/en-us/rest/api/storageservices/sas-error-codes

raisin-toast commented 2 years ago

Yes our old board was versioned.

And the error we get is Client error: (403) Forbidden. Failed to download remote file.

hongooi73 commented 2 years ago

Ah I misread, I thought you were using a SAS. Best practice is never to use the access key, although it looks like it's unavoidable in this case.

Your admin has probably rotated the key for the account (doing this at regular intervals is also best practice). You'll have to ask them for the new key.

raisin-toast commented 2 years ago

I don't think that is the case as other parts of our techstack that use the key are working. I have also spoken to admin and they haven't changed the key.

hongooi73 commented 2 years ago

Can you share how your board was created? I can't figure out how to make a versioned board in Azure using the old pins package.

hongooi73 commented 2 years ago

Nvm, I can reproduce the error with a non-versioned board.

ETA: I can reproduce the 403 error with board_register_azure followed by pin_get. If I use legacy_azure, I get a different error:

Error in path_tidy(.Call(fs_path_, lapply(args, function(x) enc2utf8(as.character(x))), 
 :
  Total path length must be less than PATH_MAX: 260
raisin-toast commented 2 years ago

To setup with versions, we set up the container manually on portal.azure.com and then we register the board with this code:

pins::board_register_azure(container = "container_name",
                           account = "account_name",
                           key = access_key,
                           versions = TRUE)

When I try to use legacy_azure I get this error Error in !cache : invalid argument type. This is the same whether I leave the cache argument as null or as pins::board_cache_path("azure")

hadley commented 2 years ago

@raisin-toast this is by design — legacy boards are not compatible with the modern boards (except for board_rsconnect())

hongooi73 commented 2 years ago

@hadley I think there is an actual bug here: if I use legacy_azure to use a board created by pins 0.4.4, followed by pin_get, I get that error shown above. Or are you saying that legacy boards aren't accessible at all?

hadley commented 2 years ago

That should work — do you have a reprex?

hongooi73 commented 2 years ago

Reproducing is tricky again because of auth details, but basically:

  1. Create a board using the old pins:
install.packages("pins", repos="https://mran.microsoft.com/snapshot/2021-01-01")
library(pins)

bd <- board_register_azure("myboard", "containername", "accountname", key="xyz")
pin(mtcars, "mtcars", description="mtcars1", board="myboard")
  1. Try to access it using the new pins:
# restart R
library(devtools)
load_all()
oldbd <- legacy_azure("containername", "accountname", key="xyz")
pin_get("mtcars", oldbd)
Error in pin_download_one(p, ...) : 
  Client error: (403) Forbidden. Failed to download remote file: https://accountname.blob.core.windows.net/containername/mtcars

Printing oldbd also gives an error:

r$> oldbd
Pin board <pins_board_datatxt>
Cache size: 8
Error in order(results$name) : argument 1 is not a vector
Session info: ``` r$> sessionInfo() R version 4.1.1 (2021-08-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19043) Matrix products: default locale: [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C [5] LC_TIME=English_Australia.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] AzureStor_3.5.0 pins_0.99.9000 testthat_3.0.4 devtools_2.4.2 usethis_2.0.1 loaded via a namespace (and not attached): [1] zip_2.1.1 AzureRMR_2.4.0 pillar_1.4.4 compiler_4.1.1 [5] prettyunits_1.1.1 remotes_2.4.1 tools_4.1.1 digest_0.6.27 [9] pkgbuild_1.2.0 pkgload_1.2.2 bit_1.1-15.2 tibble_3.0.1 [13] jsonlite_1.7.2 memoise_2.0.0 lifecycle_1.0.1 pkgconfig_2.0.3 [17] rlang_0.4.10 AzureAuth_1.3.2 cli_3.0.1 rstudioapi_0.13 [21] filelock_1.0.2 curl_4.3 yaml_2.2.1 fastmap_1.0.1 [25] withr_2.4.2 httr_1.4.2 rappdirs_0.3.3 askpass_1.1 [29] desc_1.4.0 fs_1.5.0 vctrs_0.3.6 rprojroot_1.3-2 [33] bit64_0.9-7 tidyselect_1.1.0 glue_1.4.1 R6_2.4.1 [37] processx_3.5.2 sessioninfo_1.1.1 whisker_0.4 callr_3.7.0 [41] purrr_0.3.4 magrittr_1.5 backports_1.1.8 ps_1.6.0 [45] ellipsis_0.3.1 assertthat_0.2.1 jose_1.0 mime_0.9 [49] arrow_5.0.0 AzureGraph_1.3.1 openssl_1.4.3 cachem_0.0.0.9000 [53] crayon_1.3.4 ```
hongooi73 commented 2 years ago

I note that the URL https://accountname.blob.core.windows.net/containername/mtcars actually points to a (quasi) directory, which is why downloading it fails.

r$> list_storage_files(cont)
             name size isdir  blobtype
1        data.txt   95 FALSE BlockBlob
2          mtcars   NA  TRUE BlockBlob
3 mtcars/data.csv 1336 FALSE BlockBlob
4 mtcars/data.rds 1218 FALSE BlockBlob
5 mtcars/data.txt  273 FALSE BlockBlob

ETA: this is on an account with the hierarchical namespace feature enabled (to support real directories in ADLS2). I'll try later on another account with the feature disabled.

ETA2: confirmed that the error also occurs without HNS

kevin-m-kent commented 2 years ago

I'm seeing the same issue with reading old pinned items with the new functions. I also get a 403 client forbidden error if I try to read azure pins with the legacy functions in the 1.0.0 package.

hadley commented 2 years ago

Reprex for me:

remotes::install_version("pins", "0.4.5")
library(pins)

board_register_azure(
  container = "pins-legacy", 
  account = "pins2", 
  key = Sys.getenv("AZURE_PINS2_LEGACY_KEY")
)
pin(mtcars, "mtcars", board = "azure")

devtools::load_all()
board_register_azure(
  container = "pins-legacy", 
  account = "pins2", 
  key = Sys.getenv("AZURE_PINS2_LEGACY_KEY")
)
pin_get("mtcars", board = "azure")

To get this to work for me, I had to hack azure_headers() to not base64 decode my key, which seems odd.

hadley commented 2 years ago

Notice my error/warning is a little different:

  Client error: (403) Forbidden. Failed to download remote file: https://pins2.blob.core.windows.net/pins-legacy/mtcars/data.rds

It looks like the path is ok, but maybe auth has failed?

The content of the request:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>AuthenticationFailed</Code>
  <Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:c421e58a-a01e-0058-7c67-f08da2000000
Time:2021-12-13T21:20:58.2168311Z</Message>
  <AuthenticationErrorDetail>The Date header in the request is incorrect.</AuthenticationErrorDetail>
</Error>

So quite possibly from https://github.com/rstudio/pins/commit/0ff7cceb747e6c4b63288c3cbd2fcefb48e1f48a

hadley commented 2 years ago

And https://github.com/rstudio/pins/commit/7cde6e72486546a23939f2ed28c1dff0c605dc22 — nope, that was a red herring because I pasted my key in wrong.

hadley commented 2 years ago

Oooh and you can make a reprex without using old pins:

library(pins)
board <- legacy_azure("pins-legacy", "pins2", key = Sys.getenv("AZURE_PINS2_LEGACY_KEY"))
pin("mtcars", board = board)
hadley commented 2 years ago

Should be good now!

github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.