rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com
Other
311 stars 63 forks source link

Illegal characters in a pin's`name` are not detected #630

Closed tomsing1 closed 2 years ago

tomsing1 commented 2 years ago

I think there might be a regression bug: The fix to issue #332 ensured that any of the following characters were illegal for the name argument when using a cloud board (e.g. S3): [&@:,$=+?; \\^`><{}\\[#%~|] by calling the (legacy) datatxt_invalid_name function: https://github.com/rstudio/pins/blob/c7af29f95725d229264e13366fb20898acf43f86/R/legacy_datatxt.R#L396

In the new API, the new pin_store.pins_board_s3 function checks the name argument by calling the check_name function instead, which is more permissive and only detects slashes. This allows names with other illegal characters (e.g. spaces) to be used.

https://github.com/rstudio/pins/blob/baaa304f876ec9cc98cc90d3333db157e34028f7/R/board_s3.R#L213

Illegal characters break some of the functionality, e.g. the pin_meta call for an S3 board (because the pins:::s3_file_exists function doesn't detect the object).

Here is a reproducible example (please substitute a suitable S3 bucket):

library(pins)
bucket <- "your-s3-bucket"
region <- "your-aws-region"

board <- pins::board_s3(bucket = bucket, region = region)

This fails

# using a space in the name argument causes errors
pins::pin_write(board, mtcars, "my mtcars")
Guessing `type = 'rds'`
Creating new version '20220801T020333Z-66143'
Writing to pin 'my mtcars'

pins::pin_list(board)
[1] "my mtcars"                      

pins::pin_meta(board, "my mtcars")
Error in `abort_pin_missing()`:
! Can't find pin called 'my mtcars'
ℹ Use `pin_list()` to see all available pins in this board
Run `rlang::last_error()` to see where the error occurred.

This works

# a name without the space works fine
pins::pin_write(board, mtcars, "my_mtcars")
pins::pin_list(board)
[1]  "my mtcars"  "my_mtcars"
pins::pin_meta(board, "mtcars")
List of 11
 $ file       : chr "my_mtcars.rds"
 $ file_size  : 'fs_bytes' int 1.19K
 $ pin_hash   : chr "66143b23339a01d9"
 $ type       : chr "rds"
 $ title      : chr "my_mtcars: a pinned 32 x 11 data frame"
 $ description: NULL
 $ created    : POSIXct[1:1], format: "2022-07-31 19:10:00"
 $ api_version: num 1
 $ user       : list()
 $ name       : chr "my_mtcars"
 $ local      :List of 3
  ..$ dir    : 'fs_path' chr "~/Library/Caches/pins/your-s3-bucket/my_mtcars/20220801T021020Z-66143"
  ..$ url    : NULL
  ..$ version: chr "20220801T021020Z-66143"

Perhaps the datatxt_invalid_name function should be revisited?

SessionInfo

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] pins_1.0.1

loaded via a namespace (and not attached):
 [1] compiler_4.2.1      BiocManager_1.30.18 magrittr_2.0.3      cli_3.3.0          
 [5] tools_4.2.1         fs_1.5.2            glue_1.6.2          rstudioapi_0.13    
 [9] rappdirs_0.3.3      lifecycle_1.0.1     rlang_1.0.4         renv_0.15.5   
juliasilge commented 2 years ago

Looks like you are right @tomsing1! Thanks for the report. 👍

machow commented 2 years ago

Thanks for the report--I wonder where in the process the error occurs.

To my knowledge a space is not an illegal character in S3 (see this doc; note that it can be weird in browser urls though!). I wonder if this is a bug in the underlying s3 library R pins is using?

Naming a pin "mtcars pin" works in Python pins:

# load these env variables: https://github.com/rstudio/pins-python/blob/main/.env.dev#L11-L13
from dotenv import load_dotenv
load_dotenv()

# connect to a temporary s3 board
from pins.tests.helpers import BoardBuilder
bb = BoardBuilder("s3")
board = bb.create_tmp_board()

from pins.data import mtcars

board.pin_write(mtcars, "mtcars pin", type="csv")
board.pin_meta("mtcars pin")

Here's the pin shown in the web UI

image

juliasilge commented 2 years ago

~Hmmmm, I think it is paws.~ Previously I was confusing the bucket and the key.

paws handles spaces in the key just fine:

library(paws)
#> Warning: package 'paws' was built under R version 4.0.5
svc <- s3()

readr::write_lines("My name is Julia", "julia.txt")
my_bucket <- glue::glue("sagemaker-", ids::adjective_animal(style = "kebab"))

svc$create_bucket(
  Bucket = my_bucket,
  CreateBucketConfiguration = list(LocationConstraint = "us-east-2")
)
#> $Location
#> [1] "http://sagemaker-apiologic-lobo.s3.amazonaws.com/"

svc$put_object(
  Bucket = my_bucket,
  Key = "My key has many spaces",
  Body = "julia.txt"
)
#> $Expiration
#> character(0)
#> 
#> $ETag
#> [1] "\"cc78575667603286a62436137080e84d\""
#> 
#> $ServerSideEncryption
#> character(0)
#> 
#> $VersionId
#> character(0)
#> 
#> $SSECustomerAlgorithm
#> character(0)
#> 
#> $SSECustomerKeyMD5
#> character(0)
#> 
#> $SSEKMSKeyId
#> character(0)
#> 
#> $SSEKMSEncryptionContext
#> character(0)
#> 
#> $BucketKeyEnabled
#> logical(0)
#> 
#> $RequestCharged
#> character(0)

Created on 2022-08-10 by the reprex package (v2.0.1)

juliasilge commented 2 years ago

The problem is happening in pins:::check_pin_exists(board, "my mtcars") and specifically pins:::s3_file_exists():

https://github.com/rstudio/pins-r/blob/3dffed939772df6367ce29b51603db023c0a7da9/R/board_s3.R#L294-L300

Look how the space is getting replace by a + here:

library(paws)
#> Warning: package 'paws' was built under R version 4.0.5
board <- pins::board_s3(
  bucket = "sagemaker-rusted-americanquarterhorse", 
  region = "us-east-2"
  )

board$svc$list_objects_v2(
  Bucket = board$bucket,
  Prefix = paste0(board$prefix, "my mtcars")
)
#> $IsTruncated
#> [1] FALSE
#> 
#> $Contents
#> list()
#> 
#> $Name
#> [1] "sagemaker-rusted-americanquarterhorse"
#> 
#> $Prefix
#> [1] "my+mtcars"
#> 
#> $Delimiter
#> character(0)
#> 
#> $MaxKeys
#> [1] 1000
#> 
#> $CommonPrefixes
#> list()
#> 
#> $EncodingType
#> character(0)
#> 
#> $KeyCount
#> [1] 0
#> 
#> $ContinuationToken
#> character(0)
#> 
#> $NextContinuationToken
#> character(0)
#> 
#> $StartAfter
#> character(0)

Created on 2022-08-10 by the reprex package (v2.0.1)

juliasilge commented 2 years ago

This is now fixed in the dev branch of paws! Many thanks to @DyfanJones ❤️

library(pins)
#> Warning: package 'pins' was built under R version 4.0.5
library(paws)
#> Warning: package 'paws' was built under R version 4.0.5
b <- pins::board_s3(
  bucket = "sagemaker-rusted-americanquarterhorse", 
  region = "us-east-2"
  )

b %>% pin_write(mtcars, "my mtcars")
#> Guessing `type = 'rds'`
#> Creating new version '20220815T155219Z-ce918'
#> Writing to pin 'my mtcars'
b %>% pin_write(mtcars, "my-mtcars")
#> Guessing `type = 'rds'`
#> Creating new version '20220815T155219Z-ce918'
#> Writing to pin 'my-mtcars'
b %>% pin_list()
#> [1] "my mtcars" "my-mtcars"

Created on 2022-08-15 by the reprex package (v2.0.1)

For now, you will need to install the branch with the fix:

remotes::install_github(repo = "DyfanJones/paws/paws.common/", ref = "s3-prefix-space")
github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.