Closed tomsing1 closed 2 years ago
Looks like you are right @tomsing1! Thanks for the report. 👍
Thanks for the report--I wonder where in the process the error occurs.
To my knowledge a space is not an illegal character in S3 (see this doc; note that it can be weird in browser urls though!). I wonder if this is a bug in the underlying s3 library R pins is using?
Naming a pin "mtcars pin"
works in Python pins:
# load these env variables: https://github.com/rstudio/pins-python/blob/main/.env.dev#L11-L13
from dotenv import load_dotenv
load_dotenv()
# connect to a temporary s3 board
from pins.tests.helpers import BoardBuilder
bb = BoardBuilder("s3")
board = bb.create_tmp_board()
from pins.data import mtcars
board.pin_write(mtcars, "mtcars pin", type="csv")
board.pin_meta("mtcars pin")
Here's the pin shown in the web UI
~Hmmmm, I think it is paws.~ Previously I was confusing the bucket and the key.
paws handles spaces in the key just fine:
library(paws)
#> Warning: package 'paws' was built under R version 4.0.5
svc <- s3()
readr::write_lines("My name is Julia", "julia.txt")
my_bucket <- glue::glue("sagemaker-", ids::adjective_animal(style = "kebab"))
svc$create_bucket(
Bucket = my_bucket,
CreateBucketConfiguration = list(LocationConstraint = "us-east-2")
)
#> $Location
#> [1] "http://sagemaker-apiologic-lobo.s3.amazonaws.com/"
svc$put_object(
Bucket = my_bucket,
Key = "My key has many spaces",
Body = "julia.txt"
)
#> $Expiration
#> character(0)
#>
#> $ETag
#> [1] "\"cc78575667603286a62436137080e84d\""
#>
#> $ServerSideEncryption
#> character(0)
#>
#> $VersionId
#> character(0)
#>
#> $SSECustomerAlgorithm
#> character(0)
#>
#> $SSECustomerKeyMD5
#> character(0)
#>
#> $SSEKMSKeyId
#> character(0)
#>
#> $SSEKMSEncryptionContext
#> character(0)
#>
#> $BucketKeyEnabled
#> logical(0)
#>
#> $RequestCharged
#> character(0)
Created on 2022-08-10 by the reprex package (v2.0.1)
The problem is happening in pins:::check_pin_exists(board, "my mtcars")
and specifically pins:::s3_file_exists()
:
Look how the space is getting replace by a +
here:
library(paws)
#> Warning: package 'paws' was built under R version 4.0.5
board <- pins::board_s3(
bucket = "sagemaker-rusted-americanquarterhorse",
region = "us-east-2"
)
board$svc$list_objects_v2(
Bucket = board$bucket,
Prefix = paste0(board$prefix, "my mtcars")
)
#> $IsTruncated
#> [1] FALSE
#>
#> $Contents
#> list()
#>
#> $Name
#> [1] "sagemaker-rusted-americanquarterhorse"
#>
#> $Prefix
#> [1] "my+mtcars"
#>
#> $Delimiter
#> character(0)
#>
#> $MaxKeys
#> [1] 1000
#>
#> $CommonPrefixes
#> list()
#>
#> $EncodingType
#> character(0)
#>
#> $KeyCount
#> [1] 0
#>
#> $ContinuationToken
#> character(0)
#>
#> $NextContinuationToken
#> character(0)
#>
#> $StartAfter
#> character(0)
Created on 2022-08-10 by the reprex package (v2.0.1)
This is now fixed in the dev branch of paws! Many thanks to @DyfanJones ❤️
library(pins)
#> Warning: package 'pins' was built under R version 4.0.5
library(paws)
#> Warning: package 'paws' was built under R version 4.0.5
b <- pins::board_s3(
bucket = "sagemaker-rusted-americanquarterhorse",
region = "us-east-2"
)
b %>% pin_write(mtcars, "my mtcars")
#> Guessing `type = 'rds'`
#> Creating new version '20220815T155219Z-ce918'
#> Writing to pin 'my mtcars'
b %>% pin_write(mtcars, "my-mtcars")
#> Guessing `type = 'rds'`
#> Creating new version '20220815T155219Z-ce918'
#> Writing to pin 'my-mtcars'
b %>% pin_list()
#> [1] "my mtcars" "my-mtcars"
Created on 2022-08-15 by the reprex package (v2.0.1)
For now, you will need to install the branch with the fix:
remotes::install_github(repo = "DyfanJones/paws/paws.common/", ref = "s3-prefix-space")
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
I think there might be a regression bug: The fix to issue #332 ensured that any of the following characters were illegal for the
name
argument when using a cloud board (e.g. S3):[&@:,$=+?; \\^`><{}\\[#%~|]
by calling the (legacy)datatxt_invalid_name
function: https://github.com/rstudio/pins/blob/c7af29f95725d229264e13366fb20898acf43f86/R/legacy_datatxt.R#L396In the new API, the new
pin_store.pins_board_s3
function checks thename
argument by calling thecheck_name
function instead, which is more permissive and only detects slashes. This allowsnames
with other illegal characters (e.g. spaces) to be used.https://github.com/rstudio/pins/blob/baaa304f876ec9cc98cc90d3333db157e34028f7/R/board_s3.R#L213
Illegal characters break some of the functionality, e.g. the
pin_meta
call for an S3 board (because thepins:::s3_file_exists
function doesn't detect the object).Here is a reproducible example (please substitute a suitable S3 bucket):
This fails
This works
Perhaps the
datatxt_invalid_name
function should be revisited?SessionInfo