caewok closed this issue 3 years ago.
Have you set the `AWS_DEFAULT_REGION` environment variable? If not, you could try that, or set the `cloudyr.aws.default_region` option either in a project-level `.Rprofile` file or in `_targets.R`. (Although the latter might not work with `tar_make_future()` or `tar_make_clustermq()` if `storage = "worker"`, because workers would need the option too.)

To be clear, there is currently no way to pass the region to `tar_resources_aws()`, so this is more of a limitation than a bug. But maybe `targets` should allow different buckets to have different regions via resources.
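For example, either setting could go in a project-level `.Rprofile` (a sketch; the region value here is illustrative, not necessarily the right one for a non-AWS endpoint):

```r
# In a project-level .Rprofile (or near the top of _targets.R):
Sys.setenv(AWS_DEFAULT_REGION = "us-east-1")      # example value only
# or, equivalently, the option the cloudyr packages consult:
options(cloudyr.aws.default_region = "us-east-1") # example value only
```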
If I set the `AWS_DEFAULT_REGION` environment variable, nothing changes.

If I set `options("cloudyr.aws.default_region" = "")` in a `.Rprofile`, then `get_bucket("LoanModeling")` works (as expected), but `tar_make()` still does not work. The error changes, however: either `targets` or an underlying package is pulling `us-west-002` out of the endpoint and setting it as the region. The error is:

```
$ Message : chr "The specified bucket does not exist: us-west-002"
```

The issue is that the correct endpoint is `s3.us-west-002.backblazeb2.com`, but `tar_make()` is using `us-west-002.s3.us-west-002.backblazeb2.com`.

If I add `options("cloudyr.aws.default_region" = "")` to `_targets.R`, I get the same issue as above, with `us-west-002` being used as the region.
@caewok, please have a look at #682. The new `region` argument to `tar_resources_aws()` is now forwarded to `aws.s3::object_exists()` and other `aws.s3` functions, and you can set `tar_resources_aws(region = "")`. Hopefully that solves your issue.

For future reference, most of the code in `targets` for interacting with AWS is at https://github.com/ropensci/targets/blob/main/R/class_aws.R and https://github.com/ropensci/targets/blob/main/R/class_aws_file.R.
Thanks for the quick reply! I pulled the new update from GitHub; unfortunately, it still does not work. (I confirmed it is v0.8.1.9000.) When I run `tar_make()`, I still get:
```
• start target data
List of 3
 $ Code       : chr "NoSuchBucket"
 $ Message    : chr "The specified bucket does not exist: us-west-002"
 $ BucketName : chr "us-west-002"
 - attr(*, "headers") = List of 6
  ..$ x-amz-request-id: chr "ab1a04c1c1008704"
  ..$ x-amz-id-2      : chr "adQduc2uDbutvLHfobg0="
  ..$ cache-control   : chr "max-age=0, no-cache, no-store"
  ..$ content-type    : chr "application/xml"
  ..$ content-length  : chr "212"
  ..$ date            : chr "Fri, 05 Nov 2021 00:02:49 GMT"
  ..- attr(*, "class") = chr [1:2] "insensitive" "list"
 - attr(*, "class") = chr "aws_error"
 - attr(*, "request_canonical") = chr "PUT\n/LoanModeling/_targets/objects/data\n\nhost:us-west-002.s3.us-west-002.backblazeb2.com\nx-amz-acl:private\"| truncated
 - attr(*, "request_string_to_sign") = chr "AWS4-HMAC-SHA256\n20211105T000249Z\n20211105/us-west-002/s3/aws4_request\nfe887c358f886efcdf178c649e94dd86b68b6"| truncated
 - attr(*, "request_signature") = chr "AWS4-HMAC-SHA256 Credential=[AWS_ACCESS_KEY_ID]/20211105/us-west-002/s3/aws4_request,SignedHeaders=host;x"| truncated
NULL
```
The top of my `_targets.R` file specifies that the region should be left blank:
```r
tar_option_set(
  resources = tar_resources(
    aws = tar_resources_aws(bucket = "LoanModeling", region = "")
  )
)
```
But the error message suggests that something is still overriding `region = ""` with `region = "us-west-002"`. I don't have a good explanation for why this is happening. It could be that `region <- store$resources$aws$region %|||% store$resources$region` in `store_produce_aws_path()` is doing it. Or, more likely, `targets` is passing the bucket and region to `aws.s3` in a manner that causes `aws.s3` to replace the region and construct the URL incorrectly: `us-west-002.s3.us-west-002.backblazeb2.com` instead of `s3.us-west-002.backblazeb2.com`.

As before, `get_bucket("LoanModeling")` works because `options("cloudyr.aws.default_region" = "")` is set. Putting `options("cloudyr.aws.default_region" = "")` into the `_targets.R` file still has no effect.
I have a theory on this: `store_upload_object.tar_aws()` and related functions call `region <- store_aws_region(store$file$path)`. But if I understand correctly, the path to be parsed in this case should come from `targets:::store_produce_aws_metabucket("LoanModeling", "")`, which returns `"bucket=LoanModeling:region="`. If you then call `targets:::store_aws_region("bucket=LoanModeling:region=")`, the result is `NULL`, causing the region to revert to some default like `"us-west-002"` instead of `""`. So some part of that chain, probably `store_produce_aws_metabucket()`, needs to correctly handle the case where the region is set to `""`.
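A standalone illustration of why an empty region can get lost in that round trip (hypothetical helpers that mirror the internals described above, not the actual `targets` code):

```r
# Hypothetical mirror of the metabucket round trip described above.
produce_metabucket <- function(bucket, region) {
  paste0("bucket=", bucket, ":region=", region)
}

parse_region <- function(path) {
  fields <- strsplit(path, ":", fixed = TRUE)[[1]]
  hit <- grep("^region=", fields, value = TRUE)
  if (length(hit) == 0L) return(NULL)  # no region field at all
  sub("^region=", "", hit)             # "" when the region was blanked
}

path <- produce_metabucket("LoanModeling", "")  # "bucket=LoanModeling:region="
parse_region(path)
```

The key distinction is `NULL` (region never set) versus `""` (region deliberately blanked); a parser that drops empty fields collapses the two, and downstream code then falls back to a default region.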
Yeah, that's probably right. Would you have a look at e19624651fde0f86b86a853746f361158b8d780f? I can only create conventional AWS buckets because I do not have a Backblaze subscription.
I tried e196246, but no change. I think we are on the right track, though. The next piece of the puzzle appears to be the calls to `aws.s3::object_exists()`, `aws.s3::head_object()`, `aws.s3::put_object()`, and `aws.s3::save_object()`. Those calls all set `check_region = TRUE`, while the default is `FALSE`.

For example, if I run this code, I get back the expected `TRUE` value:
```r
aws.s3::object_exists(
  object = "[Object path in s3 bucket]",
  bucket = "LoanModeling",
  region = "",
  check_region = FALSE
)
```
While setting `check_region` to `TRUE` throws a 404 error and returns `FALSE`:
```r
aws.s3::object_exists(
  object = "[Object path in s3 bucket]",
  bucket = "LoanModeling",
  region = "",
  check_region = TRUE
)
```
`check_region` appears to be used by `aws.s3::s3HTTP()`. The documentation for that parameter says:

> check_region: A logical indicating whether to check the value of `region` against the apparent bucket region. This is useful for avoiding (often confusing) out-of-region errors. Default is `FALSE`.

And for the `region` parameter:

> region: A character string containing the AWS region. Ignored if region can be inferred from `bucket`. If missing, an attempt is made to locate it from credentials. Defaults to `"us-east-1"` if all else fails. Should be set to `""` when using non-AWS endpoints that don't include regions (and `base_url` must be set).
So I think what is happening is that setting `check_region` to `TRUE` overrides the `region` parameter. Even if you pass `""` correctly to `region`, you also have to set `check_region` to `FALSE`. Otherwise, `s3HTTP()` goes searching for a valid region, sees `us-west-002` in the URL, and assumes (incorrectly) that it should be the region.

I assume you have good reasons for normally setting `check_region = TRUE`. So perhaps only set it to `FALSE` when `region` is `""`?
`check_region = TRUE` came from https://github.com/ropensci/targets/issues/400. I think we can set `check_region` to `TRUE` if and only if `region` is `NULL`. (IMO `aws.s3` should do this already, but maintenance of that package has slowed down.)
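That rule could look roughly like this (a sketch based on the thread's description, not the actual commit; the call site and variable names are simplified assumptions):

```r
# region is NULL when the user never set it, and "" when it was
# explicitly blanked (as required for Backblaze-style endpoints).
region <- store_aws_region(store$file$path)

args <- list(
  object = key,
  bucket = bucket,
  # Let aws.s3 auto-detect the region only when the user did not choose one:
  check_region = is.null(region)
)
if (!is.null(region)) args$region <- region

do.call(aws.s3::object_exists, args)
```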
Please try 7f724cd79c220baccf7c7fc9aa85f28c5d123985.
Yep, that fixed it! Thanks!
Description

I am using a Backblaze S3-compatible bucket. For this to work, two things need to happen:

1. `install.packages("aws.s3", repos = "http://rforge.net/", type = "source")`.
2. `options("cloudyr.aws.default_region" = "")` must be set, or `region = ""` must be passed to functions like `get_bucket()`.

The issue I am running into is that the package's function `tar_resources_aws()` ignores the cloudyr setting and instead tries to add an erroneous region to the S3 request.

Reproducible example
Take the S3 example from the `targets` documentation. Assume we have a bucket, which I will call "LoanModeling" here, and the following `_targets.R` file:

With `aws.s3` installed, I can access this Backblaze bucket using either call:
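The two access calls referenced above might look like this (a sketch, not the author's exact code; the `base_url` value is inferred from the endpoint discussed in this thread):

```r
# Call 1: rely on the cloudyr option
options("cloudyr.aws.default_region" = "")
aws.s3::get_bucket("LoanModeling",
                   base_url = "s3.us-west-002.backblazeb2.com")

# Call 2: pass the empty region explicitly
aws.s3::get_bucket("LoanModeling",
                   region = "",
                   base_url = "s3.us-west-002.backblazeb2.com")
```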
But if I call `tar_make()` for the `_targets.R` file, I get an error. Notice how the request appears to be adding an erroneous `us-east-1` region.

Diagnostic information