wlandau closed this 2 years ago
ref #720
Is it something to do with the endpoints string mentioned at https://paws-r.github.io/docs/s3/?
I'm not sure this will be less work, as there will be many edge cases, but paws uses a similar approach to the Google Discovery API to generate its endpoints. In gargle, those are exposed via the build functions, which I guess would be involved: https://gargle.r-lib.org/reference/request_develop.html
For user facing functions I favour putting sugar on top to make them easier to use, but it would be great to have a universal cloud bucket package. It was discussed previously in cloudyr but didn't turn into anything.
Yes, you can use Google Cloud Storage through Paws by setting a custom endpoint, but you must also create an access key in Google Cloud Storage -> Settings -> Interoperability. I think that is probably one limitation that a native package would not have.
Connecting to Google Cloud Storage will look like the following, which I just tested successfully. I can't say what the edge cases will be however.
gcs <- paws::s3(config = list(
  endpoint = "https://storage.googleapis.com",
  region = "auto",
  credentials = list(
    creds = list(
      access_key_id = "GOOGABCDEFGHIJKLMNOP",
      secret_access_key = "abcdefghijklmnopqrstuvwxyz"
    )
  )
))
gcs$list_buckets()
Unfortunately, Microsoft Azure does not support the S3 API at all, so that is not an option there.
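To avoid hard-coding keys, the connection shown above could be wrapped in a small helper. This is only a sketch: `s3_compatible_config()` is a hypothetical function, not part of paws, and the environment variable names `GCS_HMAC_ACCESS_KEY_ID` and `GCS_HMAC_SECRET` are assumptions chosen for illustration. The placeholder keys are the same dummy values as in the example above.

```r
# Hypothetical helper (not part of paws): build the `config` list that
# paws::s3() accepts, pointed at an S3-compatible endpoint.
s3_compatible_config <- function(endpoint,
                                 access_key_id,
                                 secret_access_key,
                                 region = "auto") {
  # Fail early on missing or empty values instead of at request time.
  stopifnot(
    is.character(endpoint), length(endpoint) == 1L, nzchar(endpoint),
    is.character(access_key_id), length(access_key_id) == 1L,
    nzchar(access_key_id),
    is.character(secret_access_key), length(secret_access_key) == 1L,
    nzchar(secret_access_key)
  )
  list(
    endpoint = endpoint,
    region = region,
    credentials = list(
      creds = list(
        access_key_id = access_key_id,
        secret_access_key = secret_access_key
      )
    )
  )
}

# Assumed environment variable names; the fallbacks are dummy placeholders.
config <- s3_compatible_config(
  endpoint = "https://storage.googleapis.com",
  access_key_id = Sys.getenv("GCS_HMAC_ACCESS_KEY_ID", "GOOGABCDEFGHIJKLMNOP"),
  secret_access_key = Sys.getenv("GCS_HMAC_SECRET", "abcdefghijklmnopqrstuvwxyz")
)

# Only contact the service if paws is installed and a real secret is set.
if (requireNamespace("paws", quietly = TRUE) &&
    nzchar(Sys.getenv("GCS_HMAC_SECRET"))) {
  gcs <- paws::s3(config = config)
  print(gcs$list_buckets())
}
```

The same helper should work for any other provider that speaks the S3 API by changing `endpoint`, although the edge cases mentioned above still apply.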
Thanks Mark and David, very helpful to know the level at which S3 on GCP is and is not magic. In your opinion, to what degree are cloud services converging on S3 as a common standard? If I make S3 the only cloud storage in targets, how likely is that limitation to resolve itself in the long run?
I think the same abstractions will likely work for any of the cloud blob storage providers, so I think it's safe to plan for S3 (+ Google Cloud Storage) now. Azure, however, will take extra work to support, either natively or behind a future S3-to-Azure translation layer. As far as I can tell, Microsoft doesn't plan to support the S3 API.
My evidence for that is that 1) Google Cloud Storage already supports the S3 API, and 2) while Azure Blob Storage does not support the S3 API, people have used a software proxy to communicate with their Azure buckets using the S3 API, so it must be possible to translate the S3 API's operations into equivalent Azure operations.
paws::s3() on GCP almost works, except I cannot get version IDs for objects in version-enabled buckets. Example HEAD output:
$Location
[1] "http://storage.googleapis.com/targets-test-bucket-aaaabbbbcccc/x"
$Bucket
[1] "targets-test-bucket-aaaabbbbcccc"
$Key
[1] "x"
$Expiration
character(0)
$ETag
[1] "\"1b7b109a0572ae5c55551f673d3417c7-1\""
$ServerSideEncryption
character(0)
$VersionId
character(0)
$SSEKMSKeyId
character(0)
$BucketKeyEnabled
logical(0)
$RequestCharged
character(0)
But the "generation" ID is somewhere in the object metadata, right? @davidkretch, is there a way to tell paws to return all the object metadata and not just the metadata that the package thinks is relevant to AWS?
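One possible shape for a workaround, sketched under assumptions: the Google Cloud Storage XML API reports an object's generation in the `x-goog-generation` response header, so if the raw response headers can be obtained somehow, a fallback lookup is straightforward. How to get those headers out of paws is exactly the open question here, so the `headers` argument below is an assumption, `object_version_id()` is a hypothetical helper, and the generation number in the usage example is made up.

```r
# Hypothetical helper: pick a version identifier from a paws-style response
# list, falling back to raw HTTP headers when `VersionId` is empty.
object_version_id <- function(response, headers = list()) {
  version <- response$VersionId
  # paws returns character(0) for fields the service did not fill in.
  if (length(version) == 1L && !is.na(version) && nzchar(version)) {
    return(as.character(version))
  }
  # HTTP header names are case-insensitive, so normalize before lookup.
  names(headers) <- tolower(names(headers))
  for (name in c("x-amz-version-id", "x-goog-generation")) {
    value <- headers[[name]]
    if (!is.null(value) && length(value) == 1L && nzchar(value)) {
      return(as.character(value))
    }
  }
  NA_character_
}

# Usage with a response like the one above and made-up headers.
gcs_response <- list(Key = "x", VersionId = character(0))
gcs_headers <- list(`X-Goog-Generation` = "1641234567890123")
object_version_id(gcs_response, gcs_headers)
```

Returning `NA_character_` when nothing is found keeps the caller's logic the same for unversioned buckets on either platform.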
Still interested in discussing a solution, but I am closing this page as an issue because it seems outside the control of targets. With #803, it should be easier to add GCP as a special case using @MarkEdmondson1234's utility functions.
@davidkretch, you mentioned today that paws could support the S3 protocol on other cloud platforms like GCP. Would you or @adambanker be willing to walk me through that? It would really help keep targets down to a maintainable size as the number of cloud platforms increases.
targets currently uses paws to manage AWS S3 data using these basic utility functions, the main aspects of which are version IDs and multipart uploads. If this same code can work on e.g. GCP, that would be amazing. I am willing to refactor it to let the user supply a paws::s3() object.
@MarkEdmondson1234, I apologize if this invalidates your PRs #722 and #748.
(Note to self: if this works out, I should rename "aws" to "s3" in the code base, functions, and arguments, with smooth deprecation of course.)
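The refactor proposed above, letting the user supply a `paws::s3()` object, might look roughly like the sketch below. The function names `s3_head()` and `s3_exists()` are hypothetical and are not the actual utilities in targets. The only paws behavior assumed is that a client is a list of functions and that `head_object()` accepts `Bucket`, `Key`, and `VersionId`. A stand-in client is used so the example runs without credentials.

```r
# Hypothetical utilities: each takes a client instead of creating one,
# so the same code can serve AWS, GCP, or any S3-compatible endpoint.
s3_head <- function(client, bucket, key, version = NULL) {
  args <- list(Bucket = bucket, Key = key)
  # Only pass VersionId when the caller asked for a specific version.
  if (!is.null(version)) {
    args$VersionId <- version
  }
  do.call(client$head_object, args)
}

s3_exists <- function(client, bucket, key, version = NULL) {
  # Simplification: any error (not only "404 Not Found") counts as absent.
  tryCatch(
    {
      s3_head(client, bucket, key, version)
      TRUE
    },
    error = function(condition) FALSE
  )
}

# Stand-in client for illustration; a real one would come from paws::s3().
fake_client <- list(
  head_object = function(Bucket, Key, VersionId = NULL) {
    if (!identical(Key, "x")) {
      stop("404 Not Found")
    }
    list(ETag = "\"abc\"", VersionId = character(0))
  }
)

s3_exists(fake_client, "targets-test-bucket-aaaabbbbcccc", "x")
s3_exists(fake_client, "targets-test-bucket-aaaabbbbcccc", "missing")
```

Passing the client in as an argument also makes the utilities easy to test with a mock, as the stand-in here shows.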