amoeba opened 6 years ago
@sckott I more or less copied the list from the API docs. Does having `analogsea` implement all of these seem appropriate?
@thercast Did you have any preference for which methods you wanted to tackle first? I should be able to take a look at this some evening this week and actually be helpful.
I think it makes sense to include all of them
I sat down to get the lay of the land, and everything looks very reasonable. I wrote a `get_object` function to match the Spaces API method `GET ${BUCKET}.${REGION}.digitaloceanspaces.com/${OBJECT_KEY}` using `aws.s3::get_object`, and immediately noticed that `aws.s3` doesn't reasonably handle error cases.
See this example, purposely run with an invalid bucket name:

```r
> rawToChar(space_get_object("welcome.html", "test-analogseas"))
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Error><Code>NoSuchBucket</Code><BucketName>test-analogseas</BucketName><RequestId>tx000000000000002f5f59e-0059ffb7c0-3f970-nyc3a</RequestId><HostId>3f970-nyc3a-nyc</HostId></Error>"
```
It returns the raw response and throws away the underlying HTTP request's status code, so there appears to be no way to know whether this is the file I have stored at `welcome.html` or an error.
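For what it's worth, here is a rough sketch of the kind of check callers are currently forced into: since `aws.s3::get_object()` hands back the raw body even on failure, the only option is to sniff the body for an S3-style `<Error>` document. `spaces_error()` is a hypothetical helper name, not anything in `analogsea` or `aws.s3`:

```r
# Hypothetical helper (not in analogsea/aws.s3): detect an S3/Spaces
# error response by inspecting the raw body, since the HTTP status
# code has already been thrown away by the time we see the result.
spaces_error <- function(raw_body) {
  txt <- rawToChar(raw_body)
  # S3-style errors are an XML document rooted at <Error>
  grepl("^\\s*(<\\?xml[^>]*\\?>)?\\s*<Error>", txt)
}

err <- charToRaw(paste0(
  '<?xml version="1.0" encoding="UTF-8"?>',
  "<Error><Code>NoSuchBucket</Code></Error>"
))
ok <- charToRaw("<html>welcome</html>")
spaces_error(err)  # TRUE
spaces_error(ok)   # FALSE
```

Sniffing the body like this is exactly the fragility that proper status-code handling (as `crul` provides) would avoid.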
Looking at the source, I can see why. And looking at the open issues on the repo, I see that the function isn't really written to handle errors (see this issue).
If it's the case that the `aws.s3` package doesn't handle errors, I wonder whether we (1) care and (2) would rather stay with `aws.s3` or use `crul` to get proper error handling.

Thoughts?
@amoeba thanks for your work on this! I've been AFK due to illness but I want to jump back in to help out. I'll add notes on which features I would like to tackle this week.
Regarding the error handling with `aws.s3`: I struggled with how to handle this in my first implementation. We have a lot of functionality already built for us within `aws.s3`, and having that convenience for this early implementation of Spaces is hard to pass up. But showing just the raw response when the request errors out is a real eyesore compared to how the other `analogsea` functions elegantly handle errors via `crul`. From my perspective, I could see the long-term path being a rewrite of the key functionality using `crul`, to be consistent with the rest of the package and to have one less dependency, but I will defer to @sckott for his take.
Thanks for raising the issue with erroring. I definitely value failing well, so I'd strongly prefer to make sure users get useful errors. DigitalOcean, at least with the Droplet API (and I assume with the Spaces API), fails well and correctly, making it easy to give users appropriate errors.
We currently still use `httr` in this package. The main thing keeping `httr` around is that it can do OAuth, whereas `crul` cannot, but I think an argument can be made to drop OAuth if we think most people don't use it and instead use PATs.
For now I think it's okay to stick with `aws.s3`, but eventually I think we should rework things to make sure the Spaces fxns error well.
Sounds good, thanks for weighing in @sckott. If/when the `aws.s3` package changes how it handles error cases, this package benefits automatically, so I'm down with filling in this portion of the package with `aws.s3` as much as possible.
@thercast Sorry to hear you were out! Looking forward to your notes.
any thoughts on this? anything I can help with?
I'm still down to help PR this addition but wanted to let @thercast take lead since it was his idea. Hopefully he'll pop in here and we can get this done soon?
@thercast any thoughts? if you're too busy, anything @amoeba or I can move forward on?
Hi @sckott and @amoeba, sorry for the delay (an urgent shiny app came up at work and then the kids got sick) but I would like to tackle the listing of bucket contents and deletion of a bucket (these tasks should help shake off the cobwebs of being away for a bit). @amoeba I'd be glad if you'd like to start on the fundamental operations for objects in a space (i.e. downloading, uploading, deleting). Does that sound like a good plan?
Sounds good to me!
First half of the Object calls is implemented in https://github.com/sckott/analogsea/pull/146. It went pretty well. Regarding the calls I didn't implement:

**Delete an Object**

I don't know why my code isn't working (yet). I seem to be seeing an issue with the passing of the `base_url` argument that only affects this function; it looks like it should work just fine.
**Multi-part operations**

`aws.s3` implements what appears to be the same multi-part upload API that Spaces does, but it does so via a `multipart` argument on `put_object` that handles the Begin/Upload/Complete calls. Should each of those Spaces API calls have its own function, or should we just call this done?
General questions:
I'll look into the delete operation issue above later this week.
Regarding the PR, I assume you're adding more to it? Or do you want it to be handled as is?

- I think it would be better to change the structure to `spaces_thing_verb`, so for example change `spaces_get_object` to `spaces_object_get`, so that when users are searching through function names (e.g., tab completion to find functions) you get all the functions for `spaces_object*`, or for `spaces_acl*`
- tests
- docs: the repeated parameters in each spaces fxn could go into `man-roxygen/` as a template file to save lines
- wrap booleans in `\code{}`
- maybe for examples, ... (`spaces_get_object`)

I should've made that more clear, sorry: I'm planning on adding more commits but needed the intermediate review (thanks!). Once I'm done, do you mind a messy PR, or would you rather I squash it into a single commit once I'm happy?
Re: your comments:

> I think it would be better to change the structure to e.g. `spaces_thing_verb`

Sold!

> tests

Sounds great.

> docs: the repeated parameters in each spaces fxn could go into `man-roxygen/` as a template file to save lines

Great, this is new to me so I'll look that up and take a stab at it.

> wrap booleans in `\code{}`

Good suggestion!

> maybe for examples, ...

I like this a lot. Will do.

> Yes, I think breaking up spaces fxns into many files is good

Great, will do.
Okay, figured out the Roxygen template functionality to DRY up the common `@param`s.
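For anyone following along, the shape of that is roughly the following. The file name and parameter names here are made up for illustration; the real template lives in `man-roxygen/` with whatever params the spaces fxns actually share:

```r
# man-roxygen/spaces_params.R (hypothetical file name)
#' @param space (character) Name of the Space (bucket)
#' @param spaces_key (character) Spaces access key ID
#' @param spaces_secret (character) Spaces secret access key
```

Each function's roxygen block then pulls the shared params in with a single `#' @template spaces_params` line instead of repeating them.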
What about parsing responses with the `xml2` package and returning an `xml_document` if it's installed? If it's not installed, we send a `message()` telling the user they can install the `xml2` package to get this functionality, and then return the `character` response. Plus an arg or option to disable?

Figured out the problem with deleting objects. It was totally just due to a bug in `aws.s3`, which I've Issue+PR'd: https://github.com/cloudyr/aws.s3/issues/189
not sure on the multipart thing. to make separate functions, how many more fxns does that add?
For the XML, I like the idea of putting `xml2` in `Suggests` in the `DESCRIPTION` file; then, like you say, we parse to an `xml_document` if it's installed and otherwise just return the character string.
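A minimal sketch of that pattern, assuming `xml2` lands in `Suggests` (the function name is illustrative, not an actual `analogsea` internal):

```r
# Illustrative sketch: parse the response with xml2 when it's
# available, otherwise fall back to the plain character string.
parse_spaces_response <- function(raw_body) {
  txt <- rawToChar(raw_body)
  if (requireNamespace("xml2", quietly = TRUE)) {
    xml2::read_xml(txt)
  } else {
    message("Install the 'xml2' package to get parsed XML responses")
    txt
  }
}

res <- parse_spaces_response(charToRaw("<Owner><ID>12345</ID></Owner>"))
```

Depending on whether `xml2` is installed, `res` is either an `xml_document` or the raw character string.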
> not sure on the multipart thing. to make separate functions, how many more fxns does that add
It'd be five new functions with a 1:1 mapping to their Spaces API calls. I think the main reason to do this is so we can claim this package implements the Spaces API with 1:1 mappings between API routes and R functions. I don't think they would be commonly used by this package's users.

For a user to upload a multi-part file themselves, they would have to write their own routine to start the upload, upload each piece, and complete the upload (which seems less than fun and error-prone). The functionality that the five separate functions would provide is almost entirely wrapped up in `aws.s3::put_object(..., multipart = TRUE, ...)`, except for the "Cancel a Multi-part Upload" call.
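To make the shape of that hand-rolled routine concrete, here is a toy sketch of just the chunking step a user would be signing up for; the begin/upload/complete API calls are omitted, and `split_parts()` is invented for illustration:

```r
# Toy sketch of the chunking step in a hand-rolled multi-part upload:
# split the payload into fixed-size parts, each of which would then be
# sent with an "Upload a Part" call before a final "Complete" call.
split_parts <- function(payload, part_size) {
  starts <- seq(1, length(payload), by = part_size)
  lapply(starts, function(s) {
    payload[s:min(s + part_size - 1, length(payload))]
  })
}

payload <- as.raw(1:25)
parts <- split_parts(payload, 10)
length(parts)                      # 3
identical(unlist(parts), payload)  # TRUE: parts reassemble cleanly
```

This is exactly the bookkeeping that `multipart = TRUE` hides from the user.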
I'd lobby for not implementing these calls now and instead adding docs to `spaces_object_put` showing how to use the `multipart` argument. When/if this package switches to `crul` instead of `aws.s3`, that would be a good time to write out these individual API calls as functions.
Okay, `xml2` support added to `spaces_acl_put` and `spaces_acl_get` in https://github.com/sckott/analogsea/pull/146/commits/19a98b070145893bd64ed859e352d318b6317c9b

If you're okay with not implementing the multi-part API with 1:1 functions, then the Object API is done.
Oh, and a note on how I did that: I noticed that `aws.s3` `Imports` `xml2` and, since `analogsea` `Imports` it too, `xml2` is basically a hard dependency for `analogsea` anyway. I left the `requireNamespace()` guards in just in case `analogsea` ever moves away from importing/using `aws.s3`.
Agree to just put `multipart` as an argument.

Sounds good on the `xml2` import thing.
> then the Object API is done.

so PR https://github.com/sckott/analogsea/pull/146 is ready to go from your side?
Almost. I decided to verify that the `aws.s3` multipart functionality actually works with Spaces, and it doesn't seem to. I'll do some testing and get back to you. Otherwise, yes.
Found two blocking bugs in the `aws.s3` package preventing me from implementing this package's multipart uploads with it. It might take a long time to get them patched upstream. Any preference on what to do?
ugh - i guess move ahead here and add to the docs that multipart doesn't work now, but will work later?
Okay, #146 is good to go! Thanks @sckott
Filed an upstream issue for Space creation not working https://github.com/cloudyr/aws.s3/issues/199
aws.s3 is getting some fixes as we speak [1] so I'll plan to re-visit some of the existing issues with the Spaces implementation that are waiting on upstream changes this week.
Sat down with this a bit yesterday and today and ran into more upstream issues. @leeper added support for non-AWS S3-compatible storages but I'm getting signature mismatch errors no matter what I try now: https://github.com/cloudyr/aws.s3/issues/189. I think I'll wait to see what @leeper says in case this is a quick fix on either my end or aws.s3's.
thanks for the update
@amoeba any update on this?
I wish. I'm kinda stuck on https://github.com/cloudyr/aws.s3/issues/189#issuecomment-381448618. It's something to do with assumptions in https://github.com/cloudyr/aws.signature about AWS vs. non-AWS endpoints. I feel like I might be able to throw some more brain cells at it (and a fresh look could help), but I'm not yet sure what's going on and why we're getting `SignatureDoesNotMatch` errors.
Okay. So is the only thing not working multipart uploads, due to this aws.s3 issue? If so, can we move on towards a CRAN submission and just document that?
I think other things aren't working too, like creating a Space. I can't check today but I'll see if I can't take a look tomorrow while traveling. There are enough moving parts here that I don't have a good handle off-hand of what works, what doesn't, what broke, and why.
I'd definitely like to see this get off to CRAN ASAP, maybe by the end of the month.
Bucket creation is definitely not working. I thought we'd found the issue over on https://github.com/cloudyr/aws.s3/issues/189 but that wasn't quite the trick.
Okay, will also take a look soon
Thanks! I haven't been able to devote enough debugging time. I'm at the point where my next step is to take `aws.s3` out of the equation to see whether the issue is just there or whether `aws.signature` needs to be generalized.
okay
hey @amoeba - thinking about this a bit, I played around with trying to fix bucket creation, but got nowhere. Okay with you if we move on with the next CRAN submission with whatever Spaces stuff works for now? For functions that don't work yet, we could either not export them, or export them but document them as not working yet and add a `stop()` inside with a message?
seems like at least these don't work for me:

- `space_create`
- `spaces_object_delete`
That sounds good to me. Thanks a million for trying to fix the issue.
Releasing a partial implementation of the Spaces API seems useful in its own right if those are the only two functions that aren't working, so let's do that. I'll test which functions work to see whether my list matches yours.
@sckott wrote:

> With functions that don't work yet, we could just not export them, or export and document as not working yet, and add a stop() inside them with message?

Do you have a preference here? I'd lean towards not exporting, but I'd like to do the least surprising thing.
I'll get to this tonight or tomorrow.
not exporting is probably best, and good to document that they'll come along later
I get the same list of non-working functions. I'll start with the changes now.
Other notes that just came up:
```r
> spaces_object_put("~/Downloads/test.txt", space = "analogsea-test")
[1] TRUE
```

This function returns exactly what `aws.s3::put_object` returns. When it fails, I think it always throws an exception, so could I just wrap the return value in `invisible()`?
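For context, this is all that `invisible()` would change; the function body below is a stand-in sketch, not the real `spaces_object_put()`:

```r
# Stand-in sketch: returning invisibly means a bare call prints
# nothing, but the value still comes through on assignment.
spaces_object_put_sketch <- function(...) {
  res <- TRUE  # stands in for the aws.s3::put_object() return value
  invisible(res)
}

spaces_object_put_sketch()       # prints nothing at the console
x <- spaces_object_put_sketch()  # x is TRUE
```

Since failures would raise exceptions anyway, nothing is lost by suppressing the printed `TRUE`.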
`spaces_object_copy` does a similar thing:

```r
> spaces_object_copy("test.txt", "test2.txt", "analogsea-test", "analogsea-test")
$LastModified
[1] "2018-06-12T04:45:49.357Z"

$ETag
[1] "00f7d50ab4278a7899d7499481c9603a"

attr(,"x-amz-request-id")
[1] "tx0000000000000076436a6-005b1f4ffd-8abd1d-nyc3a"
attr(,"content-type")
[1] "application/xml"
attr(,"content-length")
[1] "183"
attr(,"date")
[1] "Tue, 12 Jun 2018 04:45:49 GMT"
attr(,"strict-transport-security")
[1] "max-age=15552000; includeSubDomains; preload"
```
Would it also be better to `invisible()` this (and eventually provide a proper class for the response)? This also applies to `spaces_object_head()`.
I just checked `droplet_create()` and it returns non-invisibly, so maybe I should just follow that.
Cool, thanks for the PR. If nothing else on this for now, i'll push to CRAN in the next few days. i'll see if any problems pop up in checking before then.
Great, I'm satisfied for now if you are. Thanks for all the help.
still not pushed yet, sorry, getting there soon hopefully
NP!
Okay, so @thosjleeper worked some magic on aws.s3 and everything works now. PR incoming. The checklist at the top of this Issue is accurate AFAIK.
@amoeba we need to push a new version soon (this milestone) - should we close this issue or keep it open and move to next milestone?
Hey @sckott, I'm cool keeping this open and finishing it up. I can make time for it before the month is over. Is that soon enough? I can also do #147 in the time frame.
Sounds good, I think we should submit by 28 Jan - just in case need to resubmit a few times - don't want to risk missing 31 Jan deadline
https://github.com/sckott/analogsea/pull/138 added support for part of the Spaces API and we decided in #136 to start a fresh issue to help divvy up the tasks of completing the API. Below is a checklist with items transcribed from the Spaces API docs:
Let me know if the organization of this list doesn't make sense or isn't useful for tackling this.