pachadotdev / analogsea

DigitalOcean R client
https://pacha.dev/analogsea/
Apache License 2.0
154 stars, 24 forks

Finish implementation of Spaces API #140

Open amoeba opened 6 years ago

amoeba commented 6 years ago

https://github.com/sckott/analogsea/pull/138 added support for part of the Spaces API and we decided in #136 to start a fresh issue to help divvy up the tasks of completing the API. Below is a checklist with items transcribed from the Spaces API docs:

Let me know if the organization of this list doesn't make sense or isn't useful for tackling this.

amoeba commented 6 years ago

@sckott I more or less copied the list from the API docs. Does having analogsea implement all of these seem appropriate?

@thercast Did you have any preference for which methods you wanted to tackle first? I should be able to take a look at this some evening this week and actually be helpful.

sckott commented 6 years ago

I think it makes sense to include all of them

amoeba commented 6 years ago

I sat down to get the lay of the land and everything looks very reasonable. I wrote a get_object function to match the Spaces API method GET ${BUCKET}.${REGION}.digitaloceanspaces.com/${OBJECT_KEY} using aws.s3::get_object and immediately noticed that aws.s3 doesn't handle error cases reasonably.

See this example, deliberately run with an invalid bucket name:

> rawToChar(space_get_object("welcome.html", "test-analogseas"))
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Error><Code>NoSuchBucket</Code><BucketName>test-analogseas</BucketName><RequestId>tx000000000000002f5f59e-0059ffb7c0-3f970-nyc3a</RequestId><HostId>3f970-nyc3a-nyc</HostId></Error>"

It returns the raw response body and throws away the underlying HTTP status code, so there appears to be no way to tell whether this is the file I have stored at welcome.html or an error.

Looking at the source I can see why, and looking at the open issues on the repo, I see that the function isn't really written to handle errors (see this issue).

If it's the case that the aws.s3 package doesn't handle errors, I wonder whether we (1) care and (2) would rather stay with aws.s3 or switch to crul to get proper error handling.
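For reference, a minimal sketch of what error handling with crul could look like (the bucket and object below are placeholders, not real resources):

```r
# Sketch only: how crul surfaces HTTP errors, in contrast to aws.s3.
library(crul)

cli <- crul::HttpClient$new(
  url = "https://test-analogseas.nyc3.digitaloceanspaces.com"
)
res <- cli$get("welcome.html")

res$status_code        # the HTTP status that aws.s3 throws away, e.g. 404
res$raise_for_status() # turns 4xx/5xx responses into real R errors
```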

Thoughts?

rpodcast commented 6 years ago

@amoeba thanks for your work on this! I've been AFK due to illness but I want to jump back in to help out. I'll add notes on which features I would like to tackle this week.

Regarding the error handling with aws.s3: I struggled with how to handle this in my first implementation. We have a lot of functionality already built for us within aws.s3, and that convenience is hard to pass up for this early implementation of Spaces. But showing just the raw response when the request errors out is a real eyesore compared to how the other analogsea functions elegantly handle errors. From my perspective, the long-term path could be a rewrite of the key functionality using crul, both to be consistent with the rest of the package and to drop a dependency, but I will defer to @sckott for his take.

sckott commented 6 years ago

Thanks for raising the issue with erroring. I definitely value failing well, so I'd strongly prefer to make sure users get useful errors. DigitalOcean, at least with the Droplet API (and I assume with the Spaces API), fails well and correctly, making it easy to give users appropriate errors.

We currently still use httr in this package. The main thing keeping httr around is that it can do OAuth whereas crul cannot, but I think an argument can be made to drop OAuth if we think most people don't use it and instead use PATs.

For now I think it's okay to stick with aws.s3, but eventually I think we should rework things to make sure the spaces fxns error well.

amoeba commented 6 years ago

Sounds good, thanks for weighing in @sckott. If/when the aws.s3 package changes how they handle error cases, this package benefits automatically, so I'm down with filling in this portion of the package with aws.s3 as much as possible.

@thercast Sorry to hear you were out! Looking forward to your notes.

sckott commented 6 years ago

any thoughts on this? anything I can help with?

amoeba commented 6 years ago

I'm still down to help PR this addition but wanted to let @thercast take the lead since it was his idea. Hopefully he'll pop in here and we can get this done soon.

sckott commented 6 years ago

@thercast any thoughts? if you're too busy, anything @amoeba or I can move forward on?

rpodcast commented 6 years ago

Hi @sckott and @amoeba, sorry for the delay (an urgent shiny app came up at work and then the kids got sick) but I would like to tackle the listing of bucket contents and deletion of a bucket (these tasks should help shake off the cobwebs of being away for a bit). @amoeba I'd be glad if you'd like to start on the fundamental operations for objects in a space (i.e. downloading, uploading, deleting). Does that sound like a good plan?

amoeba commented 6 years ago

Sounds good to me!

amoeba commented 6 years ago

First half of the Object calls implemented in https://github.com/sckott/analogsea/pull/146. Went pretty well. Regarding the calls I didn't implement,

General questions:

I'll look into the delete operation issue above later this week.

sckott commented 6 years ago

Regarding the PR, I assume you're adding more to it? Or do you want it to be handled as is?

amoeba commented 6 years ago

I should've made that more clear, sorry: I'm planning on adding more commits but needed the intermediate review (thanks!). Once I'm done, do you mind a messy PR or would you rather I squash the PR into a single commit once I'm happy?

Re: Your comments:

I think it would be better to change the structure to e.g. spaces_thing_verb

Sold!

tests

Sounds great.

docs: the repeated parameters in each spaces fxn could go into man-roxygen/ as a template file to save lines

Great, this is new to me so I'll look that up and take a stab at it.

wrap booleans in \code{}

Good suggestion!

maybe for examples, ...

I like this a lot. Will do.

Yes, I think breaking up spaces fxns into many files is good

Great, will do.

amoeba commented 6 years ago

Okay, figured out the Roxygen template functionality to DRY up the common @params.
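For anyone following along, the template mechanism looks roughly like this; the filename and parameter docs below are illustrative, not necessarily the ones used in analogsea:

```r
# man-roxygen/spaces_args.R (illustrative filename)
#' @param space (character) The name of the Space
#' @param spaces_key (character) A Spaces access key ID
#' @param spaces_secret (character) A Spaces secret access key

# Then each spaces function pulls in the shared @param docs with:
#' @template spaces_args
```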

amoeba commented 6 years ago

Figured out the problem with deleting objects. Totally just due to a bug in aws.s3, which I've Issue+PR'd https://github.com/cloudyr/aws.s3/issues/189

sckott commented 6 years ago

not sure on the multipart thing. to make separate functions, how many more fxns does that add?

for xml, I like the idea of putting xml2 in Suggests in the DESCRIPTION file; then, like you say, we parse to an xml_document if it's installed, and if not, just return the character string.

amoeba commented 6 years ago

not sure on the multipart thing. to make separate functions, how many more fxns does that add

It'd be five new functions which have a 1:1 mapping with their Spaces API calls. I think the main reason to do this is so we can claim this package implements the Spaces API with 1:1 mappings between API routes and R functions. I don't think this would be commonly used by this package's users.

For the user to upload a multi-part file, they would have to write their own routine to start the upload, upload each piece, and complete the upload (which seems less-than-fun and error-prone). The functionality that would be provided by the five separate functions is almost entirely wrapped up in aws.s3::put_object(..., multipart = TRUE, ...) except for the "Cancel a Multi-part Upload" call.

I'd lobby for not doing these calls now and adding docs to spaces_object_put showing how to use the multipart argument. When/if this package switches to crul instead of aws.s3, that would be a good time to write these individual API calls out as functions.
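A docs example along those lines might look like this (the file and Space names are placeholders, and this assumes the multipart argument is passed through to aws.s3):

```r
# Sketch of documenting multipart uploads via the existing wrapper
# rather than five separate 1:1 API functions.
spaces_object_put("~/Downloads/big-file.bin",
                  space = "analogsea-test",
                  multipart = TRUE)
```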

amoeba commented 6 years ago

Okay, xml2 support added to spaces_acl_put and spaces_acl_get in https://github.com/sckott/analogsea/pull/146/commits/19a98b070145893bd64ed859e352d318b6317c9b

If you're okay with not implementing the multi-part API with 1:1 functions, then the Object API is done.

amoeba commented 6 years ago

Oh, and a note on how I did that: I noticed that aws.s3 Imports xml2 and, since analogsea Imports it too, xml2 is basically a hard-dependency for analogsea anyway. I left the requireNamespace guards in just in case analogsea ever moves away from Importing/using aws.s3.
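The guard pattern described here is roughly the following (the helper name is hypothetical; `res` is the raw XML string returned by the API):

```r
# Sketch of the requireNamespace() guard: parse to an xml_document
# when xml2 is available, otherwise return the raw string.
parse_spaces_xml <- function(res) {
  if (requireNamespace("xml2", quietly = TRUE)) {
    xml2::read_xml(res)  # parsed xml_document
  } else {
    res                  # fall back to the character string
  }
}
```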

sckott commented 6 years ago

Agree to just put multipart as an argument.

Sounds good on the xml2 import thing.

then the Object API is done.

so PR https://github.com/sckott/analogsea/pull/146 is ready to go from your side?

amoeba commented 6 years ago

Almost. I decided to verify the aws.s3 multipart functionality actually works with Spaces and it doesn't seem to. I'll do some testing and get back to you. Otherwise, yes.

amoeba commented 6 years ago

Found two blocking bugs in the aws.s3 package preventing me from implementing this package's multipart uploads with it. Might take a long time to get it patched upstream. Any preference on what to do?

sckott commented 6 years ago

ugh - i guess move ahead here and add to the docs that multipart doesn't work now but will work later?

amoeba commented 6 years ago

Okay, #146 is good to go! Thanks @sckott

amoeba commented 6 years ago

Filed an upstream issue for Space creation not working https://github.com/cloudyr/aws.s3/issues/199

amoeba commented 6 years ago

aws.s3 is getting some fixes as we speak [1] so I'll plan to re-visit some of the existing issues with the Spaces implementation that are waiting on upstream changes this week.

[1] https://github.com/cloudyr/aws.s3/issues/189

amoeba commented 6 years ago

Sat down with this a bit yesterday and today and ran into more upstream issues. @leeper added support for non-AWS S3-compatible storages but I'm getting signature mismatch errors no matter what I try now: https://github.com/cloudyr/aws.s3/issues/189. I think I'll wait to see what @leeper says in case this is a quick fix on either my end or aws.s3's.

sckott commented 6 years ago

thanks for the update

sckott commented 6 years ago

@amoeba any update on this?

amoeba commented 6 years ago

I wish. I'm kinda stuck on https://github.com/cloudyr/aws.s3/issues/189#issuecomment-381448618. Something to do with assumptions in https://github.com/cloudyr/aws.signature about AWS vs. non-AWS endpoints. I feel like I might be able to throw some more brain cells at it (and a fresh look could help), but I'm not yet sure what's going on or why we're getting SignatureDoesNotMatch errors.

sckott commented 6 years ago

Okay. So is multipart uploads the only thing not working, due to this aws.s3 thing? If so, can we move on towards a CRAN submission and just document that?

amoeba commented 6 years ago

I think other things aren't working too, like creating a Space. I can't check today but I'll see if I can't take a look tomorrow while traveling. There are enough moving parts here that I don't have a good handle off-hand of what works, what doesn't, what broke, and why.

I'd definitely like to see this get off to CRAN ASAP, maybe by the end of the month.

amoeba commented 6 years ago

Bucket creation is definitely not working. I thought we'd found the issue over on https://github.com/cloudyr/aws.s3/issues/189 but that wasn't quite the trick.

sckott commented 6 years ago

Okay, will also take a look soon

amoeba commented 6 years ago

Thanks! I haven't been able to devote enough debugging time. I'm at a point where my next step is to take aws.s3 out of the equation to see if there's just an issue there or if aws.signature needs to be generalized.

sckott commented 6 years ago

okay

sckott commented 6 years ago

hey @amoeba - thinking about this a bit, I played around with trying to fix bucket creation, but got nowhere. Okay with you if we move ahead with the next CRAN submission with whatever spaces stuff works for now? With functions that don't work yet, we could either not export them, or export them, document them as not working yet, and add a stop() inside them with a message?

seems like at least these don't work for me:

amoeba commented 6 years ago

That sounds good to me. Thanks a million for trying to fix the issue.

Releasing a partial implementation of the Spaces API seems useful in its own right if those are the only two functions that aren't working so let's do that. I'll test to see which functions work to see if my list matches yours.

@sckott wrote:

With functions that don't work yet, we could just not export them, or export and document as not working yet, and add a stop() inside them with message?

Do you have a preference here? I'd lean towards not exporting but I'd like to do the least surprising thing.

I'll get to this tonight or tomorrow.
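The export-and-stop() variant being weighed here would look something like this (a sketch only; the function name, message, and issue link are illustrative):

```r
# Sketch of exporting a not-yet-working function that fails loudly.
spaces_create <- function(...) {
  stop("Creating a Space does not work yet due to an upstream aws.s3 bug; ",
       "see https://github.com/cloudyr/aws.s3/issues/199",
       call. = FALSE)
}
```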

sckott commented 6 years ago

not exporting is probably best, and good to document that they'll come along later

amoeba commented 6 years ago

I get the same list of non-working functions. I'll start with the changes now.

Other notes that just came up:

> spaces_object_put("~/Downloads/test.txt", space = "analogsea-test")
[1] TRUE

This function just returns exactly what aws.s3::put_object returns. When it fails, I think it always throws an exception, so could I just wrap it in invisible()?

> spaces_object_copy("test.txt", "test2.txt", "analogsea-test", "analogsea-test")
$LastModified
[1] "2018-06-12T04:45:49.357Z"

$ETag
[1] "00f7d50ab4278a7899d7499481c9603a"

attr(,"x-amz-request-id")
[1] "tx0000000000000076436a6-005b1f4ffd-8abd1d-nyc3a"
attr(,"content-type")
[1] "application/xml"
attr(,"content-length")
[1] "183"
attr(,"date")
[1] "Tue, 12 Jun 2018 04:45:49 GMT"
attr(,"strict-transport-security")
[1] "max-age=15552000; includeSubDomains; preload"

Would it also be better to invisible() this (and eventually provide a proper class for the response)? This also applies to spaces_object_head().
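The invisible() wrapper being proposed would be roughly this, assuming aws.s3::put_object() throws on failure as noted above:

```r
# Sketch only: suppress the bare TRUE that put_object returns on success.
spaces_object_put <- function(file, space, ...) {
  invisible(aws.s3::put_object(file, bucket = space, ...))
}
```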

amoeba commented 6 years ago

I just checked droplet_create() and it returns non-invisibly so maybe I should just follow that.

sckott commented 6 years ago

Cool, thanks for the PR. If there's nothing else on this for now, i'll push to CRAN in the next few days. I'll see if any problems pop up in checking before then.

amoeba commented 6 years ago

Great, I'm satisfied for now if you are. Thanks for all the help.

sckott commented 6 years ago

still not pushed yet, sorry, getting there soon hopefully

amoeba commented 6 years ago

NP!

amoeba commented 6 years ago

Okay, so @thosjleeper worked some magic on aws.s3 and everything works now. PR incoming. The checklist at the top of this Issue is accurate AFAIK.

sckott commented 4 years ago

@amoeba we need to push a new version soon (this milestone) - should we close this issue or keep it open and move to next milestone?

amoeba commented 4 years ago

Hey @sckott, I'm cool keeping this open and finishing it up. I can make time for it before the month is over. Is that soon enough? I can also do #147 in the time frame.

sckott commented 4 years ago

Sounds good, I think we should submit by 28 Jan - just in case need to resubmit a few times - don't want to risk missing 31 Jan deadline