paws-r / paws

Paws, a package for Amazon Web Services in R
https://www.paws-r-sdk.com

Initial implementation of standard retry #660

Closed · DyfanJones closed this 12 months ago

DyfanJones commented 1 year ago

This PR implements standard retry (https://github.com/boto/boto3/blob/0b82bf9843ad6d350b48442c47f4a484a886ee3f/docs/source/guide/retries.rst#standard-retry-mode).

Addresses ticket: https://github.com/paws-r/paws/issues/520

DyfanJones commented 1 year ago

@davidkretch @adambanker Am I going crazy, or does this actually implement the standard retry logic? https://github.com/boto/boto3/blob/0b82bf9843ad6d350b48442c47f4a484a886ee3f/docs/source/guide/retries.rst#standard-retry-mode

wlandau commented 1 year ago

Thanks for working on this!

How do I configure max_retries for S3? Are there other retry-related settings?

I ran my tests in targets, and I noticed some new errors, one of which happens when I try to delete a bucket.

random_bucket_name <- function() {
  paste0(
    "targets-test-bucket-",
    substr(
      digest::digest(tempfile(), algo = "sha256"),
      start = 0,
      stop = 43
    )
  )
}

aws_s3_delete_bucket <- function(bucket, client = paws.storage::s3()) {
  region <- client$get_bucket_location(Bucket = bucket)
  old <- Sys.getenv("AWS_REGION")
  on.exit(Sys.setenv(AWS_REGION = old))
  Sys.setenv(AWS_REGION = region)
  out <- client$list_object_versions(Bucket = bucket)
  for (x in out$Versions) {
    args <- list(
      Bucket = bucket,
      Key = x$Key
    )
    has_version_id <- !identical(x$VersionId, "null") &&
      !identical(x$VersionId, character(0))
    if (has_version_id) {
      args$VersionId <- x$VersionId
    }
    do.call(client$delete_object, args)
  }
  client$delete_bucket(Bucket = bucket)
}

bucket <- random_bucket_name()
paws.storage::s3()$create_bucket(Bucket = bucket)

aws_s3_delete_bucket(bucket)
#> Error: SignatureDoesNotMatch (HTTP 403). The request signature we calculated does not match the signature you provided. Check your key and signing method.

traceback()
#> 8: stop(error)
#> 7: handler$fn(request)
#> 6: run(request, retry)
#> 5: retry(request)
#> 4: send_request(request)
#> 3: (function (Bucket, Key, MFA = NULL, VersionId = NULL, RequestPayer = NULL, 
#>        BypassGovernanceRetention = NULL, ExpectedBucketOwner = NULL) 
#>    {
#>        op <- new_operation(name = "DeleteObject", http_method = "DELETE", 
#>            http_path = "/{Bucket}/{Key+}", paginator = list())
#>        input <- .s3$delete_object_input(Bucket = Bucket, Key = Key, 
#>            MFA = MFA, VersionId = VersionId, RequestPayer = RequestPayer, 
#>            BypassGovernanceRetention = BypassGovernanceRetention, 
#>            ExpectedBucketOwner = ExpectedBucketOwner)
#>        output <- .s3$delete_object_output()
#>        config <- get_config()
#>        svc <- .s3$service(config)
#>        request <- new_request(svc, op, input, output)
#>        response <- send_request(request)
#>        return(response)
#>    })(Bucket = "targets-test-bucket-31d505c5eea61275e72b1a7db3463315c802bdba014", 
#>        Key = list(), VersionId = list())
#> 2: do.call(client$delete_object, args) at helper-aws.R#37
#> 1: aws_s3_delete_bucket(bucket)

DyfanJones commented 1 year ago

Hmm, that is a pain; it must have slipped in from PR https://github.com/paws-r/paws/pull/621. When I identify the issue I will update the unit tests to ensure it doesn't happen again 😄

wlandau commented 1 year ago

Thanks! Happy to test again when you are ready.

targets has a max_tries setting for this, and it currently does its own retry algorithm. I can switch max_tries to delegate to paws.common when the time comes. What kind of argument or setting would I use? Something like paws.storage::s3(config = list(tries = 5))?

DyfanJones commented 1 year ago

Ah, sorry, I didn't say how to change the max retries.

# paws v0.3.0
library(paws)
library(paws.common)

client <- s3(config(max_retries = 5))

# or 
client <- s3(config = list(max_retries = 5))

# paws v0.4.0+
library(paws)

client <- s3(config(max_retries = 5))

# or 
client <- s3(config = list(max_retries = 5))

NOTE: config, credentials, and creds are just helper functions. They construct the nested config list and give some autocomplete when constructing it 😄
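
For illustration, here is a minimal sketch of that equivalence, assuming the nested config structure documented in the paws README (the profile value is just a placeholder):

# The helpers only assemble the nested configuration list, so these two
# clients end up configured identically.
library(paws.common)

client_a <- paws.storage::s3(
  config(credentials = credentials(profile = "default"), max_retries = 5)
)
client_b <- paws.storage::s3(
  config = list(
    credentials = list(profile = "default"),
    max_retries = 5
  )
)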

By default the max retries is set to 3, to match the boto3 documentation: https://github.com/boto/boto3/blob/0b82bf9843ad6d350b48442c47f4a484a886ee3f/docs/source/guide/retries.rst#standard-retry-mode

I believe the bug is coming from somewhere in the s3 unmarshal functions when it decodes the XML. I haven't pinpointed it just yet; hopefully it is a simple fix 😄

DyfanJones commented 1 year ago

Ah, I have found the issue. Well, it isn't so much an issue as a question of what approach paws should take.

Previously paws would return an empty list for XML list objects, even when the shape had nested content.

paws.common 0.5.8

bucket <- "empty-bucket"
client <- paws.storage::s3()
out <- client$list_object_versions(Bucket = bucket)
out$Versions
#> list()

Created on 2023-08-29 with reprex v2.0.2

The latest version of paws.common builds the default values for all lists, maps, and structures.

paws.common 0.5.9

bucket <- "empty-bucket"
client <- paws.storage::s3()
out <- client$list_object_versions(Bucket = bucket)
out$Versions
#> [[1]]
#> [[1]]$ETag
#> list()
#> 
#> [[1]]$ChecksumAlgorithm
#> [[1]]$ChecksumAlgorithm[[1]]
#> list()
#> 
#> 
#> [[1]]$Size
#> numeric(0)
#> 
#> [[1]]$StorageClass
#> list()
#> 
#> [[1]]$Key
#> list()
#> 
#> [[1]]$VersionId
#> list()
#> 
#> [[1]]$IsLatest
#> logical(0)
#> 
#> [[1]]$LastModified
#> POSIXct of length 0
#> 
#> [[1]]$Owner
#> [[1]]$Owner[[1]]
#> [[1]]$Owner[[1]]$DisplayName
#> list()
#> 
#> [[1]]$Owner[[1]]$ID
#> list()

Created on 2023-08-29 with reprex v2.0.2

So technically paws.common 0.5.9 is returning a more accurate default view: https://www.paws-r-sdk.com/docs/s3_list_object_versions/#value

So for your example @wlandau, paws is putting an empty Key and VersionId into do.call(client$delete_object, args), which causes an error (not a great error, though).
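
As a stop-gap in user code (a sketch against the response shape shown above, not a fix in paws itself), the loop can skip placeholder entries whose Key came back empty:

for (x in out$Versions) {
  # Placeholder entries from the 0.5.9 defaults have Key = list();
  # a real object version always has a non-empty character Key.
  if (!is.character(x$Key) || length(x$Key) == 0) next
  args <- list(Bucket = bucket, Key = x$Key)
  has_version_id <- is.character(x$VersionId) &&
    length(x$VersionId) == 1 &&
    !identical(x$VersionId, "null")
  if (has_version_id) {
    args$VersionId <- x$VersionId
  }
  do.call(client$delete_object, args)
}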

We have a couple of options:

  1. Simplify the paws.common 0.5.9 default output to replicate 0.5.8.
  2. Take the boto3 approach and don't return any default output.
  3. Embrace the paws.common 0.5.9 enriched output.

DyfanJones commented 1 year ago

My gut feeling is to simplify the paws.common 0.5.9 defaults to match 0.5.8. @wlandau, @davidkretch, @adambanker, do you have an opinion on this? :)

wlandau commented 1 year ago

If I understand correctly, that is the more back-compatible approach and will help me avoid refactoring 'targets'. So unless I am missing something, I like it.

DyfanJones commented 1 year ago

@wlandau yes, that is correct. I will have to look at how easy it is to implement :)

DyfanJones commented 1 year ago

Ok, I believe I have fixed the default values for the XML parser. I have also fixed a bug where nested structures and lists weren't getting flattened correctly. Do you mind running your targets tests, @wlandau?

DyfanJones commented 1 year ago

Next steps, if the targets unit tests work ok :)

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage has no change and project coverage change: +0.09% :tada:

Comparison is base (5bc1dad) 83.91% compared to head (33df6e2) 84.00%.

:exclamation: Current head 33df6e2 differs from pull request most recent head c3b67e3. Consider uploading reports for the commit c3b67e3 to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             main     #660     +/-  ##
=========================================
+ Coverage   83.91%   84.00%   +0.09%
=========================================
  Files         198      201       +3
  Lines       14314    14362      +48
=========================================
+ Hits        12011    12065      +54
+ Misses       2303     2297       -6
```

[see 21 files with indirect coverage changes](https://app.codecov.io/gh/paws-r/paws/pull/660/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=paws-r)

:umbrella: View full report in Codecov by Sentry.

wlandau commented 1 year ago

Thanks for working on this, @DyfanJones. Almost all of the unit tests in targets are passing now; only multipart uploads are struggling. On my end, the example at https://github.com/paws-r/paws/blob/main/examples/s3_multipart_upload.R errors out with:

Error : C stack usage  7954104 is too close to the limit

> traceback()
3: readBin(con, what = "raw", n = file.size(tmp))
2: .s3$upload_part_input(Body = Body, Bucket = Bucket, ContentLength = ContentLength, 
       ContentMD5 = ContentMD5, ChecksumAlgorithm = ChecksumAlgorithm, 
       ChecksumCRC32 = ChecksumCRC32, ChecksumCRC32C = ChecksumCRC32C, 
       ChecksumSHA1 = ChecksumSHA1, ChecksumSHA256 = ChecksumSHA256, 
       Key = Key, PartNumber = PartNumber, UploadId = UploadId, 
       SSECustomerAlgorithm = SSECustomerAlgorithm, SSECustomerKey = SSECustomerKey, 
       SSECustomerKeyMD5 = SSECustomerKeyMD5, RequestPayer = RequestPayer, 
       ExpectedBucketOwner = ExpectedBucketOwner)
1: client$upload_part(Body = readBin(con, what = "raw", n = file.size(tmp)), 
       Bucket = bucket, Key = "x", PartNumber = 1, UploadId = upload$UploadId)

DyfanJones commented 1 year ago

Ah, so close 😆 Thanks for the testing, @wlandau. I will need to enrich the unit tests to catch these edge cases as well :)

DyfanJones commented 12 months ago

Thanks for identifying the issue @wlandau, I have now fixed it. It was sending a NULL value for the interface, and that would just fail with the new XML parse logic. I will add these XML parse edge cases to the unit tests, as the current unit tests don't cater for them.

wlandau commented 12 months ago

Thanks for the updates. All my tests pass now.

DyfanJones commented 12 months ago

I believe I have got a working solution for native retries.

Here is an example where I throttle AWS and it successfully retries (eventually surfacing the throttling error once the retries are exhausted).

remotes::install_github("DyfanJones/paws/paws.common", ref = "native_retry")
install.packages('paws', repos = c(pawsr = 'https://paws-r.r-universe.dev', CRAN = 'https://cloud.r-project.org'))

library(paws)

client <- sagemaker()
resp <- client$list_processing_jobs() |> paginate()

#> ERROR [2023-08-31 15:44:41.206]: Request failed. Retrying in 0.016 seconds...
#> ERROR [2023-08-31 15:44:41.229]: Request failed. Retrying in 3.026 seconds...
#> ERROR [2023-08-31 15:44:44.289]: Request failed. Retrying in 2.505 seconds...
#> Error: ThrottlingException (HTTP 400). Rate exceeded

Created on 2023-08-31 with reprex v2.0.2

DyfanJones commented 12 months ago

After looking at botocore's standard retrier (https://github.com/boto/botocore/blob/71a80e4dd9428351bbdcd0c1a06fbcc70bb091b4/botocore/retries/standard.py) I believe I am not a million miles away with this implementation. I will add some unit tests and I think we will have an initial native retrier :)
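
For reference, here is a minimal sketch of the backoff rule described in the botocore code linked above (delay = min(b * 2^(attempt - 1), 20) seconds, with b drawn uniformly from [0, 1]). It illustrates the algorithm only and is not paws.common's actual implementation:

# Illustration of the "standard" retry backoff, not paws.common internals.
standard_backoff <- function(attempt, max_backoff = 20) {
  # b * r^(attempt - 1) with b ~ U(0, 1), r = 2, capped at max_backoff seconds
  min(runif(1) * 2^(attempt - 1), max_backoff)
}

retry_standard <- function(fn, max_retries = 3) {
  for (attempt in seq_len(max_retries + 1)) {
    result <- tryCatch(fn(), error = identity)
    if (!inherits(result, "error")) return(result)
    if (attempt > max_retries) stop(result)
    # A real retrier would first check that the error is retryable
    # (throttling, transient 5xx) before sleeping.
    Sys.sleep(standard_backoff(attempt))
  }
}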

wlandau commented 12 months ago

Thank you so much, @DyfanJones! I will begin migrating targets to use native retries in paws.

DyfanJones commented 12 months ago

Thanks for all the testing you did @wlandau. I am currently in the process of checking whether the latest changes break any dependent packages. I have marked targets as working correctly in https://github.com/paws-r/paws/issues/657; if this isn't right, please let me know.

wlandau commented 12 months ago

Absolutely! Coincidentally, this is a convenient time for me to do testing because of https://github.com/ropensci/targets/pull/1130 (cf. https://books.ropensci.org/targets/cloud-storage.html).

I just migrated targets to use s3(config = list(max_retries = ...)), and although I am not sure how to test the actual retrying mechanism using only S3, my whole test suite passes with both CRAN paws.common and with remotes::install_github("dyfanjones/paws/paws.common") (https://github.com/paws-r/paws/commit/6c28ed42709993b8696116dbb6f7ce9e805f0686).
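
For what it's worth, one way to exercise a retry path without real S3 throttling is a test double that fails a fixed number of times before succeeding; this hypothetical sketch reuses the retry_standard() illustration from earlier in the thread:

# Hypothetical test double: errors `failures` times, then succeeds.
make_flaky <- function(failures) {
  calls <- 0L
  function() {
    calls <<- calls + 1L
    if (calls <= failures) stop("ThrottlingException (HTTP 400). Rate exceeded")
    "ok"
  }
}

# Two transient failures are absorbed and the call ultimately succeeds.
retry_standard(make_flaky(2), max_retries = 3)
#> [1] "ok"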