Minor performance - Githubissues

DyfanJones commented 5 months ago

This is a minor performance enhancement.

Focusing on:

- removal of `sapply` in favour of `vapply` or `lapply`
- improved `get_token` handler within the paginate functions (the path to the token only needs to be created once)
- enhanced `url_parse`, developed from httr2 (long term, httr2 will slowly replace the httr dependency)
- enhanced `build_query_string` function
- new `parse_in_half` method, utilised in `read_ini` and `paws_url_parse` (thanks to @hadley for the discussion around performance in https://github.com/r-lib/httr2/pull/430)
- simplified `get_idempotency_token` function to improve performance
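To illustrate the idea behind splitting a string "in half", here is a minimal sketch. It is not the `parse_in_half` implementation from paws.common; the helper name `split_in_half` and its signature are assumptions made for illustration. It finds the first separator with a fixed-string `regexpr()` and takes two vectorised `substr()` calls, which avoids building a list per element with `strsplit()`.

```r
# Hypothetical helper (NOT the paws.common implementation): split each string
# at the first occurrence of `sep` and return a two-column character matrix.
split_in_half <- function(x, sep = "=") {
  pos <- regexpr(sep, x, fixed = TRUE) # first match position, -1 if absent
  found <- pos != -1L
  left <- x # elements without a separator stay whole
  right <- rep("", length(x))
  if (any(found)) {
    left[found] <- substr(x[found], 1L, pos[found] - 1L)
    right[found] <- substr(x[found], pos[found] + nchar(sep), nchar(x[found]))
  }
  cbind(left = left, right = right)
}

# Split a query string into pairs, then split each pair into key and value.
pairs <- strsplit("a=b&c=d&e=&g=&h=i", "&", fixed = TRUE)[[1]]
halves <- split_in_half(pairs)
```

Under these assumptions, `halves[, "left"]` holds the keys and `halves[, "right"]` the values, with empty strings where a key has no value.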

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (beb8e31) 84.84% compared to head (4b6c122) 84.92%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #745      +/-   ##
==========================================
+ Coverage   84.84%   84.92%   +0.07%
==========================================
  Files         204      204
  Lines       14836    14992     +156
==========================================
+ Hits        12588    12732     +144
- Misses       2248     2260      +12
```


DyfanJones commented 5 months ago

Some simple benchmarks

```r
origin_params <- c("a=b&c=d&e=&g=&h=i")
modify_params <- list(b = 3, c = 4)

(bm <- bench::mark(
  new_mth = paws.common:::parse_query_string(origin_params),
  old_mth = parse_query_string(origin_params)
))
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new_mth      16.2µs   17.5µs    55246.    5.88MB     27.6
#> 2 old_mth      26.5µs   27.6µs    35198.    4.48MB     21.1

bm |> ggplot2::autoplot()
#> Loading required namespace: tidyr
```

```r
parse_params <- paws.common:::parse_query_string(origin_params)

(bm <- bench::mark(
  new_mth = paws.common:::build_query_string(parse_params),
  old_mth = build_query_string(parse_params)
))
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new_mth      12.5µs   13.4µs    73375.      58KB     14.7
#> 2 old_mth      78.4µs   82.7µs    11835.     156KB     19.2

bm |> ggplot2::autoplot()
```

```r
(bm <- bench::mark(
  new_mth = paws.common:::update_query_string(origin_params, modify_params),
  old_mth = update_query_string(origin_params, modify_params)
))
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new_mth      33.8µs   35.8µs    27449.      14KB     16.5
#> 2 old_mth     125.4µs  130.7µs     7546.    37.8KB     16.7

bm |> ggplot2::autoplot()
```

Created on 2024-02-07 with reprex v2.1.0

DyfanJones commented 5 months ago

Basic performance benchmark for new_operation

```r
sapply_new_operation <- function(name, http_method, http_path, paginator, before_presign_fn = NULL) {
  args <- as.list(environment())
  args[sapply(args, is.null)] <- NULL
  return(do.call(paws.common:::Operation, args))
}

vapply_new_operation <- function(name, http_method, http_path, paginator, before_presign_fn = NULL) {
  args <- as.list(environment())
  args[vapply(args, is.null, FUN.VALUE = logical(1))] <- NULL
  return(do.call(paws.common:::Operation, args))
}

lengths_new_operation <- function(name, http_method, http_path, paginator, before_presign_fn = NULL) {
  args <- as.list(environment())
  args[lengths(args) == 0] <- NULL
  return(do.call(paws.common:::Operation, args))
}

kwargs <- list(
  name = "ListBuckets",
  http_method = "GET",
  http_path = "/",
  paginator = list()
)

(bm <- bench::mark(
  old_mth = do.call(sapply_new_operation, kwargs),
  vapply_mth = do.call(vapply_new_operation, kwargs),
  lengths_mth = do.call(lengths_new_operation, kwargs)
))
#> # A tibble: 3 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old_mth      10.95µs  11.89µs    74650.    5.85MB     52.3
#> 2 vapply_mth    6.64µs   7.42µs   124240.        0B     49.7
#> 3 lengths_mth   5.25µs    5.9µs   159226.        0B     47.8

bm |> ggplot2::autoplot()
#> Loading required namespace: tidyr
```

Created on 2024-02-07 with reprex v2.1.0

The new implementation will use the `lengths` approach, since it was the fastest of the three in the benchmark above.
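One behavioural difference between the filters is worth keeping in mind: `is.null` drops only `NULL` entries, while `lengths(x) == 0` also drops zero-length values such as `list()` or `character(0)`. The sketch below shows this on a plain list. The helper names `drop_null` and `drop_empty` are made up for illustration, and whether the difference matters for `Operation` depends on its argument defaults, which are not shown in this thread.

```r
# Hypothetical helpers for illustration only (not part of paws.common).
drop_null <- function(x) {
  # Remove entries that are exactly NULL.
  x[vapply(x, is.null, FUN.VALUE = logical(1))] <- NULL
  x
}

drop_empty <- function(x) {
  # Remove every zero-length entry: NULL, list(), character(0), ...
  x[lengths(x) == 0] <- NULL
  x
}

args <- list(name = "ListBuckets", paginator = list(), before_presign_fn = NULL)

kept_by_null_filter <- names(drop_null(args))
kept_by_length_filter <- names(drop_empty(args))
```

Here `kept_by_null_filter` still contains `"paginator"`, whereas `kept_by_length_filter` contains only `"name"`, so the two filters are interchangeable only when an empty argument and a missing one mean the same thing to the callee.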

paws-r / paws

Minor performance #745